Performance Measurement (updated 27 Feb 99)

We live in an age of measurement. We want to measure everything: how high, how heavy, how much, how many? Nowhere is this impulse more powerful than in the world of investment. The constant stream of data on financial markets now available at the touch of a computer keyboard is all measurement of one sort or another. And it keeps us continually up-to-date with the performance of our assets, our funds and the people managing them on our behalf.

Measures are essential if we are to try to make forecasts. For what can we say about how much or how many in the future unless we know the base from which we set off - the initial conditions? And in investment as elsewhere, most people implicitly accept that the past is as good a guide to the future as anything else. A whole industry of performance measurers has emerged in recent years, offering historical track records of money managers. Morningstar, for example, summarizes their past performance in the form of star ratings as if they were hotels or restaurants.

Of course, there should be performance measurement. But strangely enough, it was rare only a generation ago in investment management. Results then were described in vague qualitative terms and, with institutional investors' portfolios largely in bonds, the variations were not consequential. But a bull market in equities changed things. Demands began to be heard: 'How well are we doing? And more importantly, how well are we doing compared to an index or our competition?'

Performance measurement guru: Peter Dietz

At first, performance measurement crept into mutual fund merchandising under very strict SEC rules on advertising. The goal was less to inform shareholders and more to increase fund sales. But the standards of strict time and size weightings were developed at this time.

Institutions started getting concerned about performance during the bull market in equities in the 1960s. Peter Dietz of the Frank Russell Company wrote the first book on the subject, Performance Measurement. Essentially, his work was an extension of mutual fund accounting, except that he discussed a range of alternatives. In particular, he pointed out the principal measurement aberration that could be introduced: the influence of initial size on the calculations.

In fund accounting, the same percentage changes are treated as equally meaningful on small amounts in a fund as on large ones. So a fund could demonstrate exciting performance on its starting investment amounts, attract new investors and find that such results became more difficult to achieve. Yet the numbers would give no indication of diminution in performance due to size. Similarly, the calculations suffered, at least originally, from changes in management and market cycles.
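The size effect described above can be made concrete with a small sketch. It compares a time-weighted return, which chains percentage changes regardless of how much money was invested, with a money-weighted return (an internal rate of return), which reflects the amounts actually at work. The fund figures, the function names and the choice of a bisection solver are illustrative assumptions, not Dietz's own method or data.

```python
def time_weighted_return(period_returns):
    """Chain-link period returns; the amount invested plays no role."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def money_weighted_return(cash_flows, lo=-0.99, hi=10.0, tol=1e-10):
    """Per-period internal rate of return, found by bisection.

    cash_flows[t] is the investors' net flow at time t (t = 0 is the start):
    negative for money paid in, positive for money taken out.
    Assumes the net present value changes sign exactly once on [lo, hi].
    """
    def npv(rate):
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

    f_lo = npv(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2.0


# Hypothetical fund: 1 (million) at the start, +50% in year one,
# 10 of new money then arrives, and year two returns -10% on the larger base.
period_returns = [0.50, -0.10]
end_value = (1.0 * 1.50 + 10.0) * 0.90
cash_flows = [-1.0, -10.0, end_value]

twr = time_weighted_return(period_returns)   # cumulative over two years
mwr = money_weighted_return(cash_flows)      # per year

print(f"time-weighted (cumulative): {twr:+.1%}")
print(f"money-weighted (per year):  {mwr:+.1%}")
```

In this made-up case the time-weighted figure shows a cumulative gain of about 35 percent, while the money-weighted figure is negative, at roughly minus 5 percent a year: the good year was earned on a small base and the poor year on a large one.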

Dietz raised a number of other important statistical issues, such as the need to ensure that the database remained intact. There was a natural tendency for accounts to leave a management organization after poor results and hence for their entire history, since inception, to be dropped from the calculations. Clearly, this distortion gave an upward bias to the results. Dietz gave a precise description of the appropriate measurement rules to follow and they were largely implemented by the institutional community.
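A minimal sketch of the upward bias just described, using invented accounts and an equal-weighted average (both are assumptions made for illustration; a real composite would typically weight accounts by size):

```python
from statistics import mean

# Hypothetical composite: (account, return since inception, still a client?)
accounts = [
    ("A", 0.14, True),
    ("B", 0.09, True),
    ("C", -0.06, False),   # left after poor results
    ("D", 0.11, True),
    ("E", -0.02, False),   # left after poor results
]


def composite_return(accounts, survivors_only=False):
    """Equal-weighted average return of the accounts kept in the composite."""
    kept = [r for _, r, active in accounts if active or not survivors_only]
    return mean(kept)


all_accounts = composite_return(accounts)
survivors = composite_return(accounts, survivors_only=True)

print(f"all accounts:   {all_accounts:.1%}")
print(f"survivors only: {survivors:.1%}")
print(f"upward bias:    {survivors - all_accounts:.1%}")
```

Dropping the departed accounts raises the reported average without any change in what the manager actually achieved.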

Not surprisingly, the Russell organization was one of the first to develop techniques to help its clients understand their results. As for Dietz, he opened the Russell office in Tokyo, introducing performance measurement to Japanese institutions, which took it to ever greater levels of precision. Japanese institutions became as performance-hungry in their bull market, which ended in the 1980s, as American managers proved to be in the 1990s. Sadly, Dietz died while his invention was flourishing.

Since the 1960s, a substantial number of academic studies have been done on the continuity of investment results. These suggest great caution about the predictability of returns: they find what is known as low auto-correlation, meaning that the results of one period cannot reliably be predicted from those of another.
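To show what such a finding looks like numerically, the sketch below computes the correlation between managers' results in two successive periods. The data are simulated under the assumption that excess returns are pure luck, so the example illustrates the calculation only and is not evidence about real managers; with actual track records the two lists would simply be replaced. The seed, the number of managers and the 5 percent dispersion are arbitrary choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Simulated excess returns for the same managers in two periods,
# drawn independently (assumption: no persistent skill).
rng = random.Random(1999)
n_managers = 1000
period_1 = [rng.gauss(0.0, 0.05) for _ in range(n_managers)]
period_2 = [rng.gauss(0.0, 0.05) for _ in range(n_managers)]

persistence = pearson(period_1, period_2)
print(f"correlation between periods: {persistence:+.3f}")
```

A value close to zero means that knowing who did well in the first period says almost nothing about who will do well in the second.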

The use of performance measurement had become so widespread by the early 1980s that the Association for Investment Management and Research (AIMR), the industry association, determined that standards must be mandated for all investment firms presenting performance numbers. Dean LeBaron, one of the co-authors of this book, was one of the handful of brave souls to face the wrath of their peers by propounding standards that at first should, and later must, be followed. With hindsight, to have a broad representation of the membership of AIMR, some on this committee should have come from a group with poor results. But as it happened, and it may be a natural coincidence about those perceived to be leaders then, each member had results that could be widely publicized with pride.

The committee met and produced reform recommendations that were accepted, including the standards that commingled results had to include all accounts with no convenient dropping of those that left, and they had to follow, in general, mutual fund accounting. Another committee on implementation met and encouraged independent audits and clarified some interpretation issues. And an entire industry of performance measurement was born, encompassing software, practitioners and performance attribution specialists.

Counterpoint

Teachers of science typically urge students not to record measurements to a greater degree of precision than their crude instruments allow. Although interpolation between the marks on a scale is possible, writing down numbers that are unsubstantiated produces error rather than greater precision. While this may seem counter-intuitive, it is an observation worth noting when dealing with investment numbers, especially in these days of computers with their appetite for numbers to the right of the decimal point.

There are two levels of significance for investment numbers. The first is: do the numbers imply greater merit than is warranted by the underlying limitations of the measurement technique? The second is: are the numbers produced significant and/or predictive? Performance numbers are particularly subject to this scrutiny.

On the surface, performance measurement promises more than it is able to deliver, at least at our present level of statistical sophistication. Results can be calculated with honesty on the universe of accounts and funds and with some interpretation required on the type of fund, out to many decimal places. They look very precise and scientific. But they almost always fail standard statistical tests of significance and can hardly ever be projected forward. In all probability, they are subject to initial conditions that are unrepeatable and, in any case, such numbers are inherently non-predictive for the timeframes with which they are customarily used - three years or what used to be called a market cycle.
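A rough sketch of why a three-year record so rarely passes a significance test. It assumes annual excess returns are independent with a constant mean and standard deviation, and it uses a t-statistic of about 2 as the conventional threshold; the 2 percent outperformance and 6 percent variability are hypothetical figures chosen for illustration.

```python
import math


def t_statistic(annual_excess, annual_tracking_error, years):
    """t-statistic of the mean annual excess return over `years` years."""
    standard_error = annual_tracking_error / math.sqrt(years)
    return annual_excess / standard_error


def years_needed(annual_excess, annual_tracking_error, t_target=2.0):
    """Length of record needed before the t-statistic reaches t_target."""
    return (t_target * annual_tracking_error / annual_excess) ** 2


# Hypothetical manager: beats the index by 2% a year on average,
# with a 6% annual standard deviation of excess return.
t_three_years = t_statistic(0.02, 0.06, 3)
record_needed = years_needed(0.02, 0.06)

print(f"t-statistic after 3 years: {t_three_years:.2f}")
print(f"years needed for t = 2:    {record_needed:.0f}")
```

Under these assumptions the three-year t-statistic is about 0.58, far short of 2, and the same manager would need a record of roughly 36 years before the outperformance could be distinguished from luck.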

No matter how well the data is cleaned up and standards maintained, the tool of performance measurement is faulty unless used with caution as a guide to qualitative judgment. More harm has probably been done by the misleading information in performance figures than any other statistical evil. Improving the quality of the numbers does not necessarily improve the quality of the results; rather, the results should be of historic interest only, forming a basis for discussion.

But in practice, past performance is used as a predictive tool, especially by those investment managers whose case for superiority is supported by using the numbers. Fund managers with a five-star Morningstar rating, for example, are not slow to use it in their marketing campaigns, just as hotels and restaurants often publicize their star ratings to attract business. And while, logically, half the managers in any performance comparison are below the mean, the only managers who solicit business on that basis are the ones with above-the-mean results.

Little work has been done to make performance measurement more predictive or to help practitioners understand its limitations. Until that changes, performance standards may be likened to improving the cleanliness of cigarette factories, a worthy pursuit that provides a 'clean factory' stamp to give the consumer more confidence in the product. In reality, the stamp of approval does not alter the likelihood that the proper stamp on investment performance numbers is 'use of these numbers may be injurious to your wealth.'

Where next?

We are trying desperately to improve the accuracy of our measures. And yet if we improve our data - which we do - but do not improve the measures themselves or the models in which they are used, in the end we come out with nothing that is any better.

The AIMR standards improved the data substantially by making it honest. But they did not improve anything about how the data was to be used. As a result, the data is no more useful for forecasting than it was before, although it looks better and can be used, seemingly, with higher degrees of confidence. The same is true of all other forms of data. We want to measure, but we are still using the same old linear, Newtonian, archaic tools rather than tools that are adaptive, dynamic and which allow for multiple outcomes and options.

Perhaps organizational and portfolio performance is better at the edge of chaos. Consider, for example, a personnel performance system that tells people there is a system for measurement - and there is - but that they will not know in advance what it is and how it changes. The system will be revealed at the end of the period. To do otherwise is like setting up a test-gaming exercise. The system must be as ambiguous as the world is.

Short-term performance indicators, despite all the marketing money spent trumpeting past successes, are no indication of future achievements and are a poor basis for selecting investment managers. Indeed, it may be better to pick a solid long-term performer that has underperformed in the last couple of years on the assumption of regression to the mean. And perhaps managers should be rewarded, as hedge funds typically are, on the basis of actual performance rather than selected on the basis of past performance.

Performance measurement needs to be rethought to merit scientific recognition as a meaningful measure:

· As a start, we might remove the emphasis on precise point calculations, noting how often we see performance numbers shown with two or even four decimal places - a practice that ignores the science student's lesson not to write down data beyond the capability of the measuring instruments. Such specificity implies a relevance that the numbers do not have.

· Next, we can adapt to results that always show a range, perhaps in terms of the standard deviations that can be expected. Results that fall within the same cluster should be treated as equivalent: each value, whether above or below another, is of equal merit.

· And finally, we should learn how to use newer, more dynamic statistical tests, the kind that are customarily used in other fields to attempt forecasts from the data. After all, that is generally the purpose of the data: to estimate the future rather than assigning a historical value to the past.
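The first two suggestions above can be sketched in a few lines: report a range in whole percentages rather than a point with decimals, and treat records whose ranges overlap as belonging to the same cluster. The manager data, the function names and the choice of one sample standard deviation as the width of the range are all assumptions made for illustration.

```python
from statistics import mean, stdev


def return_range(annual_returns, k=1.0):
    """Return (low, centre, high): the mean plus or minus k sample standard deviations."""
    centre = mean(annual_returns)
    spread = k * stdev(annual_returns)
    return centre - spread, centre, centre + spread


def describe(annual_returns, k=1.0):
    """Report a range in whole percentages instead of a point with decimals."""
    low, _, high = return_range(annual_returns, k)
    return f"{round(low * 100)}% to {round(high * 100)}%"


def same_cluster(returns_a, returns_b, k=1.0):
    """Treat two records as equivalent when their ranges overlap."""
    low_a, _, high_a = return_range(returns_a, k)
    low_b, _, high_b = return_range(returns_b, k)
    return low_a <= high_b and low_b <= high_a


# Hypothetical five-year records for two managers.
manager_a = [0.12, -0.03, 0.18, 0.07, 0.01]
manager_b = [0.09, 0.02, 0.11, 0.04, 0.06]

print("manager A:", describe(manager_a))
print("manager B:", describe(manager_b))
print("same cluster:", same_cluster(manager_a, manager_b))
```

Manager A averages 7.0 percent and manager B 6.4 percent, a difference that a league table would rank; shown as ranges, the two records overlap and neither can be called better than the other.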

Read on

In print

Peter Dietz, Performance Measurement

Online

www.morningstar.com - the Morningstar website