Measurement Dysfunction

Motivation

Measurement dysfunction is a situation in which the act of measuring produces results directly contrary to the goals the measurement was intended to serve. An excellent reference on this topic is Measuring and Managing Performance in Organizations by Robert Austin.

Care must be taken in the design of a departmental dashboard to avoid producing measurement dysfunction. For example, if "teaching performance" is defined in terms of "SSH per Faculty FTE", then a single-minded focus on improving "performance" can lead to large lecture hall classes without TAs, with potentially disastrous consequences for the goal of providing a quality educational experience for students.
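To make the incentive concrete, here is a minimal sketch (assuming SSH means student semester hours, and using invented numbers) of how the ratio rewards ever-larger classes regardless of educational quality:

```python
# Hypothetical illustration of "SSH per Faculty FTE" as a metric.
# SSH is assumed to mean student semester hours; all numbers are invented.

def ssh_per_fte(enrollment: int, credit_hours: int, faculty_fte: float) -> float:
    """Compute the metric: (enrollment x credit hours) per faculty FTE."""
    return (enrollment * credit_hours) / faculty_fte

# A 30-student seminar and a 300-student lecture, both 3 credits, one instructor:
print(ssh_per_fte(30, 3, 1.0))   # 90.0
print(ssh_per_fte(300, 3, 1.0))  # 900.0: ten times "better", regardless of quality
```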

The blog post How management by metrics leads us astray provides cautionary tales about the dangers of metrics for organizational improvement. The accompanying Hacker News discussion adds several of the most popular measurement dysfunction anecdotes (the Soviet shoe factory, the scrap metal quota, etc.).

Many of the cited problems with metrics can be viewed as a result of confusing the "map" with the "territory". Maps can be incredibly useful, but they are an abstraction of reality, not reality itself. For example, if a map does not contain topographic information and you are using it to estimate the time to walk between two points, your estimate may be wildly inaccurate when you try to walk that path because there is a substantial elevation gain. This doesn't mean the map is totally useless and should never be used. It means that you have discovered a limitation of that particular abstraction of reality for that particular purpose, and that you need a better abstraction (for example, a topographic map) for that purpose in the future.

Here are some steps that can be taken to avoid measurement dysfunction when developing and using a departmental dashboard.

Assess data quality

There is a cliché in computer science: "garbage in, garbage out". The dashboard will not provide useful guidance if the data used to generate it is of low quality.

Be sure to create mechanisms to assess the quality of the data. For example, when using course evaluation data, it is important to track the percentage of students in the class submitting evaluations. If the percentage is very low, then the results may not be representative.
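As an illustration, the response-rate check could be automated so that low-quality values are flagged before anyone interprets them. This is only a sketch; the field names and the 40% cutoff are assumptions for illustration, not part of any actual dashboard specification:

```python
# Sketch of a data-quality check for course evaluation data.
# Field names and the 40% threshold are illustrative assumptions.

def response_rate(num_responses: int, enrollment: int) -> float:
    return num_responses / enrollment if enrollment else 0.0

def flag_low_quality(courses: list[dict], threshold: float = 0.40) -> list[str]:
    """Return course IDs whose response rate is too low to be representative."""
    return [
        c["course_id"]
        for c in courses
        if response_rate(c["responses"], c["enrollment"]) < threshold
    ]

courses = [
    {"course_id": "ICS 101", "responses": 12, "enrollment": 90},  # 13%: flagged
    {"course_id": "ICS 311", "responses": 28, "enrollment": 40},  # 70%: kept
]
print(flag_low_quality(courses))  # ['ICS 101']
```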

Validate using multiple measures

When possible, see if important goals can be assessed by more than one independent measure. For example, if a goal is to ensure that classes are relevant to professional goals, then this could be assessed by a Department-level course evaluation question, an item on the Exit Survey Questionnaire, and an item on the Stakeholder Questionnaire.
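One way to operationalize this is to check whether the independent measures actually agree before trusting any one of them. The sketch below uses invented scores and a simple disagreement rule (any measure more than 0.75 points from the mean); both are assumptions for illustration:

```python
# Sketch of cross-checking one goal against several independent measures.
# Scores and the 0.75-point disagreement rule are invented for illustration.

measures = {
    "course_evaluation_item": 4.1,   # 1-5 scale
    "exit_survey_item": 3.9,         # 1-5 scale
    "stakeholder_survey_item": 2.6,  # 1-5 scale
}

mean = sum(measures.values()) / len(measures)
outliers = {name: v for name, v in measures.items() if abs(v - mean) > 0.75}

if outliers:
    print(f"Measures disagree; investigate before acting: {outliers}")
else:
    print("Independent measures broadly agree.")
```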

The Dashboard indicates what, not why

Think of the Department Dashboard as similar to a thermometer: it can indicate that something appears to be amiss, but additional work will always be needed to understand the underlying causes behind the reading.

For example, if the dashboard indicates that the Department-wide average for a course evaluation question is below the university average, then the next step is to look at individual classes. Perhaps a single, large introductory course generated poor values while all of the other courses were above the university average. Alternatively, perhaps below-average performance on that question is widespread across the department. Very different actions would be taken depending upon which situation produced the dashboard value.
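The drill-down can be illustrated with invented numbers: an enrollment-weighted department average can fall below the benchmark even when most courses exceed it, because one large course dominates the weighting.

```python
# Illustration of the "what, not why" drill-down.
# Course names, enrollments, and scores are all invented.

university_average = 4.0
courses = {          # course: (number of responses, mean evaluation score)
    "ICS 101": (350, 3.1),  # a single large introductory course
    "ICS 212": (35, 4.3),
    "ICS 311": (40, 4.4),
    "ICS 414": (25, 4.2),
}

total = sum(n for n, _ in courses.values())
dept_average = sum(n * score for n, score in courses.values()) / total
print(f"Department average: {dept_average:.2f}")  # 3.37, below the 4.0 benchmark

# Drill down: only one course actually sits below the benchmark.
below = [c for c, (_, s) in courses.items() if s < university_average]
print(f"Below university average: {below}")       # ['ICS 101']
```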

False positives, negatives, and signal-to-noise ratio

Just like any other sensor, the Department Dashboard is susceptible to "false positives" (where a seemingly inappropriate value does not actually indicate a problem) as well as "false negatives" (where the dashboard shows nominal values despite the presence of a problem in the Department).

The possibility of false positives and false negatives does not automatically mean that the dashboard is useless. The utility of the dashboard ultimately rests on its signal-to-noise ratio. If most of the results from the dashboard are false positives or false negatives, then the dashboard will not be valuable to the department.
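One way to estimate the signal-to-noise ratio empirically is to log every dashboard flag and record whether follow-up investigation confirmed a real problem. A minimal sketch, with invented log entries:

```python
# Sketch of estimating a dashboard's precision from a log of its flags.
# All log entries are invented for illustration.

flag_log = [
    {"flagged": True,  "real_problem": True},   # true positive
    {"flagged": True,  "real_problem": False},  # false positive
    {"flagged": True,  "real_problem": True},   # true positive
    {"flagged": False, "real_problem": True},   # false negative, found later
]

flagged = [e for e in flag_log if e["flagged"]]
true_positives = sum(e["real_problem"] for e in flagged)
precision = true_positives / len(flagged)  # fraction of flags that were real
print(f"Precision of dashboard flags: {precision:.0%}")  # 67%
```

If this fraction stays low over an academic year, the dashboard's design (the measures chosen, the thresholds used) is a candidate for revision.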

The primary design goal for each department is to create a dashboard that is useful: one which has a high signal-to-noise ratio. It may take a few design iterations, over a few academic years, to achieve this. Because this is a brand new technology, we expect that there will be surprises and unanticipated challenges as we start to gain real-world experience with the system. We appreciate your willingness to put time and effort into testing the hypothesis that the UH Department Dashboard can provide meaningful information at acceptable cost to academic units.