The Research Excellence Framework (REF) is a government assessment of research quality in UK universities. It gives a score to every submitting department in every university. These scores are then used to rank the universities. Unfortunately, this way of interpreting data is fundamentally flawed. Averaging rank numbers is not scientifically meaningful.
To score a department, or "unit of assessment" as the REF calls it, each member of staff is placed in a quality category: 0, 1, 2, 3 or 4, with 0 the lowest and 4 the highest. The department's score, its grade point average (GPA), is then the sum, over the categories, of the category number multiplied by the percentage of staff (measured in full-time equivalence, FTE) returned in that category, divided by 100.
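As a rough sketch of the calculation, here is the GPA of one hypothetical unit of assessment; the category percentages are invented for illustration:

```python
# Quality profile of one hypothetical unit of assessment:
# % of FTE staff returned in each category (invented figures).
profile = {4: 30.0, 3: 45.0, 2: 20.0, 1: 5.0, 0: 0.0}

# GPA = sum of (category number x % of FTE in that category) / 100
gpa = sum(category * pct for category, pct in profile.items()) / 100
print(gpa)  # 3.0
```

The percentages must sum to 100 for the result to be a weighted average of the category numbers.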
The quality categories are ordinal data: they tell us only the order of the categories, not the distances between them. Taking averages of ordinal data is very dangerous, because the difference between 1st and 2nd may be very slim while the difference between 2nd and 3rd could be very big.
The REF panels could have treated the categories as interval data, i.e. the distance from 0 to 1 is the same as the distance from 1 to 2, and so on. For that to hold, the assessment panels would have to be briefed carefully. Even then, a researcher placed in category 1 could sit anywhere from 0.50 to 1.49 (assuming rounding), so the category number is at best a coarse approximation.
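The danger of averaging ordinal categories can be made concrete. In the sketch below, two hypothetical departments (the quality profiles are invented) swap places in the ranking depending on which monotone scale is attached to the categories; both scales respect the ordering 0 < 1 < 2 < 3 < 4, so the ordinal data alone cannot tell us which ranking is right:

```python
# % of FTE staff in categories 0..4 for two invented departments.
dept_a = [0, 0, 0, 100, 0]   # everyone rated 3
dept_b = [40, 0, 0, 0, 60]   # polarised: 40% rated 0, 60% rated 4

def score(profile, values):
    """Weighted average of the category values, weighted by the % profile."""
    return sum(v * p for v, p in zip(values, profile)) / 100

linear = [0, 1, 2, 3, 4]    # treats the categories as evenly spaced
convex = [0, 1, 2, 3, 10]   # another monotone scale: category 4 is "much" better

print(score(dept_a, linear), score(dept_b, linear))  # 3.0 2.4 -> A above B
print(score(dept_a, convex), score(dept_b, convex))  # 3.0 6.0 -> B above A
```

Any ranking produced by averaging therefore depends on an arbitrary choice of scale, not on the ratings themselves.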
Once the departments are scored, the institutions are ranked, despite all the warnings issued by statisticians. This compounds the flaw explained above: averaging ordinal data is dangerous, and using such averages to order institutions is more dangerous still.
Some might argue that, despite all the above problems, the ratings are good enough approximations. The trouble is that when league tables are released, people tend to treat the scores as precise numbers. In the 2014 REF, a university scoring 3.044 was ranked 37th. Had it scored 3.065 (a difference of just 0.021), it would have been ranked 31st, six places higher.
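The effect is easy to reproduce with a toy league table; the scores below are invented but, like the real tables, clustered in a narrow band, so a tiny change in score moves an institution many places:

```python
# Invented GPA scores, clustered in a narrow band as in real league tables.
scores = [3.065, 3.064, 3.061, 3.058, 3.052, 3.047, 3.044, 3.040]

def rank(score, table):
    """Position of `score` when the table is sorted highest-first."""
    return sorted(table, reverse=True).index(score) + 1

print(rank(3.044, scores))  # 7th on this toy table
print(rank(3.065, scores))  # 1st: a 0.021 change jumps six places
```

The ranks are far more volatile than the underlying scores, which is exactly why reading a league-table position as a precise measurement is misleading.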
Most people with quantitative training can see these flaws, and that includes most of the staff being assessed. Why, then, do we allow the exercise to continue? It seems to me that managers are keen to reduce everything to a single number, even though they know that this way of interpreting data is fundamentally flawed.
Apart from the flawed data interpretation, the research assessment exercise is flawed in other ways. For example, universities use various tactics to improve their positions in the league table without improving their research environment or research culture. Discussion of this is left for another occasion.