Scientific flaws in the Research Excellence Framework

Edward Tsang 2014.12.18; updated 2015.06.02

The Research Excellence Framework (REF) is a government assessment of research quality in UK universities. It gives a score to every submitting department in every university. These scores are then used to rank the universities. Unfortunately, this way of interpreting data is fundamentally flawed. Averaging rank numbers is not scientifically meaningful.


How a unit in a university is scored

To score a department, each member of staff is placed into a research quality category: 0, 1, 2, 3 or 4, with 0 being the lowest and 4 the highest. Each department, or "unit of assessment" as it is called, is then given a grade point average (GPA): the sum, over the categories, of the category number multiplied by the percentage of returned staff (measured in Full Time Equivalents, FTE) in that category.
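As a concrete sketch of that calculation (the staff profile below is invented purely for illustration), the average score of a unit is a weighted average of the category numbers:

```python
# Hypothetical profile: fraction of returned staff (FTE-weighted) in
# each quality category 0-4. These numbers are invented for illustration.
profile = {0: 0.02, 1: 0.08, 2: 0.30, 3: 0.40, 4: 0.20}

def gpa(profile):
    """Average score: sum of category number times proportion in that category."""
    assert abs(sum(profile.values()) - 1.0) < 1e-9, "proportions must sum to 1"
    return sum(category * share for category, share in profile.items())

print(round(gpa(profile), 3))  # 2.68
```

The single number 2.68 is all that survives of the profile; two very different profiles can produce the same number.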

The scores are problematic

The quality categories are ordinal data: they tell us only the ranking of the categories. Taking averages of ordinal data is very dangerous, because the gaps between adjacent ranks need not be equal: the difference between 1st and 2nd may be very slim, while the difference between 2nd and 3rd could be very big.
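A small illustration of the danger, with invented numbers: two units can have identical average category scores while the quality those scores summarise is very different, precisely because the averages treat the gaps between categories as equal when they need not be.

```python
# Invented example: two units with the same average but very different profiles.
unit_a = {2: 1.0}          # everyone in category 2
unit_b = {0: 0.5, 4: 0.5}  # half in category 0, half in category 4

def gpa(profile):
    return sum(cat * share for cat, share in profile.items())

print(gpa(unit_a), gpa(unit_b))  # 2.0 2.0 -- identical averages

# But if the real-world gaps between categories are uneven -- say the
# best work is worth far more than twice mid-ranking work -- the shared
# average of 2.0 hides the difference entirely. The scale below is invented.
true_value = {0: 0, 1: 1, 2: 3, 3: 10, 4: 50}
avg_value = lambda p: sum(true_value[c] * s for c, s in p.items())
print(avg_value(unit_a), avg_value(unit_b))  # 3.0 25.0
```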

The REF panel could instead have treated the categories as interval data, i.e. the distance from 0 to 1 is the same as the distance from 1 to 2, and so on. To implement this, the assessment panels would have to be briefed carefully. Unfortunately, even then, the true position of a researcher placed in category 1 could lie anywhere from 0.50 to 1.49 (assuming rounding).
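Even under the interval reading, the categories are coarse. A quick sketch, with the category boundaries assumed from the rounding convention above:

```python
def category(true_position):
    # Round half up, so 0.50 -> 1, 1.49 -> 1, 1.50 -> 2.
    # (Python's built-in round() rounds halves to even, so avoid it here.)
    return int(true_position + 0.5)

print(category(0.50), category(1.49))  # 1 1
# Two researchers nearly a whole category apart receive the same score,
# so almost a full point of real difference vanishes before any
# averaging even begins.
```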

Once the departments are scored, the institutions are ranked by these averages, despite all the warnings issued by statisticians. This compounds the flaw explained above: averaging ordinal data is dangerous, and using such averages to order institutions is more dangerous still.

"This is just an approximation"

Some might argue that, despite all the above problems, the ratings are good enough as approximations. The trouble is that when league tables are released, people tend to treat the numbers as precise. In the 2014 REF, a university scoring 3.044 ranked 37th. Had it scored 3.065 (a difference of just 0.021), it would have ranked 31st, six places higher.
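The sensitivity is easy to demonstrate with invented scores clustered as tightly as those in a real league table:

```python
# Invented, tightly clustered average scores, as in a real league table.
scores = {"U1": 3.070, "U2": 3.066, "U3": 3.065, "U4": 3.060,
          "U5": 3.055, "U6": 3.050, "U7": 3.048, "U8": 3.044}

def rank_of(name, scores):
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1

print(rank_of("U8", scores))  # 8 -- last place with 3.044
scores["U8"] = 3.067          # a nudge of 0.023...
print(rank_of("U8", scores))  # 2 -- ...jumps six places
```

Differences far smaller than the measurement noise in the underlying categories translate into large movements in the table.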

People support this system though it is flawed

Most people with quantitative training can see these flaws, and that includes most of the staff being assessed. So why do we allow the exercise to continue? It seems to me that managers are keen to reduce everything to a single number, and they do so even though they know that this way of interpreting data is fundamentally flawed.

The flaw goes beyond the numbers

Apart from the flaw in data interpretation, the research assessment exercise is flawed in other ways. For example, universities use different tactics to improve their positions in the league table without improving their research environment or research culture. Discussion of this will be left to another occasion.

[End]

Related:

  • Types of Data & Measurement scales: nominal, ordinal, interval and ratio
  • REF as a destructive exercise
  • "Education is not their goal"

All Rights Reserved