COVID-19 Death Rate Calculation

Edward Tsang 2020.03.13; updated 2020.03.15

The death rate of COVID-19 depends on how statistics are gathered. Different countries probably do it differently. Therefore, one must be careful when comparing or combining figures from different countries.

This is not an attempt to extract more information from the data available. It is an attempt to clear one's mind.


The COVID-19 epidemic

A novel coronavirus, named COVID-19 (at some point known as 2019-nCoV or Wuhan Coronavirus), is currently spreading around the world. The following is a snapshot of the statistics from Worldometer at 13:00 on Friday 13th March 2020:

Table 1: COVID-19 statistics from Worldometer, accessed at 13:00 on Friday 13th March 2020
Country     | Total Cases | Total Deaths | Total Recovered | Active Cases | Total Cases per 1 million population
Total       | 138959      | 5116         | 70727           | 63116        | 17.8
China       | 80815       | 3177         | 64152           | 13486        | 56.1
Italy       | 15113       | 1016         | 1258            | 12839        | 250.0
Iran        | 11364       | 514          | 3529            | 7321         | 135.3
South Korea | 7979        | 71           | 510             | 7398         | 155.6

"Active", "recovered" and "death" percentages

A case is either "active" or "closed". A closed case ends in either "recovered" or "death". The following table shows the percentages of "active", "recovered" and "death" cases, based on the total number of cases (Column 2).

Table 2: ratios based on Table 1
Country     | Total Cases | Death % | Recovered % | Active % | Total Cases per 1 million population
Total       | 138959      | 3.7%    | 50.9%       | 45.4%    | 17.8
China       | 80815       | 3.9%    | 79.4%       | 16.7%    | 56.1
Italy       | 15113       | 6.7%    | 8.3%        | 85.0%    | 250.0
Iran        | 11364       | 4.5%    | 31.1%       | 64.4%    | 135.3
South Korea | 7979        | 0.9%    | 6.4%        | 92.7%    | 155.6

This table shows significant differences between countries. South Korea shows a death rate of 0.9%, while Italy shows a death rate of 6.7%.
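The percentages in Table 2 can be reproduced from the counts in Table 1 with a few lines of Python. This is only a sketch: the names `TABLE_1` and `percentages` are illustrative and not part of the original analysis.

```python
# Figures copied from Table 1 (Worldometer, 13 March 2020):
# country -> (total cases, deaths, recovered, active)
TABLE_1 = {
    "Total": (138959, 5116, 70727, 63116),
    "China": (80815, 3177, 64152, 13486),
    "Italy": (15113, 1016, 1258, 12839),
    "Iran": (11364, 514, 3529, 7321),
    "South Korea": (7979, 71, 510, 7398),
}


def percentages(total, deaths, recovered, active):
    """Return deaths, recovered and active as % of total cases (1 decimal place)."""
    return tuple(round(100 * n / total, 1) for n in (deaths, recovered, active))


for country, row in TABLE_1.items():
    death_pct, recovered_pct, active_pct = percentages(*row)
    print(f"{country}: death {death_pct}%, recovered {recovered_pct}%, active {active_pct}%")
```

Note that all three percentages use total reported cases as the denominator, which is the convention of Table 2.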

Subsets Analysis

The above tables blur many unknowns. Below is an attempt to clarify what the knowns are. This process does not help us to extract more information from the data available. This is an attempt to clear one's mind.

Every member of the population is in exactly one of the following categories: untested, tested negative or tested positive.

For simplicity, let us assume that every member of the population is either uninfected or infected (even if asymptomatic). Combining the two classifications, every member of the population must be in exactly one of the sets shown in Table 3. For simplicity, we use the label of each set to represent the size of the set. For example, UN represents the number of people who are uninfected and untested.

Table 3: subsets based on reality and test results
                    | Untested                     | Tested Negative                | Tested Positive                | Sum
Negative in reality | UN (Untested Negative)       | TN (True Negative)             | FP (False Positive)            | N (Total Negative in reality) = UN + TN + FP
Positive in reality | UP (Untested Positive)       | FN (False Negative)            | TP (True Positive)             | P (Total Positive in reality) = UP + FN + TP
Sum                 | U (Total Untested) = UN + UP | T- (Tested Negative) = TN + FN | T+ (Tested Positive) = FP + TP | Pop (Population) = N + P = U + T- + T+
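
As a rough illustration of how the six sets in Table 3 combine, the sketch below encodes them in a small Python class and checks that the row sums and the column sums give the same population. The class name `Subsets`, the attribute names `T_neg` and `T_pos` (standing for T- and T+) and all the counts are made up for illustration; they are not real data.

```python
from dataclasses import dataclass


@dataclass
class Subsets:
    """Sizes of the six disjoint sets in Table 3."""
    UN: int  # uninfected, untested
    TN: int  # uninfected, tested negative (true negative)
    FP: int  # uninfected, tested positive (false positive)
    UP: int  # infected, untested
    FN: int  # infected, tested negative (false negative)
    TP: int  # infected, tested positive (true positive)

    @property
    def N(self):  # total negative in reality (row sum)
        return self.UN + self.TN + self.FP

    @property
    def P(self):  # total positive in reality (row sum)
        return self.UP + self.FN + self.TP

    @property
    def U(self):  # total untested (column sum)
        return self.UN + self.UP

    @property
    def T_neg(self):  # T-: tested negative (column sum)
        return self.TN + self.FN

    @property
    def T_pos(self):  # T+: tested positive (column sum)
        return self.FP + self.TP

    @property
    def Pop(self):  # population
        return self.N + self.P


# HYPOTHETICAL counts for a population of 1,000; not real data.
s = Subsets(UN=900, TN=60, FP=1, UP=25, FN=2, TP=12)

# Summing by rows and summing by columns must give the same population.
assert s.Pop == s.U + s.T_neg + s.T_pos
print(s.Pop, s.N, s.P, s.U, s.T_neg, s.T_pos)
```

In practice only T- and T+ are observed directly; everything in the "Untested" column, and the split of each tested column into true and false results, has to be estimated.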

Further classification

Every Positive case in reality belongs to one of the following three sets:

  1. P.A: Active (i.e. still infected and still fighting the disease);
  2. P.R: Recovered; or
  3. P.D: Dead.
Therefore, the "Positive in reality" row in the above table can be elaborated as shown below.

Table 4: Separating Positive cases in Table 3 into active, recovered and death cases
                       | Untested                    | Tested Negative              | Tested Positive              | Sum
Negative in reality    | UN                          | TN                           | FP                           | N = UN + TN + FP
Positive and Active    | UP.A                        | FN.A                         | TP.A                         | P.A = UP.A + FN.A + TP.A
Positive and Recovered | UP.R                        | FN.R                         | TP.R                         | P.R = UP.R + FN.R + TP.R
Positive and Dead      | UP.D                        | FN.D                         | TP.D                         | P.D = UP.D + FN.D + TP.D
Sum                    | U = UN + UP.A + UP.R + UP.D | T- = TN + FN.A + FN.R + FN.D | T+ = FP + TP.A + TP.R + TP.D | Pop = N + P.A + P.R + P.D = U + T- + T+

Practices that affect the calculation of mortality rate

With the above sets defined, we are now in a position to ask how the death rate is arrived at.

  1. P.D divided by P:
    This piece of information can only be collected if every death is examined for the coronavirus, including those untested (UP.D) and those tested but believed to be negative (FN.D);
  2. TP.D divided by T+:
    This will only count the patients who tested positive and eventually died. We have argued in another blog that this is a poor estimation, but let us assume that this is the measure.
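
To make the difference between the two measures concrete, here is a minimal Python sketch that computes both from counts laid out as in Table 4. Every number is hypothetical, and the names `counts`, `true_death_rate` and `reported_death_rate` are illustrative only.

```python
# HYPOTHETICAL counts of people who are positive in reality, laid out as in
# Table 4: counts[test status][outcome]. These numbers are made up.
counts = {
    "UP": {"A": 500, "R": 300, "D": 20},  # untested
    "FN": {"A": 10, "R": 5, "D": 1},      # tested negative (false negatives)
    "TP": {"A": 600, "R": 700, "D": 50},  # tested positive (true positives)
}


def true_death_rate(counts):
    """Measure 1: P.D divided by P, over all infected people, tested or not."""
    p_d = sum(row["D"] for row in counts.values())
    p = sum(sum(row.values()) for row in counts.values())
    return p_d / p


def reported_death_rate(counts, fp=0):
    """Measure 2: TP.D divided by T+, where T+ = FP + TP.A + TP.R + TP.D."""
    t_pos = fp + sum(counts["TP"].values())
    return counts["TP"]["D"] / t_pos


print(f"P.D / P   = {true_death_rate(counts):.2%}")
print(f"TP.D / T+ = {reported_death_rate(counts):.2%}")
```

With these made-up numbers the two measures differ because the second one ignores everyone in the UP and FN rows; with other numbers the gap could go in either direction.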

Many factors could affect the statistics.

Different countries adopt different practices. Therefore, one must be careful when comparing or combining figures from different countries.

Statistics

What do the data in the first table tell us? For simplicity, let us assume that the tests are 100% accurate. In other words, there are no false negatives (FN = 0) and no false positives (FP = 0). The world population from Worldometer is roughly 7.77 billion. Then the first row of Table 1 gives us the following table:

Table 5: Global COVID-19 statistics (row 1, Table 1) presented in the format of Table 4
                       | Untested | Tested Negative | Tested Positive | Sum
Negative in reality    | UN       | TN              | 0               | N = UN + TN
Positive and Active    | UP.A     | 0               | 63,116          | P.A = UP.A + 63,116
Positive and Recovered | UP.R     | 0               | 70,727          | P.R = UP.R + 70,727
Positive and Dead      | UP.D     | 0               | 5,116           | P.D = UP.D + 5,116
Sum                    | U        | T- = TN         | T+ = 138,959    | Pop = 7,770,000,000
Assumption: All tests are 100% accurate

At the moment, the overall death rate (DR) is calculated as 5,116 / 138,959 = 3.7%. The real death rate (DR*), according to the same formula, is DR* = P.D / (P.A + P.R + P.D). The number of hidden cases, i.e. infected people who have not been tested (UP), is critical to the value of DR*.
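
The sensitivity of DR* to the hidden cases can be sketched in Python. The confirmed figures come from Table 5; the hidden-case values passed to `dr_star` are purely hypothetical scenarios chosen for illustration, not estimates.

```python
# Confirmed (tested positive) figures from Table 5, tests assumed 100% accurate.
TP_A, TP_R, TP_D = 63116, 70727, 5116


def dr():
    """Reported death rate: confirmed deaths divided by confirmed cases."""
    return TP_D / (TP_A + TP_R + TP_D)


def dr_star(up_a, up_r, up_d):
    """Real death rate DR* = P.D / (P.A + P.R + P.D), given hidden (untested) cases."""
    p_a = up_a + TP_A
    p_r = up_r + TP_R
    p_d = up_d + TP_D
    return p_d / (p_a + p_r + p_d)


print(f"DR = {dr():.1%}")

# HYPOTHETICAL scenarios for the hidden cases (UP.A, UP.R, UP.D); not estimates.
scenarios = {
    "no hidden cases": (0, 0, 0),
    "many hidden mild cases": (100_000, 100_000, 0),
    "hidden deaths only": (0, 0, 5_000),
}
for name, hidden in scenarios.items():
    print(f"{name}: DR* = {dr_star(*hidden):.1%}")
```

Hidden active or recovered cases pull DR* below DR, while hidden deaths push it above; without knowing UP, the data alone cannot tell us which effect dominates.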

How could we gather more data?

  1. TN:
    They could have told us how many people have been tested. Subtracting the positive cases (T+) from that figure gives T- (= TN under the 100% test-accuracy assumption). This figure should be available.
  2. UP.D:
    Unless we manage to test everyone (which is most unlikely), we cannot know how many of the untested (U) are infected. However, post-mortem testing of every deceased person, if desired [this is not a suggestion], could confirm every death that was positive, i.e. make UP.D = 0.
  3. U divided by Pop:
    The proportion U divided by Pop should be relevant. (This percentage will be known as soon as T- is given.)
    1. Places such as Singapore and Hong Kong test as many people as they can afford to. By reducing U, their statistics reflect reality more closely. This may or may not help contain the virus, but it probably helps us assess the situation.
    2. The UK government advises people to stay at home if they develop any symptoms of infection. That means these people will not be tested; in other words, U divided by Pop will remain high.
    3. So Singapore and Hong Kong can go with data-driven policies (i.e. react to changes in statistics) while the UK has to go with model-driven policies (i.e. build a model to describe the situation and use it to guide policy-making).
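
As a small illustration of the third point, the proportion U divided by Pop follows directly from Pop = U + T- + T+ once T- is known. In the sketch below, Pop and T+ come from Table 5, while the value of T- is a HYPOTHETICAL placeholder, since the real figure is not given; the function name `untested_share` is also illustrative.

```python
POP = 7_770_000_000  # world population, from Table 5
T_POS = 138_959      # tested positive (T+), from Table 5


def untested_share(t_neg, t_pos=T_POS, pop=POP):
    """Return U / Pop, using U = Pop - T- - T+."""
    u = pop - t_neg - t_pos
    return u / pop


# HYPOTHETICAL number of negative tests; the real T- is not given in Table 1.
t_neg_assumed = 1_000_000
print(f"U / Pop = {untested_share(t_neg_assumed):.4%}")
```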

[End]

