What you need to know about the R number
July 25, 2020 in Challenge 44

What is R?

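R is the reproduction number of a disease. The basic reproduction number (the expected number of cases generated by one case when everyone in the population is susceptible) is not to be confused with the effective reproduction number, which reflects the population as it currently is.

The R number is an average, the result of a complex statistical calculation. It refers to the average number of people that a single infected person will go on to pass the infection to.

The number is not fixed; it changes as behaviour changes or as immunity develops.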

How is R calculated?

To calculate R you require data, and data can only be acquired once people have been infected. In other words, historical data is necessary to calculate R.

Data is acquired from the number of people diagnosed as infected by the virus or who have died from the illness.

As the data is historical, any calculation of the spread is an estimate.

Why a number above one suggests concern.

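A number above one implies that Covid 19 infections are growing exponentially: each infected person passes the virus to more than one other person, on average.

A number below one suggests the illness is in decline and may eventually disappear.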
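The effect of R being above or below one can be sketched with a little arithmetic. The example below is a minimal illustration, not an epidemiological model: it assumes R stays constant and that infections pass on in neat "generations". The function name projected_cases and the starting figure of 100 cases are made up for illustration.

```python
def projected_cases(initial_cases: float, r: float, generations: int) -> list[float]:
    """Expected new cases in each generation, assuming R never changes."""
    # Each generation has R times as many new cases as the one before it.
    return [initial_cases * r ** g for g in range(generations + 1)]


# Compare a declining, a stable and a growing outbreak (hypothetical values).
for r in (0.8, 1.0, 1.5):
    cases = projected_cases(100, r, 5)
    print(f"R = {r}: {[round(c) for c in cases]}")
```

With R below one the numbers shrink every generation, with R equal to one they stay level, and with R above one they grow faster and faster.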

R squared, also called the coefficient of determination, is a statistical calculation that measures how much of the variation in one variable can be explained by another variable. In other words, it is a formula that determines how much a variable’s behaviour can explain the behaviour of another variable. Despite the similar names, R squared and the Pearson correlation coefficient r are separate statistics from the reproduction number R.

The Pearson correlation coefficient, written with a lowercase r, is used to measure the strength of a linear association between two variables. A value of r = 1 means a perfect positive correlation (for example, case counts rising steadily over time) and a value of r = -1 means a perfect negative correlation (case counts falling steadily over time).
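As a rough illustration of how these two statistics are calculated, the sketch below computes the Pearson correlation coefficient by hand and then squares it. It assumes a simple straight-line fit with one explanatory variable, where R squared equals r squared. The daily case counts are invented for the example and the function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far x and y move together, relative to their own spread.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


days = [1, 2, 3, 4, 5, 6]
rising_cases = [10, 12, 15, 19, 22, 26]        # invented data: cases going up
falling_cases = list(reversed(rising_cases))   # the same numbers going down

r_up = pearson_r(days, rising_cases)
r_down = pearson_r(days, falling_cases)
r_squared = r_up ** 2  # valid for a straight-line fit with one variable

print(f"rising: r = {r_up:.3f}, R squared = {r_squared:.3f}")
print(f"falling: r = {r_down:.3f}")
```

The rising series gives an r close to +1 and the falling series gives the same value with the sign flipped.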

A scatterplot diagram is used to graphically illustrate correlation r values.

Look at the scatterplot diagram below.

Figure (a): a correlation of nearly +1, an almost perfect uphill linear relationship (case counts plotted against time would be rising steadily).
Figure (b): a correlation of -0.50, a downhill relationship in which the points are somewhat scattered in a wider band; a linear relationship is present, but it is not as strong as in figures (a) and (c).
Figure (c): a correlation of +0.85, a very strong uphill linear pattern (not as strong as (a)).
Figure (d): a correlation of +0.15, which does not indicate much is happening, and it shouldn’t, as the correlation is very close to 0 (case counts plotted against time would show no clear linear trend up or down).

• Exactly –1. A perfect downhill (negative) linear relationship
• –0.70. A strong downhill (negative) linear relationship
• –0.50. A moderate downhill (negative) relationship
• –0.30. A weak downhill (negative) linear relationship
• 0. No linear relationship
• +0.30. A weak uphill (positive) linear relationship
• +0.50. A moderate uphill (positive) relationship
• +0.70. A strong uphill (positive) linear relationship
• Exactly +1. A perfect uphill (positive) linear relationship

Warning

If a scatterplot does not show at least a bit of a linear relationship, the correlation does not mean much.

So why measure a linear relationship if there appears to not be one?

Think of no linear relationship in two ways:

1. If no relationship exists at all, calculating the correlation does not make sense because correlation only applies to linear relationships.
2. If a strong relationship exists but it is not linear, the correlation may be misleading. The reason is that in some cases a strong curved relationship exists, which the correlation cannot detect. That is why it is critical to examine the scatterplot first.
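The second point can be shown with a small sketch. The data below is invented: y is exactly x squared, so the relationship could not be stronger, yet it is a curve rather than a line. The function name pearson_r is hypothetical and is written out by hand so the example stands alone.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


xs = list(range(-5, 6))          # -5, -4, ..., 5
ys = [x ** 2 for x in xs]        # a perfect U-shaped curve

r_curve = pearson_r(xs, ys)
print(f"r for a perfect curve: {r_curve:.3f}")
```

The correlation comes out at zero even though y is completely determined by x, which is why the scatterplot should be checked before trusting the number.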

Summary

Many mistakenly think a correlation of –1 indicates no relationship.

In fact, a correlation of –1 means the data lie on a perfect downhill straight line.

In other words, it is the strongest negative linear relationship you can get.

In Covid 19 terms, with case counts plotted against time, this would mean cases falling steadily, so the virus may disappear.

Key points to remember.

• Data is historical.
• The data may be incomplete or incorrectly defined or collected.
• Such analysis is subjective, i.e. the answer may or may not be true. It is open to interpretation.
• At best, the result is a guideline.