The novel coronavirus outbreak in China has caught the attention of billions of people worldwide. Hospitals in Hubei are overwhelmed with the flood of patients, and labs across China are working around the clock to test patients for the virus. In these difficult times, how can we utilize the power of science to estimate the potential scale of this outbreak? To find our answer, we must look beyond China.

For months, the novel coronavirus (2019-nCoV) outbreak in Wuhan, China has been a focus of concern around the world. On January 17th, 17 days after the first case was reported, Imperial College London’s MRC Centre for Global Infectious Disease Analysis and the World Health Organization’s Collaborating Centre for Infectious Disease Modelling jointly published a research article estimating the potential total number of 2019-nCoV cases in Wuhan. According to the report’s estimates, the number of potential cases at the time of publication was approximately 1,700, substantially higher than the officially reported number. This article aims to answer two important questions: why was there such a drastic disparity, and should we rely on the report’s proposed methods for further research? Please note that the dataset used in the original research is seriously outdated as of today, and any conclusions drawn should not be interpreted as predictions of the current epidemic. Read the original report here.

As of January 16th, the day before the report was published, data collected from official channels showed 41 confirmed cases in Wuhan and three outside China (two in Thailand and one in Japan). The research team argued that, using the number of cases detected in other countries, it is possible to estimate the actual number of clinically comparable cases in Wuhan. If accurate, such an estimate would be highly valuable: it gives the public and the government an idea of the true scale of the outbreak, which helps both in allocating resources and in understanding how infectious the novel coronavirus is. Since the beginning of the outbreak, the medical system in Wuhan has been running at full capacity, meaning that many patients with suspected symptoms were unable to get medical attention. As a result, many experts feared that the actual number of infected people in Wuhan might be much higher than the reported number of cases. To obtain more accurate estimates, the research team turned to countries outside China, where each suspected case was carefully diagnosed and treated. Based on this more reliable overseas data, the research team devised a method to estimate the total number of cases in Wuhan.

To understand how they constructed their method, we’ll use a hypothetical person. Let’s call him Tom. Tom is an ordinary resident living in Wuhan. Unfortunately, he has been infected with the novel coronavirus. But since the infection shows no symptoms at the beginning, Tom doesn’t know. Suppose that on any given day, there is a probability p1 that he travels abroad. We can find p1 using the formula p1 = (daily outbound international travellers from Wuhan) / (catchment population of Wuhan International Airport). Meanwhile, the virus has a 10-day window between infection and detection. Therefore, the probability p2 that Tom gets confirmed outside China, i.e. the probability that he travels abroad during those 10 days, is approximately 10 times p1 (the approximation holds because p1 is very small). Since Tom could be any one of the many people infected in Wuhan, the same probability applies to every infected person. In other words, p2 is also the probability that any given case gets confirmed outside China. Hence, we arrive at the estimate using the formula (total number of cases) = (number of cases detected overseas) / (probability any one case will be detected overseas).
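Tom’s reasoning can be turned into a few lines of Python. This is a minimal sketch of the formula described above, not the report’s actual code. The article does not quote the travel and population figures, so the inputs below (3,301 daily travellers, a catchment population of 19 million) are illustrative assumptions, and the function names are made up for this example.

```python
def overseas_detection_probability(daily_travellers, catchment_population, window_days):
    """Approximate chance (p2) that one infected resident is detected abroad."""
    # p1: chance that a given resident flies abroad on any single day
    p1 = daily_travellers / catchment_population
    # p2: chance of travelling abroad at some point in the detection window.
    # Multiplying by the window length is a good approximation when p1 is tiny.
    return window_days * p1


def estimate_total_cases(cases_overseas, p2):
    """Total cases = cases detected overseas / probability of overseas detection."""
    return cases_overseas / p2


# Illustrative inputs (assumptions, not quoted in this article)
DAILY_TRAVELLERS = 3_301
CATCHMENT_POPULATION = 19_000_000
WINDOW_DAYS = 10  # days between infection and detection, as in the text

p2 = overseas_detection_probability(DAILY_TRAVELLERS, CATCHMENT_POPULATION, WINDOW_DAYS)
total = estimate_total_cases(3, p2)  # three cases detected outside China

print(f"p2 = {p2:.5f}")
print(f"estimated cases in Wuhan: {total:.0f}")
```

With these assumed inputs, three overseas cases translate into an estimate in the neighbourhood of 1,700, the same order of magnitude as the figure in the report. The key point is how small p2 is: each case found abroad stands for hundreds of cases at home.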

However, although the researchers chose the relatively more reliable data collected in less affected countries, errors in the parameters could still heavily influence the report’s accuracy. For example, what if the number of passengers travelling abroad increased significantly following the outbreak? The researchers’ solution was to construct one baseline scenario using their best-guess parameters, and to calculate four alternative scenarios, in each of which one assumption departs from the baseline. Known as testing the sensitivity of estimates, the process shows how much the result depends on each assumption and yields a wider range of plausible numbers. To make their conclusion more rigorous, the research team added another precaution. They calculated a 95% confidence interval for the estimate: a range constructed so that, if the model’s assumptions hold, the procedure would capture the true number 95% of the time. In essence, confidence intervals make confidence statements about unknown parameters, based on our knowledge of the sampling distributions of estimators.
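Both precautions can be sketched in Python. The example below is an illustration under stated assumptions, not a reproduction of the report: the travel and population inputs are assumed values, the two alternative scenarios are hypothetical, and the interval uses an exact Poisson interval (a standard textbook technique that approximates the number of overseas detections as Poisson), which is not necessarily the method the research team used. Its bounds therefore need not match the report’s.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def bisect(f, lo, hi, tol=1e-9):
    """Root of an increasing function f on [lo, hi]; needs f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_interval(k, level=0.95):
    """Interval for the expected count, given k >= 1 observed events."""
    alpha = 1 - level
    # Lower bound: smallest mean for which seeing k or more is not too unlikely
    lower = bisect(lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, k + 1.0)
    # Upper bound: largest mean for which seeing k or fewer is not too unlikely
    upper = bisect(lambda lam: alpha / 2 - poisson_cdf(k, lam), float(k), 10.0 * (k + 5))
    return lower, upper


CATCHMENT_POPULATION = 19_000_000  # assumed, not quoted in this article
CASES_OVERSEAS = 3

# Sensitivity: (daily travellers, detection window in days); values are hypothetical
scenarios = {
    "baseline": (3_301, 10),
    "shorter detection window": (3_301, 8),
    "travel volume doubled": (6_602, 10),
}
estimates = {}
for name, (travellers, window) in scenarios.items():
    p2 = window * travellers / CATCHMENT_POPULATION
    estimates[name] = CASES_OVERSEAS / p2
    print(f"{name}: {estimates[name]:.0f} cases")

# Confidence interval for the baseline scenario
baseline_p2 = 10 * 3_301 / CATCHMENT_POPULATION
lam_low, lam_high = exact_poisson_interval(CASES_OVERSEAS)
ci_low, ci_high = lam_low / baseline_p2, lam_high / baseline_p2
print(f"95% interval: {ci_low:.0f} to {ci_high:.0f} cases")
```

The interval is wide because it rests on only three observed cases. That is the price of estimating a large number from a handful of detections, and it is why the sensitivity scenarios matter as much as the headline figure.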

Interestingly, five days after the original study was published, the research team updated their report to take into account the additional cases confirmed outside China. Since the initial publication, the number of extranational cases had risen from three to seven, raising the team’s baseline estimate to around 4,000 cases in Wuhan. However, although their projection more than doubled in five days, this should not be interpreted as meaning that the outbreak itself doubled in size. Rather, the increase in extranational cases reflected enhanced detection and prompt reporting, which in turn revised the earlier projection upward.
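Because the estimate is simply the number of overseas cases divided by a fixed probability, it scales in direct proportion to that number. The short sketch below shows the arithmetic; the travel and population inputs are illustrative assumptions, not figures quoted in this article.

```python
# Assumed inputs: 3,301 daily travellers, 19 million catchment, 10-day window
P2 = 10 * 3_301 / 19_000_000

estimates = {}
for cases in (3, 7):
    estimates[cases] = cases / P2
    print(f"{cases} overseas cases -> about {estimates[cases]:.0f} cases in Wuhan")

# The ratio depends only on the case counts, not on P2
ratio = estimates[7] / estimates[3]
print(f"ratio: {ratio:.2f}")
```

Going from three cases to seven multiplies the estimate by 7/3, or about 2.3, whatever the value of the detection probability. The jump in the projection therefore mirrors the jump in detected cases, not a separate measurement of the outbreak’s growth.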

This article was written by Yihan Xu. Please send an email to to get in touch. Also, you can visit the author’s economics blog for more original articles.
Photo Credit: Alissa Eckert, Dan Higgins
