Observational Study
Lancet Digit Health. 2020 Apr;2(4):e201-e208.
doi: 10.1016/S2589-7500(20)30026-1. Epub 2020 Feb 20.

Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study

Kaiyuan Sun et al. Lancet Digit Health. 2020 Apr.

Abstract

Background: As the outbreak of coronavirus disease 2019 (COVID-19) progresses, epidemiological data are needed to guide situational awareness and intervention strategies. Here we describe efforts to compile and disseminate epidemiological information on COVID-19 from news media and social networks.

Methods: In this population-level observational study, we searched DXY.cn, a health-care-oriented social network that is currently streaming news reports on COVID-19 from local and national Chinese health agencies. We compiled a list of individual patients with COVID-19 and daily province-level case counts between Jan 13 and Jan 31, 2020, in China. We also compiled a list of internationally exported cases of COVID-19 from global news media sources (Kyodo News, The Straits Times, and CNN), national governments, and health authorities. We assessed trends in the epidemiology of COVID-19 and studied the outbreak progression across China, assessing delays between symptom onset, seeking care at a hospital or clinic, and reporting, before and after Jan 18, 2020, as awareness of the outbreak increased. All data were made publicly available in real time.
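The delay comparison described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the field names `onset` and `visit`, the helper `median_delay_days`, and the four example records are all hypothetical. Only the Jan 18, 2020 cutoff comes from the abstract, and grouping patients by date of symptom onset (rather than by date of seeking care or reporting) is also an assumption.

```python
from datetime import date
from statistics import median

CUTOFF = date(2020, 1, 18)  # cutoff date named in the abstract

# Hypothetical line-list records (illustrative only, NOT study data).
records = [
    {"onset": date(2020, 1, 10), "visit": date(2020, 1, 15)},
    {"onset": date(2020, 1, 12), "visit": date(2020, 1, 16)},
    {"onset": date(2020, 1, 20), "visit": date(2020, 1, 22)},
    {"onset": date(2020, 1, 23), "visit": date(2020, 1, 24)},
]


def median_delay_days(rows):
    """Median days from symptom onset to seeking care; None if no rows."""
    delays = [(r["visit"] - r["onset"]).days for r in rows]
    return median(delays) if delays else None


# Assumption: split on date of symptom onset relative to the cutoff.
before = [r for r in records if r["onset"] < CUTOFF]
after = [r for r in records if r["onset"] >= CUTOFF]

print("median delay before cutoff:", median_delay_days(before))
print("median delay after cutoff:", median_delay_days(after))
```

The same helper could be applied to the interval between seeking care and reporting by swapping in the corresponding pair of dates.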

Findings: We collected data for 507 patients with COVID-19 reported between Jan 13 and Jan 31, 2020, including 364 from mainland China and 143 from outside of China. 281 (55%) patients were male and the median age was 46 years (IQR 35-60). Few patients (13 [3%]) were younger than 15 years and the age profile of Chinese patients adjusted for baseline demographics confirmed a deficit of infections among children. Across the analysed period, delays between symptom onset and seeking care at a hospital or clinic were longer in Hubei province than in other provinces in mainland China and internationally. In mainland China, these delays decreased from 5 days before Jan 18, 2020, to 2 days thereafter until Jan 31, 2020 (p=0·0009). Although our sample captures only 507 (5·2%) of 9826 patients with COVID-19 reported by official sources during the analysed period, our data align with an official report published by Chinese authorities on Jan 28, 2020.
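The proportions quoted in the Findings follow directly from the reported counts. The short check below recomputes them; the helper name `percent` is hypothetical, and all counts are taken from the abstract.

```python
def percent(part, whole, digits=0):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


mainland_china = 364
outside_china = 143
total = mainland_china + outside_china  # patients in the crowdsourced sample

male_pct = percent(281, total)             # share of male patients
under_15_pct = percent(13, total)          # share younger than 15 years
coverage_pct = percent(total, 9826, 1)     # share of officially reported cases

print(total, male_pct, under_15_pct, coverage_pct)
```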

Interpretation: News reports and social media can help reconstruct the progression of an outbreak and provide detailed patient-level data in the context of a health emergency. The availability of a central physician-oriented social network facilitated the compilation of publicly available COVID-19 data in China. As the outbreak progresses, social media and news reports will probably capture a diminishing fraction of COVID-19 cases globally due to reporting fatigue and overwhelmed health-care systems. In the early stages of an outbreak, availability of public datasets is important to encourage analytical efforts by independent teams and provide robust evidence to guide interventions.

Funding: Fogarty International Center, US National Institutes of Health.

Figures

Figure 1
Age distribution of patients with COVID-19 from crowdsourced data. (A) All 507 cases by disease outcome (alive or unknown, or deceased, at time of reporting); vertical bars are case counts in each age group and the dotted lines show the median age for patients who were alive or with unknown outcomes at the time of reporting and those who had died at the time of reporting. (B) Relative risk by 5-year age band for 364 cases reported in China. The observed data are shown by bars and the estimated relative risk is shown by datapoints and a spline-smoothed curve. COVID-19=coronavirus disease 2019.

Figure 2
Daily timeline of the COVID-19 epidemic based on crowdsourced data and official sources, by location. All data are by date of symptom onset. Cumulative curves are shown for the official China CDC data (published on Jan 28, 2020), and for the crowdsourced data. Crowdsourced data have been rescaled and multiplied by 20 to enable clear comparison with the official China CDC data. Histograms are daily case count, based on crowdsourced data for Hubei province, mainland China non-Hubei province, and cases outside of mainland China. CDC=Centers for Disease Control. COVID-19=coronavirus disease 2019.

Figure 3
Daily timeline of the COVID-19 epidemic at the provincial level in China, during January, 2020. Vertical bars show the daily counts of new reported cases, with provinces sorted by total number of reported cases. The timeline for each province is reconstructed on the basis of daily outbreak situation reports provided by provincial health authorities and posted on DXY.cn and are true as of Jan 31, 2020. COVID-19=coronavirus disease 2019.

Figure 4
Delay between symptom onset and seeking care at a hospital or clinic (A) and between seeking care at a hospital or clinic and reporting (B) of COVID-19 cases, by location. Data are for the entire study period and include all cases reported between Jan 13 and Jan 31, 2020. Datapoints are medians, with the spread of data indicated by the filled shapes. All time intervals significantly differ between locations (Kruskal-Wallis test, p<0·0001). COVID-19=coronavirus disease 2019.

Comment in

  • Crowdsourcing data to mitigate epidemics.
    Leung GM, Leung K. Lancet Digit Health. 2020 Apr;2(4):e156-e157. doi: 10.1016/S2589-7500(20)30055-8. Epub 2020 Feb 20. PMID: 32296776.
