Proc Natl Acad Sci U S A. 2021 Dec 21;118(51):e2111455118.

doi: 10.1073/pnas.2111455118.

Global monitoring of the impact of the COVID-19 pandemic through online surveys sampled from the Facebook user base

Christina M Astley¹ ² ³ ⁴, Gaurav Tuli², Kimberly A Mc Cord², Emily L Cohn², Benjamin Rader² ⁵, Tanner J Varrelman², Samantha L Chiu⁶, Xiaoyi Deng⁶, Kathleen Stewart⁷, Tamer H Farag⁸, Kristina M Barkume⁸, Sarah LaRocca⁸, Katherine A Morris⁸, Frauke Kreuter⁶ ⁹, John S Brownstein² ³

Affiliations

¹ Division of Endocrinology, Boston Children's Hospital, Boston, MA 02115; christina.astley@childrens.harvard.edu.
² Computational Epidemiology Lab, Boston Children's Hospital, Boston, MA 02115.
³ Harvard Medical School, Boston, MA 02115.
⁴ Broad Institute of Harvard and MIT, Cambridge, MA 02142.
⁵ Department of Epidemiology, Boston University, Boston, MA 02118.
⁶ Joint Program in Survey Methodology, University of Maryland, College Park, MD 20742.
⁷ Center for Geospatial Information Science, University of Maryland, College Park, MD 20742.
⁸ Meta, Menlo Park, CA 94025.
⁹ Department of Statistics, Ludwig-Maximilians-Universität, Munich 80539, Germany.

PMID: 34903657
PMCID: PMC8713788
DOI: 10.1073/pnas.2111455118

Christina M Astley et al. Proc Natl Acad Sci U S A. 2021.

Abstract

Simultaneously tracking the global impact of COVID-19 is challenging because of regional variation in resources and reporting. Leveraging self-reported survey outcomes via an existing international social media network has the potential to provide standardized data streams to support monitoring and decision-making worldwide, in real time, and with limited local resources. The University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS), in partnership with Facebook, has invited daily cross-sectional samples from the social media platform's active users to participate in the survey since its launch on April 23, 2020. We analyzed UMD-CTIS survey data through December 20, 2020, from 31,142,582 responses representing 114 countries/territories weighted for nonresponse and adjusted to basic demographics. We show consistent respondent demographics over time for many countries/territories. Machine learning models trained on national and pooled global data verified known symptom indicators. COVID-like illness (CLI) signals were correlated with government benchmark data. Importantly, the best benchmarked UMD-CTIS signal uses a single survey item whereby respondents report on CLI in their local community. In regions with strained health infrastructure but active social media users, we show it is possible to define COVID-19 impact trajectories using a remote platform independent of local government resources. This syndromic surveillance public health tool is the largest global health survey to date and, with brief participant engagement, can provide meaningful, timely insights into the global COVID-19 pandemic at a local scale.

Keywords: COVID-19 surveillance; SARS-CoV-2 testing; global health; human social sensing.

Conflict of interest statement

The authors declare no competing interest.

Figures

**Fig. 1.**
UMD-CTIS data pipeline, coverage, and demographic distributions. (A) The Facebook active user base (FAUB) is sampled daily and invited to participate in the online UMD-CTIS, administered by Qualtrics and accessed via an online form using a smartphone or computer. Participants are asked about demographics, COVID-19 symptoms, behaviors, and outcomes. Facebook supplies survey weights to account for nonresponse and to adjust for basic demographics of the participant. Aggregated data are released to the public in near real time. Researchers may apply to use raw microdata to study COVID-19. (B) Map of the distribution of surveys per capita during the study period for countries and territories that were sampled and have survey weights (n = 114; all other countries/territories in gray). (C) The distribution of the difference in proportion between UMD-CTIS and local demographics for six age–gender groups (male and female by young [18 to 34 y], middle [35 to 54 y], and elderly [>54 y]), that is, $D_g = P_{g,\mathrm{UMD\text{-}CTIS}} - P_{g,\mathrm{Census}}$, where $P_g$ is the proportion in group $g$. (D) The distribution of mean absolute differences across age–gender groups (i.e., $\delta = \sum_g |D_g| / 6$). The distribution by week ($w$) of the change ($\Delta$) in the (E) difference of proportions [$\Delta D_{g,w} = D_{g,w} - \mathrm{median}(D_{g,w})$] and (F) mean absolute difference [$\Delta\delta_w = \delta_w - \mathrm{median}(\delta_w)$] versus the median measure for that country/territory over the study period. The range across all locales (light ribbon), 25th to 75th percentile (dark ribbon), and median (solid line) are shown.
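The demographic comparison in panels C through F can be illustrated with a minimal Python sketch. The group labels, proportions, and function names below are hypothetical and chosen only for illustration; they are not values or code from the study.

```python
from statistics import median

# Hypothetical labels for the six age-gender groups (illustrative only).
GROUPS = ["F 18-34", "F 35-54", "F >54", "M 18-34", "M 35-54", "M >54"]


def proportion_differences(p_survey, p_census):
    """D_g = P_g(survey) - P_g(census) for each group g."""
    return {g: p_survey[g] - p_census[g] for g in p_census}


def mean_absolute_difference(diffs):
    """delta = sum over g of |D_g|, divided by the number of groups (6 here)."""
    return sum(abs(d) for d in diffs.values()) / len(diffs)


def deviation_from_median(weekly_values):
    """Weekly change: each week's value minus the median over all weeks."""
    m = median(weekly_values)
    return [v - m for v in weekly_values]


# Made-up proportions; each set sums to 1.
survey = dict(zip(GROUPS, [0.20, 0.18, 0.10, 0.22, 0.19, 0.11]))
census = dict(zip(GROUPS, [0.17, 0.17, 0.16, 0.18, 0.17, 0.15]))

d = proportion_differences(survey, census)
delta = mean_absolute_difference(d)
print({g: round(v, 2) for g, v in d.items()})
print(round(delta, 4))
```

Because both sets of proportions sum to 1, the differences $D_g$ sum to 0, so $\delta$ rather than the plain mean is the informative summary of demographic mismatch.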

**Fig. 2.**
The global model predicting recent COVID-19 positive test results using self-reported symptoms and minimal demographic data. (A) The receiver operating characteristic of the hyperparameter tuned global model showing the area under the curve. (B) The SHapley Additive exPlanations distribution of relative feature importance from the global model (green diamonds) compared to country/territory models (box and whisker plots). The within-model feature importance was normalized to loss of smell/taste to facilitate between-model comparison.
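The normalization to loss of smell/taste described in panel B might look like the following Python sketch. It assumes that per-feature importance is the mean absolute SHAP value, which the caption does not state; the feature names, SHAP values, and function name are hypothetical.

```python
def relative_importance(shap_values, reference):
    """Scale per-feature importance so the reference feature equals 1.

    shap_values: dict mapping feature name -> list of per-respondent SHAP values.
    Importance is taken as the mean absolute SHAP value (an assumption).
    """
    mean_abs = {
        feature: sum(abs(v) for v in values) / len(values)
        for feature, values in shap_values.items()
    }
    ref = mean_abs[reference]
    if ref == 0:
        raise ValueError("reference feature has zero importance")
    return {feature: m / ref for feature, m in mean_abs.items()}


# Made-up SHAP values for two features and two respondents.
example = {
    "loss_of_smell_taste": [0.4, -0.6],
    "fever": [0.2, -0.3],
}
scaled = relative_importance(example, reference="loss_of_smell_taste")
print(scaled)
```

Dividing by a common reference feature puts models trained on different countries/territories on the same scale, which is what allows the between-model comparison the caption mentions.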

**Fig. 3.**
The schematic of COVID-19 case and UMD-CTIS surveillance signal benchmarking globally. For each country and territory, the 7-d smoothed COVID-19 case counts from Our World in Data (A) are compared to the survey-weighted CTIS surveillance measure. (B) The CCLI signal for Bolivia and Italy is shown for illustrative purposes. The survey-weighted sum of “yes” responses to the surveillance questions (here the CCLI survey question) for each week was divided by the sum of survey weights for all surveys over a 7-d window. (C) Time series were normalized to a range of 0 to 1 using minimum and maximum during the survey period to allow within- and between-locale comparison of trends across a range of values using color intensity. (D) For each country/territory (rows), we combined normalized time series with log₁₀ of the number of surveys (black bar chart), percent surveys per population (white bar chart), age and gender distributions (stacked bar charts), peak day (solid black circle, benchmark; open colored shapes, signals) and the benchmark–signal correlation strength (green) in the form of an annotated heatmap.
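The signal construction in panels B and C can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the window is taken to be a trailing 7-d window, which the caption does not specify, and the function names and input values are hypothetical.

```python
def weighted_signal(responses, weights):
    """Sum of weights for 'yes' responses divided by the sum of all weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("sum of survey weights is zero")
    return sum(w for r, w in zip(responses, weights) if r) / total


def rolling_weighted_signal(daily_yes, daily_total, window=7):
    """Pool daily weight sums over a trailing window (assumed trailing).

    daily_yes[i]: sum of weights of 'yes' responses on day i.
    daily_total[i]: sum of weights of all responses on day i.
    """
    signal = []
    for i in range(len(daily_yes)):
        start = max(0, i - window + 1)
        signal.append(sum(daily_yes[start:i + 1]) / sum(daily_total[start:i + 1]))
    return signal


def min_max_normalize(series):
    """Rescale a series to the range 0 to 1 using its own minimum and maximum."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # flat series: no trend to show
    return [(x - lo) / (hi - lo) for x in series]


# Made-up inputs for illustration.
print(weighted_signal([True, False, True], [2.0, 1.0, 1.0]))
print(rolling_weighted_signal([1, 1, 2], [10, 10, 10], window=2))
print(min_max_normalize([2, 4, 6]))
```

Min-max scaling discards absolute magnitude, so the normalized series support comparison of trend shape and timing across locales but not of prevalence levels.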

**Fig. 4.**
The time series heatmap comparing the benchmark cases to UMD-CTIS–based signals. Refer to the illustration of the generation of the time series heatmap components in Fig. 3. Normalized benchmark (black column) and UMD-CTIS (navy through orange columns) signal time series by country or territory (*Country/Territory*, rows) are clustered by benchmark within geographic regions. Signals include recent positive COVID-19 test result (*Positive Test*), *CCLI*, self-reported fever, cough or loss of smell/taste (*Broad CLI*), or self-reported loss of smell/taste of less than 14 d duration (*Narrow CLI*). The days to peak for each signal are compared with the benchmark (*Peak*), and the Spearman correlation strength of each UMD-CTIS signal with the benchmark is shown (*Correlation*). Log₁₀ surveys (*LogN*) and surveys per population (*Pct*) are shown as bar charts, and the proportion of surveys for each *Age* or *Gender* as stacked bar charts.
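The *Correlation* and *Peak* columns rest on two simple computations, sketched below in plain Python. The implementation (Pearson correlation of average ranks, and a difference of peak positions) is a generic textbook version, not the study's code, and the function names and example series are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def peak_offset_days(signal, benchmark):
    """Days from the benchmark peak to the signal peak (negative = signal leads)."""
    return signal.index(max(signal)) - benchmark.index(max(benchmark))


# Made-up daily series for illustration.
benchmark = [0, 3, 1, 0]
signal = [0, 1, 3, 2]
print(spearman(signal, benchmark))
print(peak_offset_days(signal, benchmark))
```

Rank-based correlation depends only on the ordering of values, so it is unaffected by the min-max normalization applied to the time series.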
