Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns
- PMID: 32692691
- PMCID: PMC7394521
- DOI: 10.2196/19483
Abstract
Background: Timely allocation of medical resources for coronavirus disease (COVID-19) requires early detection of regional outbreaks. Internet browsing data may predict case outbreaks in local populations that are yet to be confirmed.
Objective: We investigated whether search-engine query patterns can help to predict COVID-19 case rates at the state and metropolitan area levels in the United States.
Methods: We used regional confirmed case data from the New York Times and Google Trends results from 50 states and 166 county-based designated market areas (DMA). We identified search terms whose activity precedes and correlates with confirmed case rates at the national level. We used univariate regression to construct a composite explanatory variable based on best-fitting search queries offset by temporal lags. We measured the raw and z-transformed Pearson correlation and root-mean-square error (RMSE) of the explanatory variable with out-of-sample case rate data at the state and DMA levels.
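The Methods describe several steps: choosing search terms whose activity precedes case rates, offsetting them by a temporal lag, building a composite explanatory variable with univariate regression, and scoring it by Pearson r and RMSE. The sketch below shows one way to wire those steps together. It rests on several assumptions. The abstract does not give the lag range, the term-selection rule, or how the per-term regressions were combined, so the `max_lag` parameter, the "highest r wins" lag choice, and averaging of per-term fitted values are all hypothetical. The sketch also fits and scores on a single series for brevity, whereas the study fit at the national level and evaluated out of sample at the state and DMA levels. All function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_pair(search, cases, lag):
    """Pair search[t - lag] with cases[t], i.e. search leads cases by `lag` days."""
    if lag == 0:
        return list(search), list(cases)
    return list(search[:-lag]), list(cases[lag:])


def best_lag(search, cases, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the highest Pearson r (hypothetical rule)."""
    best = None
    for lag in range(max_lag + 1):
        r = pearson_r(*lagged_pair(search, cases, lag))
        if best is None or r > best[1]:
            best = (lag, r)
    return best


def fit_univariate(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def composite_prediction(terms, cases, max_lag):
    """Hypothetical composite: for each search term pick its best lag, fit a
    univariate regression of cases on the lagged term, then average the
    per-term fitted values on days where every term has lagged data."""
    fits = []
    for s in terms:
        lag, _ = best_lag(s, cases, max_lag)
        a, b = fit_univariate(*lagged_pair(s, cases, lag))
        fits.append((s, lag, a, b))
    start = max(lag for _, lag, _, _ in fits)
    pred = [
        sum(a + b * s[t - lag] for s, lag, a, b in fits) / len(fits)
        for t in range(start, len(cases))
    ]
    return pred, list(cases[start:])
```

As a usage example, a case series built as `2 * search[t - 3] + 1` should yield a best lag of 3 with r close to 1. Passing that series to `composite_prediction` should give near-zero RMSE. In the study's out-of-sample setting, the `pred` values would instead be compared against regional case rates the regressions never saw.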
Results: Predictions were highly correlated with confirmed case rates at the state (mean r=0.69, 95% CI 0.51-0.81; median RMSE 1.27, IQR 1.48) and DMA levels (mean r=0.51, 95% CI 0.39-0.61; median RMSE 4.38, IQR 1.80), using search data available up to 10 days prior to confirmed case rates. They fit case-rate activity in 49 of 50 states and in 103 of 166 DMA at a significance level of .05.
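The Results summarize per-region correlations as a mean r with a 95% CI, and the Methods mention z-transformed Pearson correlations. A standard way to average correlations is Fisher's z transform, z = atanh(r). The sketch below assumes that approach, combined with a normal-approximation interval across regions (mean z ± 1.96 · SD / √k), back-transformed with tanh. The abstract does not state how its intervals were computed, so this sketch is not claimed to reproduce the reported values. The function name `mean_r_with_ci` is hypothetical.

```python
import math
import statistics


def mean_r_with_ci(rs, z_crit=1.96):
    """Average correlations on the Fisher z scale and return (mean_r, lo, hi).

    Each r (which must satisfy -1 < r < 1) is mapped to z = atanh(r). The
    mean z gets a normal-approximation interval mean +/- z_crit * SD / sqrt(k),
    and all three values are mapped back to the r scale with tanh.
    """
    zs = [math.atanh(r) for r in rs]
    k = len(zs)
    mz = statistics.mean(zs)
    half = z_crit * statistics.stdev(zs) / math.sqrt(k)  # stdev needs k >= 2
    return math.tanh(mz), math.tanh(mz - half), math.tanh(mz + half)
```

The Fisher transform stabilizes the variance of r. For correlations that are unevenly spread, such as 0.2, 0.6 and 0.8, the z-scale mean therefore differs from the plain arithmetic mean of the r values.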
Conclusions: Identifiable patterns in search query activity may help to predict emerging regional outbreaks of COVID-19, although they remain vulnerable to stochastic changes in search intensity.
Keywords: COVID-19; Google Trends; epidemiology; infectious disease; infoveillance; internet activity; public health; surveillance.
©Henry C Cousins, Clara C Cousins, Alon Harris, Louis R Pasquale. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 30.07.2020.
Conflict of interest statement
Conflicts of Interest: Unrelated to this work, AH would like to disclose that he receives remuneration from AdOM for serving as a consultant and a board member, and from Thea for a speaking engagement. AH also holds an ownership interest in AdOM, Luseed, Oxymap, and QuLent. Unrelated to this work, LRP is a consultant to Eyenovia, Bausch+Lomb, Nicox, Emerald Bioscience, and Verily. No relevant financial relationship exists for any of the other authors.
