J Med Internet Res. 2020 Jul 30;22(7):e19483.
doi: 10.2196/19483.

Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns

Henry C Cousins et al. J Med Internet Res.

Abstract

Background: Timely allocation of medical resources for coronavirus disease (COVID-19) requires early detection of regional outbreaks. Internet browsing data may predict case outbreaks in local populations that are yet to be confirmed.

Objective: We investigated whether search-engine query patterns can help to predict COVID-19 case rates at the state and metropolitan area levels in the United States.

Methods: We used regional confirmed case data from the New York Times and Google Trends results from 50 states and 166 county-based designated market areas (DMA). We identified search terms whose activity precedes and correlates with confirmed case rates at the national level. We used univariate regression to construct a composite explanatory variable based on best-fitting search queries offset by temporal lags. We measured the raw and z-transformed Pearson correlation and root-mean-square error (RMSE) of the explanatory variable with out-of-sample case rate data at the state and DMA levels.
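The lag-and-regress step described in the Methods can be illustrated with a minimal sketch. It assumes a single search-query series and a single case-rate series of equal length sampled daily; the function names (`pearson`, `align`, `best_lag`, `fit_line`, `rmse`) and the toy data are hypothetical and are not taken from the paper, which combined several queries into a composite variable.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def align(query, cases, lag):
    """Pair query activity on day t with the case rate on day t + lag."""
    return query[:len(query) - lag], cases[lag:]

def best_lag(query, cases, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the highest Pearson r."""
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        q, c = align(query, cases, lag)
        r = pearson(q, c)
        if r > best[1]:
            best = (lag, r)
    return best

def fit_line(x, y):
    """Univariate ordinary least squares: y is approximated by slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(pred, actual):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

# Toy data (hypothetical): cases follow the query signal by exactly 3 days.
query = [0, 1, 2, 4, 8, 16, 12, 8, 4, 2, 1, 0, 0, 0, 0]
cases = [1, 1, 1] + [2 * q + 1 for q in query[:-3]]

lag, r = best_lag(query, cases, max_lag=5)
q, c = align(query, cases, lag)
slope, intercept = fit_line(q, c)
pred = [slope * v + intercept for v in q]
print(lag, round(r, 3), round(rmse(pred, c), 3))
```

In the study itself the fit was made at the national level and then evaluated on out-of-sample state and DMA data; the sketch fits and scores the same toy series only to keep the example short.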

Results: Predictions were highly correlated with confirmed case rates at the state (mean r=0.69, 95% CI 0.51-0.81; median RMSE 1.27, IQR 1.48) and DMA levels (mean r=0.51, 95% CI 0.39-0.61; median RMSE 4.38, IQR 1.80), using search data available up to 10 days prior to confirmed case rates. They fit case-rate activity in 49 of 50 states and in 103 of 166 DMA at a significance level of .05.
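The abstract does not state how the mean correlations and their confidence intervals were derived. One common approach, consistent with the z-transformed correlations mentioned in the Methods, is to average Fisher z-transformed coefficients and back-transform the result. The sketch below assumes that approach and a normal approximation for the interval; `fisher_mean_ci` and the sample values are hypothetical.

```python
import math
import statistics

def fisher_mean_ci(rs, z_crit=1.96):
    """Average correlations on the Fisher z scale and back-transform.

    Returns (mean_r, ci_low, ci_high). Assumes every |r| < 1 and uses a
    normal approximation (z_crit = 1.96 for a 95% interval).
    """
    zs = [math.atanh(r) for r in rs]                 # Fisher z-transform
    mean_z = statistics.mean(zs)
    se = statistics.stdev(zs) / math.sqrt(len(zs))   # standard error of the mean z
    low, high = mean_z - z_crit * se, mean_z + z_crit * se
    return math.tanh(mean_z), math.tanh(low), math.tanh(high)

# Hypothetical per-region correlations, not values from the paper.
mean_r, ci_low, ci_high = fisher_mean_ci([0.3, 0.6, 0.8])
print(round(mean_r, 2), round(ci_low, 2), round(ci_high, 2))
```

Averaging on the z scale rather than on raw r values avoids the bias that arises because the sampling distribution of r is skewed near ±1.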

Conclusions: Identifiable patterns in search query activity may help to predict emerging regional outbreaks of COVID-19, although they remain vulnerable to stochastic changes in search intensity.

Keywords: COVID-19; Google Trends; epidemiology; infectious disease; infoveillance; internet activity; public health; surveillance.

Conflict of interest statement

Conflicts of Interest: Unrelated to this work, AH would like to disclose that he receives remuneration from AdOM for serving as a consultant and a board member, and from Thea for a speaking engagement. AH also holds an ownership interest in AdOM, Luseed, Oxymap, and QuLent. Unrelated to this work, LRP is a consultant to Eyenovia, Bausch+Lomb, Nicox, Emerald Bioscience, and Verily. No relevant financial relationship exists for any of the other authors.

Figures

Figure 1
Correlation of query predictions with regional coronavirus disease (COVID-19) confirmed case rates. (A) Correlation of predicted case rates with actual case rates for the 50 states. Values are Pearson correlation coefficients. * indicates significance at α=.05; ** at α=.01; *** at α=.005. (B) Root-mean-square error (RMSE) between predicted case rates and actual case rates for the 50 states, in units of daily new cases per 100,000 population. (C) Prediction correlations at the state level do not depend on outbreak timing, as measured by the date of the first confirmed case. Circle size indicates the relative population of the state. Color indicates US census-designated region (blue: Northeast; orange: Midwest; gray: South; green: West). (D) Prediction correlations at the designated market area (DMA) level do not depend on outbreak timing, as measured by the date of the first confirmed case. Circle size indicates the relative population of the DMA. Color indicates the US census-designated region, as in panel C. n.s.: not significant.
Figure 2
Correlation of query predictions (red) with regional coronavirus disease (COVID-19) case rates (black) at the state and designated market area (DMA) levels, February 20 to April 2, 2020. (A) Comparison of predicted case rates (red) with actual case rates (black) at the state level, with Arizona shown as an example. Dashed lines indicate 95% CIs. (B) Comparison at the DMA level, with the Butte-Bozeman area shown as an example of predictions in a low-population region.
