J Med Internet Res. 2021 Aug 11;23(8):e28876.
doi: 10.2196/28876.

Forecasting the COVID-19 Epidemic by Integrating Symptom Search Behavior Into Predictive Models: Infoveillance Study

Alessandro Rabiolo et al. J Med Internet Res.

Abstract

Background: Previous studies have suggested associations between trends of web searches and COVID-19 traditional metrics. It remains unclear whether models incorporating trends of digital searches lead to better predictions.

Objective: The aim of this study is to investigate the relationship between Google Trends searches of symptoms associated with COVID-19 and confirmed COVID-19 cases and deaths. We aim to develop predictive models to forecast the COVID-19 epidemic based on a combination of Google Trends searches of symptoms and conventional COVID-19 metrics.

Methods: An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal component analysis (PCA) and time series modeling. The application facilitates the analysis of symptom search behavior associated with COVID-19 disease in 188 countries. In this study, we selected the data of nine countries as case studies to represent all continents. PCA was used to perform data dimensionality reduction, and three different time series models (error, trend, seasonality; autoregressive integrated moving average; and feed-forward neural network autoregression) were used to predict COVID-19 metrics in the upcoming 14 days. The models were compared in terms of prediction ability using the root mean square error (RMSE) of the first principal component (PC1). The predictive abilities of models generated with both Google Trends data and conventional COVID-19 metrics were compared with those fitted with conventional COVID-19 metrics only.
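The evaluation idea in the Methods (reduce several series to PC1, hold out the last 14 days, score the forecast with RMSE) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`standardize`, `first_pc_scores`, `holdout_rmse`) and the toy data are hypothetical, PC1 is obtained by simple power iteration, and a naive last-value forecast stands in for the ETS, ARIMA, and NNAR models named in the abstract.

```python
import math

def standardize(columns):
    """Scale each series to zero mean and unit variance (assumes no constant series)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def first_pc_scores(columns, iterations=200):
    """Return (loadings, scores) of the first principal component.

    Uses power iteration on the covariance matrix of the standardized series.
    The all-equal starting vector is an assumption that suits positively
    correlated series; it is not a general-purpose PCA routine.
    """
    z = standardize(columns)
    p, n = len(z), len(z[0])
    cov = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[i] * z[i][t] for i in range(p)) for t in range(n)]
    return v, scores

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual))

def holdout_rmse(scores, horizon=14):
    """RMSE of a naive forecast (last training value repeated) over the final `horizon` points."""
    train, test = scores[:-horizon], scores[-horizon:]
    forecast = [train[-1]] * horizon
    return rmse(test, forecast)

# Toy data (hypothetical): daily cases and a symptom-search index over 60 days.
cases = [float(t * t) for t in range(60)]
searches = [float(3 * t + (t % 4)) for t in range(60)]
loadings, pc1 = first_pc_scores([cases, searches])
print(holdout_rmse(pc1, horizon=14))
```

Comparing `holdout_rmse` for PC1 built with and without the search series mirrors the GT versus NOGT comparison described above, although a real analysis would replace the naive forecast with fitted time series models.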

Results: The degree of correlation and the best time lag varied as a function of the selected country and topic searched; in general, the optimal time lag was within 15 days. Overall, predictions of PC1 based on both search terms and COVID-19 traditional metrics performed better than those not including Google searches (median 1.56, IQR 0.90-2.49 versus median 1.87, IQR 1.09-2.95, respectively), but the improvement in prediction varied as a function of the selected country and time frame. The best model varied as a function of country, time range, and period of time selected. Models based on a 7-day moving average led to considerably smaller RMSE values as opposed to those calculated with raw data (median 0.90, IQR 0.50-1.53 versus median 2.27, IQR 1.62-3.74, respectively).
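The two preprocessing ideas in the Results, smoothing with a 7-day moving average and searching for the time lag with the strongest correlation, can be illustrated with a short sketch. It rests on assumptions not stated in the abstract: the moving average is taken as a trailing window, correlation is Pearson's r, lags are scanned from 0 to 15 days with searches leading cases, and the helper names and synthetic data are hypothetical.

```python
import math

def moving_average(series, window=7):
    """Trailing moving average; the first window-1 points are dropped."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(searches, cases, max_lag=15):
    """Return (lag, r) for the lag in days at which searches best correlate with later cases."""
    best = (0, -2.0)
    for lag in range(max_lag + 1):
        x = searches[:len(searches) - lag]  # searches on day t
        y = cases[lag:]                     # cases on day t + lag
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic example: a search peak on day 30 followed by a case peak on day 35.
searches = [math.exp(-((t - 30) / 6) ** 2) for t in range(80)]
cases = [math.exp(-((t - 35) / 6) ** 2) for t in range(80)]
lag, r = best_lag(moving_average(searches), moving_average(cases))
print(lag, round(r, 3))
```

Because both series are smoothed with the same window, the lag recovered from the smoothed series matches the shift built into the synthetic data.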

Conclusions: Including digital online searches in statistical models may improve the nowcasting and forecasting of the COVID-19 epidemic, and such models could serve as one component of COVID-19 surveillance. We provide a free web application operating with nearly real-time data that anyone can use to make predictions of outbreaks, improve estimates of the dynamics of ongoing epidemics, and predict future or rebound waves.

Keywords: COVID-19; Google Trends; SARS-CoV-2; Shiny web application; big data; coronavirus; digital health; infodemiology; infoveillance; predictive models; symptoms; time series.

Conflict of interest statement

Conflicts of Interest: AIM reports personal fees and consultancy fees from Allergan, Pfizer, Alcon, Novartis, Zeiss, Easyscan, and Visufarma, outside the submitted work. FB reports consultancy fees from Allergan, Bayer, Boehringer-Ingelheim, Fidia Sooft, Hofmann La Roche, Novartis, NTC Pharma, Sifi, Thrombogenics, and Zeiss. All other authors declare no competing interests.

Figures

Figure 1. Cumulative number of confirmed cases (A) and deaths (B) per million for each country over time.

Figure 2. Streamgraphs of the interest-over-time index for each individual country. The x-axis values are given in months. Index-over-time values were plotted as 7-day moving average.

Figure 3. Root mean square errors of the prediction error for principal component 1 of the various models for the selected countries. MA1 and MA7 indicate analyses performed on 1-day (ie, original data) and 7-day moving averages of data, respectively. GT indicates models based on both traditional COVID-19 metrics and Google Trends data, while NOGT models are based on COVID-19 metrics only. ARIMA: autoregressive integrated moving average; ETS: error, trend, seasonality; NNAR: feed-forward neural network autoregression; RMSE: root mean square error; UK: United Kingdom; USA: United States of America.
