Proc Natl Acad Sci U S A. 2015 Nov 24;112(47):14473-8.
doi: 10.1073/pnas.1515373112. Epub 2015 Nov 9.

Accurate estimation of influenza epidemics using Google search data via ARGO

Shihao Yang et al. Proc Natl Acad Sci U S A.

Abstract

Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.
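The idea of combining autoregression with search data can be illustrated with a minimal sketch. The abstract does not give the estimation details, so everything below is an assumption made for illustration: the function names, the single autoregressive lag, the ordinary least-squares fit, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np


def build_design(ili, searches, n_lags):
    """Build the regression design matrix.

    Each row corresponds to a week t >= n_lags and holds an intercept,
    the n_lags previous ILI values (most recent first), and the search
    volumes observed in week t.
    """
    rows = []
    for t in range(n_lags, len(ili)):
        lagged = ili[t - n_lags:t][::-1]  # ili[t-1], ..., ili[t-n_lags]
        rows.append(np.concatenate(([1.0], lagged, searches[t])))
    return np.asarray(rows), ili[n_lags:]


def fit_arx(ili, searches, n_lags):
    """Fit the autoregressive model with exogenous inputs by least squares."""
    X, y = build_design(ili, searches, n_lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def nowcast(coef, recent_ili, current_searches):
    """Estimate the current week from recent ILI values (oldest first)
    and the current week's search volumes."""
    recent = np.asarray(recent_ili, dtype=float)[::-1]
    x = np.concatenate(([1.0], recent, np.asarray(current_searches, dtype=float)))
    return float(x @ coef)


# Synthetic, noise-free data (hypothetical): ILI depends on its own previous
# value and on the first of two search-term series.
rng = np.random.default_rng(0)
n_weeks = 120
searches = rng.random((n_weeks, 2))
ili = np.zeros(n_weeks)
for t in range(1, n_weeks):
    ili[t] = 0.2 + 0.5 * ili[t - 1] + 1.0 * searches[t, 0]

coef = fit_arx(ili, searches, n_lags=1)
print(np.round(coef, 3))  # intercept, lag-1 weight, two search-term weights
```

Refitting such a model each week on a recent window of data is one plausible way to let the coefficients follow changes in search behavior over time; whether ARGO does exactly this is not stated in the abstract.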

Keywords: autoregressive exogenous model; big data; digital disease detection; influenza-like illnesses activity real-time estimation; seasonal influenza.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Estimation results. (Top) The estimated ILI activity level from ARGO (thick red), contrasted with the CDC’s true ILI activity level (thick black) and with the estimates from GFT (green), method of ref. (blue), GFT plus AR(3) model (dark yellow), and AR(3) model (dashed gray). The two background shades, white and yellow, reflect the two data sources, Google Correlate and Google Trends, respectively. The dash-dotted purple vertical line separates Google Correlate data with search terms identified on March 28, 2009 from data with search terms identified on May 22, 2010. (Middle) The estimation error, defined as the estimated value minus the CDC’s ILI activity level. (Bottom) Zoomed-in plots of the estimation results in different study periods. (A) The H1N1 flu outbreak period. (B) The 2012–2013 regular flu season. (C) The 2014–2015 regular flu season. A regular flu season is defined as week 40 of one year to week 20 of the following year.
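The two definitions in this caption, the estimation error and the regular flu season, can be written out as a short sketch. The function names are hypothetical, and the week number is assumed to be an integer week of the year in the range 1 to 53.

```python
def estimation_error(estimate, cdc_ili):
    """Estimation error as defined in the caption:
    estimated value minus the CDC's ILI activity level."""
    return estimate - cdc_ili


def in_regular_flu_season(week):
    """True if the week lies in a regular flu season, i.e. from week 40
    of one year through week 20 of the following year."""
    return week >= 40 or week <= 20
```

Under this sign convention a positive error means the model overestimated ILI activity, and a negative error means it underestimated.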
Fig. S1.
Dynamic coefficients for ARGO. Red represents positive coefficients, blue represents negative coefficients, white represents zero, and gray represents missing values. Missing values can result from (i) query terms not identified by Google Correlate and (ii) Google Trends data not being available for particular query terms. The black horizontal dashed line separates Google query terms from autoregressive lags. The yellow vertical dashed line separates coefficients trained on Google Correlate data from those trained on Google Trends data, and the green vertical dashed line separates query terms identified on March 28, 2009 from those identified on May 22, 2010.
