PLoS Comput Biol. 2019 Aug 2;15(8):e1007258.
doi: 10.1371/journal.pcbi.1007258. eCollection 2019 Aug.

Reappraising the utility of Google Flu Trends

Sasikiran Kandula et al. PLoS Comput Biol. 2019.

Abstract

Estimation of influenza-like illness (ILI) from search-trend activity was intended to supplement traditional surveillance systems and was a motivation behind the development of Google Flu Trends (GFT). However, several studies have reported large errors in GFT estimates of ILI in the US. Following the recent release of time-stamped surveillance data, which better reflect real-time operational scenarios, we reanalyzed GFT errors. Three data sources were used: GFT, an archive of weekly ILI estimates from Google Flu Trends; ILIf, fully observed ILI rates from ILINet; and ILIp, ILI rates available in real time based on partial reporting. Five influenza seasons were analyzed, and the mean squared errors (MSE) of GFT and ILIp as estimates of ILIf were computed. To correct GFT errors, a random forest regression model was built with ILI and GFT rates from the previous three weeks as predictors. An overall reduction in error of 44% was observed, and the errors of the corrected GFT were lower than those of ILIp. An 80% reduction in error during 2012/13, when GFT had large errors, shows that extreme failures of GFT could have been avoided. Using autoregressive integrated moving average (ARIMA) models, one- to four-week-ahead forecasts were generated with two separate data streams: ILIp alone, and ILIp together with corrected GFT. At all forecast targets and seasons, and for all but two regions, inclusion of GFT lowered MSE. Results from two alternative error measures, mean absolute error and mean absolute proportional error, were largely consistent with results from MSE. Taken together, these findings provide an error profile of GFT in the US, establish strong evidence for the adoption of search-trends-based 'nowcasts' in influenza forecast systems, and encourage reevaluation of the utility of this data source in diverse domains.
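
The three error measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions: in particular, mean absolute proportional error is taken here as the absolute error divided by the reference (ILIf) value, averaged over weeks. The function names and the weekly rates are hypothetical and made up for illustration; they are not data from the study.

```python
from statistics import fmean


def _check(estimates, truth):
    # Both series must cover the same weeks.
    if len(estimates) != len(truth):
        raise ValueError("series must have the same length")


def mse(estimates, truth):
    """Mean squared error of estimates against the reference series."""
    _check(estimates, truth)
    return fmean([(e - t) ** 2 for e, t in zip(estimates, truth)])


def mae(estimates, truth):
    """Mean absolute error of estimates against the reference series."""
    _check(estimates, truth)
    return fmean([abs(e - t) for e, t in zip(estimates, truth)])


def mape(estimates, truth):
    """Mean absolute proportional error (assumed definition: |e - t| / t).

    Requires every reference value to be nonzero.
    """
    _check(estimates, truth)
    return fmean([abs(e - t) / t for e, t in zip(estimates, truth)])


# Made-up weekly ILI rates (percent), for illustration only.
ili_f = [1.0, 2.0, 4.0, 2.0]  # fully observed reference
gft = [1.5, 3.0, 6.0, 2.5]    # search-based estimate
ili_p = [0.5, 2.0, 3.0, 2.0]  # partially reported, real-time estimate

for name, series in (("GFT", gft), ("ILIp", ili_p)):
    print(name, mse(series, ili_f), mae(series, ili_f), mape(series, ili_f))
```

In this toy series, ILIp has the lower error on all three measures, which mirrors the kind of comparison the abstract describes but says nothing about the study's actual results.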

Conflict of interest statement

JS and Columbia University declare partial ownership in SK Analytics. SK was a consultant to SK Analytics.

Figures

Fig 1. Availability of GFT, ILIf, and ILIp at national, regional, and state levels in the US.
At the regional level, GFT and ILIf were available from 2003, and ILIp was available from the 2009/10 season onward, excluding off-season weeks. For states, ILIp was never available to the public, and ILIf is available from the 2011/12 season onward. Updates to the GFT model are indicated by the vertical lines.
Fig 2. Squared errors from GFT and ILIp for HHS regions during the 2014/15 season.
The green data points show the error during the week of maximum weekly ILIf—the peak week—and the remaining data points are color coded by their distance from peak week. The black triangles show the mean error for the season. The black line is the y = x line; points below this line have larger errors from GFT than from ILIp. In all regions, the mean error from GFT falls below the line.
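
The reading of the y = x line can be sketched as follows. This assumes, based only on the caption's statement about points below the line, that GFT error is on the x-axis and ILIp error on the y-axis. The variable names and the squared-error values are hypothetical and made up for illustration; they are not values from the figure.

```python
from statistics import fmean

# Made-up weekly squared errors for one region over one season.
gft_sq_err = [0.25, 1.0, 4.0, 0.25]   # assumed x-axis: error from GFT
ilip_sq_err = [0.25, 0.0, 1.0, 0.0]   # assumed y-axis: error from ILIp


def is_below_diagonal(x, y):
    """True when the point (x, y) lies strictly below the y = x line."""
    return y < x


# Per-week comparison: True means GFT had the larger error that week.
weekly_below = [
    is_below_diagonal(g, p) for g, p in zip(gft_sq_err, ilip_sq_err)
]

# Seasonal mean point, analogous to the black triangle in each panel.
season_mean_point = (fmean(gft_sq_err), fmean(ilip_sq_err))
mean_below = is_below_diagonal(*season_mean_point)

print(weekly_below, season_mean_point, mean_below)
```

A point exactly on the line (equal errors, as in the first toy week) is not counted as below it.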
Fig 3. Mean squared error of GFT observed in US states.
The top left panel, Overall, shows average errors across 5 seasons and each of the other panels is limited to one season. The data points are color coded by population size and ordered by overall error (high to low). The black line shows the errors from corresponding HHS regions.
Fig 4. Mean squared error of near term forecasts for ILIp and ILIp+GFT models.
The data points are color coded by target. Points below the diagonal (broken black line) indicate instances where forecast quality improved with the use of GFT. Each panel is for one of the locations.
