Twitter improves influenza forecasting

Michael J Paul¹, Mark Dredze², David Broniatowski³

Affiliations

¹ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
² Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, Maryland, USA.
³ Engineering Management and Systems Engineering, The George Washington University, Washington, District of Columbia, USA.

PMID: 25642377
PMCID: PMC4234396
DOI: 10.1371/currents.outbreaks.90b9ed0f59bae4ccaa683a39865d9117

PLoS Curr. 2014 Oct 28;6:ecurrents.outbreaks.90b9ed0f59bae4ccaa683a39865d9117.

Abstract

Accurate disease forecasts are imperative when preparing for influenza epidemic outbreaks; however, these forecasts are often limited by the time required to collect new, accurate data. In this paper, we show that data from the microblogging community Twitter significantly improve influenza forecasting. Most prior influenza forecast models are tested against historical influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). These data are released with a one-week lag and are often inaccurate at first, until the CDC revises them weeks later. Because previous studies use the final, revised data in evaluation, their evaluations do not reflect how well the models would forecast in real time. Our experiments using the ILI data available at the time of the forecast show that models incorporating data derived from Twitter can reduce forecasting error by 17–30% over a baseline that uses only historical data. For a given level of accuracy, using Twitter data produces forecasts that are two to four weeks ahead of baseline models. Additionally, we find that models using Twitter data are, on average, better predictors of influenza prevalence than models using data from Google Flu Trends, the leading web data source.
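
The distinction between evaluating on revised data and on data available at forecast time can be made concrete with a small sketch. It is an illustration of the evaluation setup, not the authors' code, and it assumes a hypothetical record type, `IliReport`, in which every released ILI value carries the week it describes and the week it was released (a revision is simply a later release for the same week). Under the one-week lag, week w is first released in week w+1. The hypothetical helper `available_ili` reconstructs the series a forecaster would actually have seen at a given week.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IliReport:
    """One released ILI value (hypothetical record layout for illustration)."""

    week: int        # epidemiological week the value describes
    published: int   # week in which this value was released or revised
    value: float     # ILI percentage as released at that time


def available_ili(reports: list[IliReport], forecast_week: int) -> dict[int, float]:
    """Return the ILI series as it was known at `forecast_week`.

    Reports released after `forecast_week` are ignored; for each week, the
    most recently released (latest-revision) value that is visible wins.
    """
    latest: dict[int, IliReport] = {}
    for r in reports:
        if r.published > forecast_week:
            continue  # not yet released when the forecast is made
        prev = latest.get(r.week)
        if prev is None or r.published > prev.published:
            latest[r.week] = r
    return {week: r.value for week, r in sorted(latest.items())}
```

For instance, suppose week 1 is first reported as 1.8 in week 2 and revised to 2.1 in week 4 (made-up numbers). A forecast made in week 3 then sees 1.8 for week 1 and nothing yet for week 3, whereas an evaluation that uses only final data would silently substitute 2.1.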

Figures

**Nowcasting Errors**
Percent error for three years’ worth of “nowcasts” (forecasts at k=0) using two models: the baseline autoregressive model, which uses the previous three weeks of available ILI data (green), and the improved model, which adds the Twitter estimate for the current week to those three weeks of ILI values (blue). Vertical lines mark the beginning of each new season. Each season's estimates come from models trained on the other two seasons. The model that includes Twitter data produced better forecasts for 86 of the 114 weeks shown.

**Nowcasting Predictions**
Nowcast predictions for three seasons using two models: the baseline autoregressive model (green) and the improved model that includes Twitter data (blue). The ground-truth ILI values are shown in black.
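
The two models in the captions can be sketched as ordinary least-squares regressions. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that when nowcasting week t, the ILI values for weeks t-1 to t-3 are available, and that `twitter[t]` holds a Twitter-derived ILI estimate for week t. It fits with NumPy's `lstsq`. The function names `design_matrix`, `fit`, `nowcast`, and `percent_error` are hypothetical.

```python
import numpy as np


def design_matrix(ili, twitter, weeks, lags=3, use_twitter=False):
    """Regression rows for nowcasting ILI at each target week in `weeks`.

    Assumption: at week t the ILI values for weeks t-1, ..., t-lags are
    available (one-week reporting lag), and `twitter[t]` is a Twitter-derived
    estimate for week t itself.
    """
    rows = []
    for t in weeks:
        feats = [1.0]  # intercept term
        feats += [ili[t - i] for i in range(1, lags + 1)]  # autoregressive lags
        if use_twitter:
            feats.append(twitter[t])  # same-week Twitter estimate
        rows.append(feats)
    return np.array(rows, dtype=float)


def fit(ili, twitter, weeks, lags=3, use_twitter=False):
    """Least-squares fit of the baseline or Twitter-augmented AR model."""
    X = design_matrix(ili, twitter, weeks, lags, use_twitter)
    y = np.array([ili[t] for t in weeks], dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def nowcast(coef, ili, twitter, t, lags=3, use_twitter=False):
    """Predict ILI for week t (forecast horizon k = 0)."""
    x = design_matrix(ili, twitter, [t], lags, use_twitter)[0]
    return float(x @ coef)


def percent_error(pred, actual):
    """Absolute error as a percentage of the true value."""
    return 100.0 * abs(pred - actual) / actual
```

To mirror the evaluation described in the captions, one would fit each variant on the weeks of two seasons, call `nowcast` for every week of the held-out season, and compare the resulting `percent_error` values. In practice, the ILI lags should come from the values available at forecast time rather than from the final revised series.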
