2018 Sep 18;8(1):13963.
doi: 10.1038/s41598-018-32029-6.

The added value of online user-generated content in traditional methods for influenza surveillance

Moritz Wagner et al. Sci Rep.

Abstract

There has been considerable work in evaluating the efficacy of using online data for health surveillance. Comparisons with baseline data often rely on various squared error and correlation metrics. While useful, these overlook a variety of other factors important to public health bodies considering the adoption of such methods. In this paper, a proposed surveillance system that incorporates models based on recent research efforts is evaluated in terms of its added value for influenza surveillance at Public Health England. The system comprises two supervised learning approaches trained on influenza-like illness (ILI) rates provided by the Royal College of General Practitioners (RCGP) and produces ILI estimates using Twitter posts or Google search queries. RCGP ILI rates for different age groups and laboratory-confirmed cases by influenza type are used to evaluate the models, with a particular focus on predicting the onset, overall intensity, peak activity and duration of the 2015/16 influenza season. We show that the Twitter-based models perform poorly and hypothesise that this is mostly due to the sparsity of the data available and a limited training period. Conversely, the Google-based model provides accurate estimates with timeliness of approximately one week and has the potential to complement current surveillance systems.
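The squared error and correlation metrics mentioned in the abstract can be illustrated with a minimal sketch. It assumes two equal-length weekly series, a baseline ILI rate and a model estimate. The function names and the numbers are hypothetical and are not taken from the paper, which does not specify which variants of these metrics were used.

```python
import math


def mean_squared_error(baseline, estimate):
    """Average squared difference between two equal-length series."""
    if len(baseline) != len(estimate) or not baseline:
        raise ValueError("series must be non-empty and of equal length")
    return sum((b - e) ** 2 for b, e in zip(baseline, estimate)) / len(baseline)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical weekly ILI rates (illustrative values only, not real data).
baseline_rates = [5.0, 8.0, 14.0, 20.0, 17.0, 11.0]
model_estimates = [6.0, 9.0, 12.0, 21.0, 15.0, 10.0]

print(mean_squared_error(baseline_rates, model_estimates))
print(pearson_correlation(baseline_rates, model_estimates))
```

Both metrics summarise agreement over a whole season in a single number, which is why they say little on their own about onset, peak timing or duration.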

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
RCGP ILI estimates with overlaid national Twitter (blue) and Google (green) supervised model ILI estimates by week number during the 2015/16 influenza season. Thresholds were calculated using the Moving Epidemic Method based on national RCGP ILI estimates of the previous 6 influenza seasons.
Figure 2
Absolute errors between RCGP data and the national Twitter (blue) and Google (green) supervised model ILI estimates by week number during the 2015/16 influenza season, including their 3-day moving averages.
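The quantities plotted in Figure 2 can be sketched as follows. This is a minimal illustration that assumes a trailing window over three consecutive points. The caption does not say whether the authors used a trailing or a centred window, and the function names and values below are hypothetical.

```python
def absolute_errors(baseline, estimate):
    """Pointwise absolute difference between two equal-length series."""
    if len(baseline) != len(estimate):
        raise ValueError("series must be of equal length")
    return [abs(b - e) for b, e in zip(baseline, estimate)]


def moving_average(values, window=3):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical baseline rates and model estimates (illustrative only).
baseline_rates = [10.0, 12.0, 15.0, 19.0, 16.0]
model_estimates = [9.0, 14.0, 15.0, 16.0, 17.0]

errors = absolute_errors(baseline_rates, model_estimates)
smoothed = moving_average(errors, window=3)
print(errors)
print(smoothed)
```

Smoothing the error series makes sustained over- or under-estimation easier to see than isolated single-point deviations.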
