J Med Internet Res. 2016 Jul 4;18(7):e177.
doi: 10.2196/jmir.4955.

Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

Hyekyung Woo et al. J Med Internet Res.

Abstract

Background: As suggested as early as 2006, logs of the queries that people submit to search engines when seeking information could serve as a source for detecting emerging influenza epidemics if changes in search query volume are monitored (infodemiology). However, selecting the queries most likely to be associated with influenza epidemics remains a particular challenge for generating better predictions.

Objective: In this study, we describe a methodological extension for detecting influenza outbreaks using search query data, providing a new approach to query selection that explores contextual information gleaned from social media data. Additionally, we evaluate whether these queries can be used to monitor and predict influenza epidemics in South Korea.

Methods: Our study was based on freely available weekly influenza incidence data and on query data from the search engine of the Korean website Daum, covering April 3, 2011, to April 5, 2014. To select queries related to influenza epidemics, we applied several approaches: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection with the least absolute shrinkage and selection operator (Lasso) and a support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics.
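The Lasso feature-selection step named in the Methods can be sketched as below. This is a minimal, hypothetical illustration in plain Python, assuming the candidate queries are weekly volume series and the target is a weekly incidence rate; the function names, the penalty value `lam`, and the demo data are invented for illustration, and the SVR stage fitted on the selected queries is not shown. It solves the Lasso by cyclic coordinate descent, one standard solver, and is not the authors' implementation.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used in the Lasso update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(column):
    """Center a column and scale it to unit population variance."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if sd == 0:
        return [0.0] * n  # a constant query series carries no signal
    return [(v - mean) / sd for v in column]


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of rows (one per week), each a list of candidate query volumes.
    y: weekly target values (e.g. an ILI rate). Returns coefficients on the
    standardized scale; a zero coefficient means the query is dropped.
    """
    n, p = len(X), len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(p)]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # centered y - X @ beta, with beta == 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of query j with the partial residual (its own term added back).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            # Non-constant standardized columns satisfy (1/n) * sum(c^2) == 1,
            # so the usual division by that quantity is omitted.
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Hypothetical demo data: 150 weeks, 5 candidate query series,
# of which only the first two actually drive the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(150)]
y = [3.0 * row[0] + 1.5 * row[1] + random.gauss(0, 0.1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected query indices:", selected)
```

Queries whose coefficients are shrunk exactly to zero are discarded; in practice the penalty would typically be tuned (for example, by cross-validation) rather than fixed by hand.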

Results: In total, our initial query selection approach generated 146 influenza-related queries. A considerable proportion of the optimal features in the final models were derived from queries selected with reference to the social media data. The SVR model performed well: its predicted values were highly correlated with the most recently observed influenza-like illness (ILI) rate (r=.956; P<.001) and virological incidence rate (r=.963; P<.001).
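Assuming the reported r values are Pearson correlation coefficients between the predicted and observed weekly series, and that the P values come from the usual t test with n − 2 degrees of freedom (the abstract states neither), the computation looks roughly like the sketch below. The series are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0 (requires |r| < 1); compare with Student's t, n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical weekly values, for illustration only.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 4.5, 5.5, 8.0]
r = pearson_r(predicted, observed)
print(round(r, 3))  # 0.988
print(round(t_statistic(r, len(observed)), 2))
```

Turning the t statistic into a P value needs the Student's t distribution function, which is left to a statistics library here.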

Conclusions: These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.

Keywords: Internet search; big data; early response; epidemiology; forecasting; influenza; infodemiology; infoveillance; population surveillance; query; social media; surveillance.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Support vector machine for regression (SVR) prediction and error for influenza-like illness (ILI) surveillance in Korea. This figure shows the performance of the SVR model using the validation set of KCDC surveillance data to predict the next observation. Note: $\text{log error} = \log\left((\text{obs} - \text{exp})^2 / |\text{exp}|\right)$.
Figure 2
Support vector machine for regression (SVR) prediction and error for virological surveillance in Korea. This figure shows the performance of the SVR model using the validation set of KCDC surveillance data to predict the next observation. Note: $\text{log error} = \log\left((\text{obs} - \text{exp})^2 / |\text{exp}|\right)$; VIR: virological positive rate.
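The log error in the figure notes can be computed per week as in the sketch below. It assumes the natural logarithm (the base is not stated) and reads obs and exp as the observed and SVR-predicted values for one week; `log_error` is a hypothetical helper name, not part of the study's code.

```python
import math


def log_error(obs, exp):
    """log((obs - exp)^2 / |exp|) for one week, using the natural log (assumed).

    Raises for exp == 0; returns -inf for a perfect prediction (obs == exp),
    where the logarithm of zero diverges.
    """
    if exp == 0:
        raise ValueError("expected value must be nonzero")
    squared = (obs - exp) ** 2
    if squared == 0:
        return float("-inf")
    return math.log(squared / abs(exp))


# Hypothetical week: observed 12.0, predicted 10.0 -> log(4 / 10)
print(round(log_error(12.0, 10.0), 3))  # -0.916
```

Dividing the squared error by the magnitude of the prediction scales the error relative to the size of the expected value, so weeks with low and high incidence can be shown on one log axis.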
