Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea
- PMID: 27377323
- PMCID: PMC4949385
- DOI: 10.2196/jmir.4955
Abstract
Background: As suggested as early as 2006, monitoring changes in the volume of queries submitted to search engines could help detect emerging influenza epidemics (infodemiology). However, selecting the queries most likely to be associated with influenza epidemics remains a particular challenge for generating better predictions.
Objective: In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea.
Methods: Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics.
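The two-stage modeling step described in the Methods (Lasso for feature selection, then SVR for prediction) can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and runs on synthetic stand-in data, not the study's data. The number of weeks, the train/test split, the linear kernel, and the values of `C` and `epsilon` are all illustrative assumptions; the abstract does not report the authors' settings. Only the count of 146 candidate queries is taken from the Results.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): weekly volumes for 146
# candidate queries over a hypothetical 156 weeks.
n_weeks, n_queries = 156, 146
X = rng.normal(size=(n_weeks, n_queries))

# Hypothetical incidence driven by the first five queries plus noise.
true_coef = np.zeros(n_queries)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, 2.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n_weeks)

# Chronological split (assumed): earlier weeks train, later weeks test.
split = 104
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Standardize using training statistics only, to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Stage 1: Lasso with a cross-validated penalty; queries whose coefficient
# is shrunk to exactly zero are dropped.
lasso = LassoCV(cv=5, random_state=0).fit(X_train_s, y_train)
selected = np.flatnonzero(lasso.coef_)

# Stage 2: SVR fitted on the selected queries only (kernel and
# hyperparameters are assumptions for illustration).
svr = SVR(kernel="linear", C=10.0, epsilon=0.1)
svr.fit(X_train_s[:, selected], y_train)
pred = svr.predict(X_test_s[:, selected])

# Agreement between predicted and held-out values.
r = np.corrcoef(pred, y_test)[0, 1]
print(f"{selected.size} of {n_queries} queries kept; held-out r = {r:.3f}")
```

Separating selection from regression keeps the final model small and lets one inspect which queries survive the Lasso penalty, which is how the contribution of each query source could be examined.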
Results: In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of the optimal features in the final models were derived from queries identified with reference to the social media data. The SVR model performed well: the predicted values were highly correlated with the recently observed influenza-like illness (r=.956; P<.001) and virological incidence rates (r=.963; P<.001).
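The r values reported above are correlation coefficients between predicted and observed incidence. Assuming they are Pearson coefficients (the abstract does not name the statistic), the calculation can be sketched in plain Python as follows. The function name `pearson_r` and the sample numbers are hypothetical and are not drawn from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical weekly values for illustration only (not study data).
predicted = [4.1, 6.3, 12.8, 25.0, 18.2, 9.7]
observed = [3.8, 7.0, 14.1, 23.5, 17.0, 8.9]
print(round(pearson_r(predicted, observed), 3))
```

A coefficient near 1 means the predictions rise and fall with the observed series; it does not by itself show that the predicted magnitudes match.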
Conclusions: These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Keywords: Internet search; big data; early response; epidemiology; forecasting; influenza; infodemiology; infoveillance; population surveillance; query; social media; surveillance.
Conflict of interest statement
Conflicts of Interest: None declared.
