PLoS Negl Trop Dis. 2011 Aug;5(8):e1258.
doi: 10.1371/journal.pntd.0001258. Epub 2011 Aug 2.

Prediction of dengue incidence using search query surveillance

Benjamin M Althouse et al. PLoS Negl Trop Dis. 2011 Aug.

Abstract

Background: The use of internet search data has been demonstrated to be effective at predicting influenza incidence. This approach may be more successful for dengue, which has large variation in annual incidence and a more distinctive clinical presentation and mode of transmission.

Methods: We gathered freely available dengue incidence data from Singapore (weekly incidence, 2004-2011) and Bangkok (monthly incidence, 2004-2011). Internet search data for the same period were downloaded from Google Insights for Search. Search terms were chosen to reflect three categories of dengue-related search: nomenclature, signs/symptoms, and treatment. We compared three models to predict incidence: a step-down linear regression, generalized boosted regression, and negative binomial regression. Logistic regression and support vector machine (SVM) models were used to predict a binary outcome defined by whether dengue incidence exceeded a chosen threshold. Incidence prediction models were assessed using r² and Pearson correlation between predicted and observed dengue incidence. Logistic and SVM model performance was assessed by the area under the receiver operating characteristic curve (AUC). Models were validated using multiple cross-validation techniques.
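The step-down selection described in the Methods can be illustrated as backward elimination by AIC. The following is a minimal sketch in standard-library Python, not the authors' code: the data are synthetic, the predictor names (`term_a`, `term_b`, `term_c`) are hypothetical stand-ins for search-term volumes, and the Gaussian form AIC = n·ln(RSS/n) + 2k is an assumption about how the criterion is computed.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, p):
            f = M[i][col] / M[col][col]
            for k in range(col, p + 1):
                M[i][k] -= f * M[col][k]
    # Back substitution for the coefficients b.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][k] * b[k] for k in range(i + 1, p))) / M[i][i]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant (assumed form)."""
    return n * math.log(rss / n) + 2 * k

def step_down(X, y, names):
    """Drop one predictor at a time while doing so lowers the AIC."""
    n = len(y)
    keep = list(range(len(names)))

    def score(cols):
        sub = [[row[j] for j in cols] for row in X]
        return aic(ols_rss(sub, y), n, len(cols) + 1)  # +1 for the intercept

    best = score(keep)
    while keep:
        cand, drop = min((score([c for c in keep if c != d]), d) for d in keep)
        if cand >= best:
            break  # no single removal improves the AIC
        best = cand
        keep.remove(drop)
    return [names[j] for j in keep], best

# Synthetic example: the outcome depends on term_a and term_b only.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(120)]
y = [5 + 3 * r[0] + 2 * r[1] + random.gauss(0, 0.5) for r in X]
selected, final_aic = step_down(X, y, ["term_a", "term_b", "term_c"])
print(selected, round(final_aic, 1))
```

Because the two informative predictors explain most of the variance, removing either one raises the AIC sharply, so the procedure retains them; whether the noise predictor is dropped depends on the sample.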

Results: The linear model selected by AIC step-down was found to be superior to the other models considered. In Bangkok, the model has r² = 0.943 and a correlation of 0.869 between fitted and observed. In Singapore, the model has r² = 0.948 and a correlation of 0.931. In both Singapore and Bangkok, SVM models outperformed logistic regression in predicting periods of high incidence. The AUC for the SVM models using the 75th percentile cutoff is 0.906 in Singapore and 0.960 in Bangkok.
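To make the binary outcome and the AUC concrete, here is a small sketch, again in standard-library Python. The incidence values and classifier scores are invented for illustration, the percentile uses linear interpolation (an assumption; the paper does not state its convention), and the AUC is computed through its rank interpretation: the probability that a randomly chosen high-incidence period receives a higher score than a randomly chosen low-incidence period.

```python
def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical incidence series and model scores (not data from the paper).
incidence = [10, 12, 30, 45, 80, 60, 20, 15]
scores = [0.1, 0.2, 0.4, 0.7, 0.9, 0.6, 0.3, 0.2]

cutoff = percentile(incidence, 75)         # 48.75 for this series
labels = [x > cutoff for x in incidence]   # True marks a high-incidence period
area = auc(labels, scores)                 # 11/12 for these scores
print(cutoff, round(area, 3))
```

An AUC of 0.5 corresponds to scores that carry no information about the outcome, and 1.0 to scores that rank every high-incidence period above every low-incidence one.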

Conclusions: Internet search terms predict dengue incidence and periods of high incidence with high accuracy and may prove useful in areas with underdeveloped surveillance systems. The methods presented here use freely available data and analysis tools and can be readily adapted to other settings.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of step-down search term selection.
Figure shows the search terms used in the full models for Singapore and Bangkok (top boxes), as well as the results of the AIC step-down procedure (bottom boxes).
Figure 2. Correlation between observed dengue incidence and model fit.
Figure displays observed dengue case incidence for years 2005–2011 (black lines) as well as the model fitted to data from 2005–2010 (solid red lines) and 2010 prediction with 95% prediction intervals (dashed red lines and solid red band) for both Singapore (panel A) and Bangkok data (panel C). Panels B and D show the error between the observed incidence and model fit or prediction.
Figure 3. Summary of SVM prediction in Singapore.
Figure shows the performance of the SVM model in Singapore. Red circles indicate a prediction of high incidence at the optimal probability found from the ROC curve at right. Black stars indicate observed high incidence not predicted by the model. Panel A and the corresponding ROC curve at right use the median cutoff, panel B the 75th percentile cutoff, and panel C the 90th percentile cutoff.
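The caption for Figure 3 refers to an optimal probability read off the ROC curve without saying how it was chosen. One common choice, used here purely as an assumption, is the cutoff that maximizes Youden's J (sensitivity + specificity − 1). The sketch below uses invented labels and predicted probabilities, and `best_cutoff` is a hypothetical helper name.

```python
def best_cutoff(labels, probs):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    A period is predicted 'high incidence' when its probability >= cutoff.
    """
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    best_j, best_c = -1.0, None
    for c in sorted(set(probs)):
        tp = sum(1 for l, p in zip(labels, probs) if l and p >= c)
        fp = sum(1 for l, p in zip(labels, probs) if not l and p >= c)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_j, best_c = j, c
    return best_c, best_j

# Hypothetical observed labels and model probabilities (not from the paper).
labels = [False, False, False, True, True, True, False, False]
probs = [0.1, 0.2, 0.4, 0.7, 0.9, 0.6, 0.3, 0.65]

cutoff, j = best_cutoff(labels, probs)
print(cutoff, round(j, 3))
```

At the selected cutoff of 0.6, all three high-incidence periods are flagged and one low-incidence period is a false alarm, giving J = 1 − 1/5 = 0.8.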

