PLoS Negl Trop Dis. 2020 Oct 16;14(10):e0008710.
doi: 10.1371/journal.pntd.0008710. eCollection 2020 Oct.

Weekly dengue forecasts in Iquitos, Peru; San Juan, Puerto Rico; and Singapore

Corey M Benedum et al. PLoS Negl Trop Dis.

Abstract

Background: Predictive models can serve as early warning systems and can be used to forecast the future risk of various infectious diseases. Conventionally, regression and time series models are used to forecast dengue incidence from dengue surveillance data (e.g., case counts) and weather data. However, these models may be limited by their assumptions and by the number of predictors they can include. Machine learning (ML) methods are designed to work with a large number of predictors and thus offer an appealing alternative. Here, we compared the performance of ML algorithms with that of regression models in predicting dengue cases and outbreaks 4 to 12 weeks in advance. Because many countries lack sufficient health surveillance infrastructure, we also evaluated the contribution of dengue surveillance and weather data to the predictive power of these models.

Methods: We developed ML, regression, and time series models to forecast weekly dengue case counts and outbreaks in Iquitos, Peru; San Juan, Puerto Rico; and Singapore from 1990 to 2016. Forecasts were generated using available weekly dengue surveillance and weather data. We evaluated the agreement between model forecasts and actual dengue observations using mean absolute error (MAE) and the Matthews correlation coefficient (MCC).
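The two agreement measures named in the Methods can be illustrated with a minimal sketch. The function names and the example numbers below are assumptions made for illustration; they are not taken from the study, and the authors' own implementation is not described in this abstract.

```python
import math

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and forecast case counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def matthews_corrcoef(observed, predicted):
    """MCC for binary labels (1 = outbreak week, 0 = non-outbreak week)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if o and p:
            tp += 1
        elif p:
            fp += 1
        elif o:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical weekly counts and outbreak labels, for illustration only.
weekly_cases = [10, 20, 30]
forecast_cases = [12, 18, 33]
print(mean_absolute_error(weekly_cases, forecast_cases))
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

MAE is expressed in cases per week, so lower is better. MCC ranges from -1 to 1, with 1 indicating perfect agreement and 0 indicating agreement no better than chance.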

Results: For near-term predictions of weekly case counts, and when using surveillance data, ML models had 21% and 33% less error than regression and time series models, respectively. However, using weather data only, ML models did not demonstrate a practical advantage. When forecasting weekly dengue outbreaks 12 weeks in advance, ML models achieved a maximum MCC of 0.61.

Conclusions: Our results identified 2 scenarios in which ML models are advantageous over regression models: 1) predicting weekly dengue case counts 4 weeks ahead when dengue surveillance data are available and 2) predicting weekly dengue outbreaks 12 weeks ahead when dengue surveillance data are unavailable. Given the advantages of ML models, dengue early warning systems may be improved by the inclusion of these models.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. General framework to develop RF and regression prediction models.
To assess how each model’s predictive accuracy was affected by the lack of current dengue surveillance data, we trained models to predict dengue case counts and outbreaks using only population, temporal, and weather predictor variables. We compared the performance of these models with the performance of the same models when surveillance data inputs were included.
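The comparison described in this caption, the same model trained with and without surveillance inputs, can be sketched as two versions of one training table. This is a simplified illustration under stated assumptions: the function name, the feature names, and the choice of a single temperature predictor are hypothetical, and the study's actual predictor set is not listed in this abstract.

```python
def build_forecast_rows(cases, temperature, horizon, use_surveillance):
    """Pair predictors known at week t with the case count at week t + horizon.

    cases, temperature: equal-length lists of weekly values (hypothetical inputs).
    horizon: forecast lead time in weeks (e.g., 4 or 12).
    use_surveillance: if False, the current case count is withheld, which
    mimics a setting where no current dengue surveillance data are available.
    """
    if len(cases) != len(temperature):
        raise ValueError("series must have equal length")
    rows = []
    for t in range(len(cases) - horizon):
        features = {"week_index": t, "temperature": temperature[t]}
        if use_surveillance:
            features["cases_now"] = cases[t]
        rows.append((features, cases[t + horizon]))
    return rows

# Made-up series, for illustration only.
cases = [5, 7, 9, 12, 20, 31]
temperature = [26.1, 26.4, 27.0, 27.3, 27.9, 28.2]
with_surv = build_forecast_rows(cases, temperature, horizon=4, use_surveillance=True)
without_surv = build_forecast_rows(cases, temperature, horizon=4, use_surveillance=False)
print(with_surv[0])
print(without_surv[0])
```

Both tables share the same targets, so any difference in forecast error between models fitted to them can be attributed to the surveillance inputs.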
Fig 2. Weekly observations of reported dengue cases by study area.
In this figure, left-hand panels (red curves) represent training data, while right-hand panels (blue curves) represent the testing data.
Fig 3. 4-week forecast accuracy of the temporal pattern of dengue case counts, Iquitos, Peru, June 2009–June 2013.
Observed weekly case counts (black area) are compared with 4-week-ahead forecasts made by Random Forest and Poisson regression models. Dotted lines represent 95% confidence intervals around each model's prediction. RF model standard errors were estimated using the infinitesimal jackknife for bagging approach [101].
Fig 4. 4-week forecast accuracy of the temporal pattern of dengue case counts, San Juan, Puerto Rico, April 2009–April 2013.
Observed weekly case counts (black area) are compared with 4-week-ahead forecasts made by Random Forest and Poisson regression models. Dotted lines represent 95% confidence intervals around each model's prediction. RF model standard errors were estimated using the infinitesimal jackknife for bagging approach [101].
Fig 5. 4-week forecast accuracy of the temporal pattern of dengue case counts, Singapore, January 2013–December 2016.
Observed weekly case counts (black area) are compared with 4-week-ahead forecasts made by Random Forest and Poisson regression models. Dotted lines represent 95% confidence intervals around each model's prediction. RF model standard errors were estimated using the infinitesimal jackknife for bagging approach [101].
Fig 6. Top 10 most important predictors for the Random Forest model when predicting weekly dengue case counts, Iquitos, Peru.
The 10 most important predictors for the Random Forest model prior to variable reduction. Predictor importance was quantified as the percentage increase in mean squared error. Red bars indicate models that included surveillance data inputs, while blue bars indicate models that did not. Predictors are shown for forecasts made 4 (A) and 12 (B) weeks in advance.
Fig 7. Top 10 most important predictors for the Random Forest model when predicting weekly dengue case counts, San Juan, Puerto Rico.
The 10 most important predictors for the Random Forest model prior to variable reduction. Predictor importance was quantified as the percentage increase in mean squared error. Red bars indicate models that included surveillance data inputs, while blue bars indicate models that did not. Predictors are shown for forecasts made 4 (A) and 12 (B) weeks in advance.
Fig 8. Top 10 most important predictors for the Random Forest model when predicting weekly dengue case counts, Singapore.
The 10 most important predictors for the Random Forest model prior to variable reduction. Predictor importance was quantified as the percentage increase in mean squared error. Red bars indicate models that included surveillance data inputs, while blue bars indicate models that did not. Predictors are shown for forecasts made 4 (A) and 12 (B) weeks in advance.
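One common way to obtain a "percentage increase in mean squared error" is permutation importance: shuffle one predictor, re-score the model, and report how much the error grows. The sketch below illustrates that general idea only. The captions do not state the exact procedure the authors used, and the toy model, feature names, and data here are all hypothetical.

```python
import random

def mean_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def pct_increase_in_mse(predict, rows, targets, column, seed=0):
    """Percentage increase in MSE after shuffling one predictor column.

    predict: any function mapping a feature dict to a forecast (assumed given).
    rows: list of feature dicts; targets: observed values for those rows.
    """
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    if baseline == 0:
        raise ValueError("baseline MSE is zero; percentage change is undefined")
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    # Copy each row, replacing only the chosen column with a shuffled value.
    permuted_rows = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
    permuted = mean_squared_error(targets, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - baseline) / baseline

# Toy stand-in for a fitted model: it uses lagged cases and ignores "noise".
def toy_model(row):
    return 2 * row["lagged_cases"]

rows = [{"lagged_cases": x, "noise": n}
        for x, n in zip([10, 20, 30, 40, 50, 60], [3, 1, 4, 1, 5, 9])]
targets = [21, 39, 61, 79, 101, 119]  # made-up observations

print(pct_increase_in_mse(toy_model, rows, targets, "lagged_cases"))
print(pct_increase_in_mse(toy_model, rows, targets, "noise"))
```

A predictor the model does not rely on yields an increase of zero, while shuffling an informative predictor inflates the error, which is why larger values indicate more important predictors.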
Fig 9. RF-UFA forecast accuracy of the temporal pattern of dengue outbreaks, Iquitos, Peru, June 2009–June 2013.
The numbers of high-risk (red) and low-risk (blue) flags met per week, 12 weeks in advance, are plotted against weekly dengue case counts (black) in the testing data. Grey regions represent observed outbreak weeks. Thresholds were identified using UFA and are associated with dengue outbreaks 12 weeks into the future. Black dashed lines indicate the beginning of a new test set.
Fig 10. RF-UFA forecast accuracy of the temporal pattern of dengue outbreaks, San Juan, Puerto Rico, April 2009–April 2013.
The numbers of high-risk (red) and low-risk (blue) flags met per week, 12 weeks in advance, are plotted against weekly dengue case counts (black) in the testing data. Grey regions represent observed outbreak weeks. Thresholds were identified using UFA and are associated with dengue outbreaks 12 weeks into the future. Black dashed lines indicate the beginning of a new test set.
Fig 11. RF-UFA forecast accuracy of the temporal pattern of dengue outbreaks, Singapore, January 2013–December 2016.
The numbers of high-risk (red) and low-risk (blue) flags met per week, 12 weeks in advance, are plotted against weekly dengue case counts (black) in the testing data. Grey regions represent observed outbreak weeks. Thresholds were identified using UFA and are associated with dengue outbreaks 12 weeks into the future. Black dashed lines indicate the beginning of a new test set.
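The per-week flag counts plotted in these figures can be pictured with a small sketch. It assumes the thresholds are already known and only counts how many are met in a given week; how UFA derives the thresholds is not described in the captions and is not attempted here. The predictor names, cutoff values, and tuple layout are hypothetical.

```python
def count_flags(week, thresholds):
    """Count high-risk and low-risk thresholds met by one week's predictors.

    week: dict of predictor name -> observed value for that week.
    thresholds: list of (predictor, cutoff, direction, risk) tuples, where
    direction is "above" or "below" and risk is "high" or "low".
    All threshold values used here are invented for illustration.
    """
    counts = {"high": 0, "low": 0}
    for predictor, cutoff, direction, risk in thresholds:
        value = week[predictor]
        met = value >= cutoff if direction == "above" else value <= cutoff
        if met:
            counts[risk] += 1
    return counts

# Hypothetical thresholds and one hypothetical week of weather data.
thresholds = [
    ("mean_temp_c", 27.0, "above", "high"),
    ("humidity_pct", 80.0, "above", "high"),
    ("rainfall_mm", 5.0, "below", "low"),
]
week = {"mean_temp_c": 28.1, "humidity_pct": 85.0, "rainfall_mm": 12.0}
print(count_flags(week, thresholds))
```

Applying such a count to each week in the testing data would give two series, one of high-risk flags and one of low-risk flags, that can be plotted against the observed case counts.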

References

    1. Rezza G. Aedes albopictus and the reemergence of Dengue. BMC Public Health. 2012;12: 72. doi: 10.1186/1471-2458-12-72
    2. Bhatt S, Gething PW, Brady OJ, Messina JP, Farlow AW, Moyes CL, et al. The global distribution and burden of dengue. Nature. 2013;496: 504–507. doi: 10.1038/nature12060
    3. Beatty ME, Letson W, Edgil DM, Margolis HS. Estimating the total world population at risk for locally acquired dengue infection. American Journal of Tropical Medicine and Hygiene. 2007. pp. 221–221.
    4. Hales S, De Wet N, Maindonald J, Woodward A. Potential effect of population and climate changes on global distribution of dengue fever: an empirical model. The Lancet. 2002;360: 830–834.
    5. Hii YL, Zhu H, Ng N, Ng LC, Rocklöv J. Forecast of dengue incidence using temperature and rainfall. PLoS Negl Trop Dis. 2012;6: e1908. doi: 10.1371/journal.pntd.0001908