Trop Med Infect Dis. 2024 Oct 21;9(10):250.
doi: 10.3390/tropicalmed9100250.

Leveraging Climate Data for Dengue Forecasting in Ba Ria Vung Tau Province, Vietnam: An Advanced Machine Learning Approach


Dang Anh Tuan et al. Trop Med Infect Dis.

Erratum in

Abstract

Dengue fever is a persistent public health issue in tropical regions, including Vietnam, where climate variability plays a crucial role in disease transmission dynamics. This study develops climate-based machine learning models to forecast dengue outbreaks in Ba Ria Vung Tau (BRVT) province, Vietnam, using meteorological data from 2003 to 2022. We used four predictive models to predict weekly dengue incidence: Negative Binomial Regression (NBR), Seasonal AutoRegressive Integrated Moving Average with Exogenous Regressors (SARIMAX), Extreme Gradient Boosting (XGBoost) v2.0.3, and Long Short-Term Memory (LSTM). Key climate variables, including temperature, humidity, precipitation, and wind speed, were integrated into these models, with lagged variables included to capture delayed climatic effects on dengue transmission. The NBR model demonstrated the best predictive accuracy, achieving the lowest Mean Absolute Error (MAE) of the four models. The inclusion of lagged climate variables significantly enhanced the model's ability to predict dengue cases. Although effective in capturing seasonal trends, the SARIMAX and LSTM models struggled with overfitting and failed to accurately predict short-term outbreaks. XGBoost exhibited moderate predictive power but was sensitive to overfitting, particularly without fine-tuning. Our findings confirm that climate-based machine learning models, particularly the NBR model, offer valuable tools for forecasting dengue outbreaks in BRVT. However, improving the models' ability to predict short-term peaks remains a challenge. The integration of meteorological data into early warning systems is crucial for public health authorities to plan timely and effective interventions. This research contributes to the growing body of literature on climate-based disease forecasting and underscores the need for further model refinement to address the complexities of dengue transmission in highly endemic regions.

Keywords: Ba Ria Vung Tau; LSTM; SARIMAX; Vietnam; XGBoost; climate forecasting; dengue fever; machine learning; negative binomial regression.
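
The abstract mentions two mechanics that a short example may make more concrete: lagged climate variables and the Mean Absolute Error. The sketch below is illustrative only and is not the authors' code. The function names, the toy weekly values, and the choice of 1- and 2-week lags are all assumptions; the abstract does not state which lags were used.

```python
from typing import Dict, List, Sequence


def add_lagged_features(
    weekly: Sequence[Dict[str, float]],
    variables: Sequence[str],
    lags: Sequence[int],
) -> List[Dict[str, float]]:
    """Return one row per week with lagged copies of each climate variable.

    Weeks that lack a full lag history (the first max(lags) weeks) are dropped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(weekly)):
        row = dict(weekly[t])  # copy so the input rows are not modified
        for name in variables:
            for lag in lags:
                # Value of the variable `lag` weeks before week t.
                row[f"{name}_lag{lag}"] = weekly[t - lag][name]
        rows.append(row)
    return rows


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted counts."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data: four consecutive weeks (values are made up).
weekly = [
    {"temperature": 27.0, "precipitation": 0.0, "cases": 3},
    {"temperature": 28.0, "precipitation": 12.0, "cases": 4},
    {"temperature": 29.0, "precipitation": 30.0, "cases": 6},
    {"temperature": 28.5, "precipitation": 8.0, "cases": 9},
]

# Assumed lags of 1 and 2 weeks, chosen only for illustration.
rows = add_lagged_features(weekly, ["temperature", "precipitation"], lags=[1, 2])

# Compare observed cases in the remaining weeks with hypothetical predictions.
observed = [row["cases"] for row in rows]
mae = mean_absolute_error(observed, [5, 11])
```

Each remaining row carries the current week's values plus the earlier weeks' climate readings, which is the form a regression model would need in order to pick up delayed climatic effects.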


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Overall process of DF case prediction study.
Figure 2. Annual distribution of dengue cases in Ba Ria Vung Tau, Vietnam (2003–2022).
Figure 3. Time series plots of weekly climate variables and dengue cases from week 1 of 2003 to week 53 of 2022.
Figure 4. Correlation matrix of climatic variables.
Figure 5. Correlation of climatic variables (features) with total cases. The bar chart illustrates the correlation coefficients of various climatic variables with the total number of cases. The colors represent the direction and strength of the correlation: shades of blue indicate negative correlations, while shades of red indicate positive correlations. Darker colors reflect stronger correlations. This visual distinction helps in understanding how each variable is associated with the total number of cases, with the length of the bars representing the magnitude of the correlation.
Figure 6. Relative importance of features in predicting dengue cases.
Figure 7. Time series of monthly dengue cases from the training set.
Figure 8. ACF (top) and PACF (bottom) plots. The blue shaded area in both plots represents the 95% confidence interval. Autocorrelation (or partial autocorrelation) values falling within the shaded area are not statistically significant, indicating that they may occur by chance. Values outside this region suggest statistically significant correlations at those lags.
Figure 9. Train–test data split for all models in this study.
Figure 10. Relative feature importance in predicting dengue cases for XGB Regression Model #2.
Figure 11. Loss convergence on training and test sets for LSTM: (A) Model #1; (B) Model #2; (C) Model #3.
Figure 12. Observed and predicted dengue cases in training (week 1 of 2003 to week 17 of 2017) and test data (week 18 of 2017 to week 53 of 2022) of different models: (A) NBR #1; (B) NBR #2; (C) NBR #3; (D) NBR #4; (E) SARIMAX #1; (F) SARIMAX #2; (G) XGBoost #1; (H) XGBoost #2; (I) XGBoost #2 (refit); (J) LSTM #1; (K) LSTM #2; (L) LSTM #3.
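
The Figure 12 caption gives the split boundaries: training runs from week 1 of 2003 to week 17 of 2017, and testing from week 18 of 2017 to week 53 of 2022. A minimal sketch of such a chronological (unshuffled) split follows. The record layout, the function name, and the sample case counts are assumptions made for illustration, not details taken from the study.

```python
from typing import List, Tuple

Week = Tuple[int, int]        # (year, week number)
Record = Tuple[Week, int]     # (week, dengue case count)

# Last training week, taken from the Figure 12 caption.
TRAIN_END: Week = (2017, 17)


def chronological_split(
    records: List[Record], train_end: Week = TRAIN_END
) -> Tuple[List[Record], List[Record]]:
    """Split records by time: everything up to train_end trains, the rest tests.

    Records are sorted by (year, week) first, so no future week can leak
    into the training set.
    """
    ordered = sorted(records, key=lambda record: record[0])
    train = [r for r in ordered if r[0] <= train_end]
    test = [r for r in ordered if r[0] > train_end]
    return train, test


# Hypothetical records in arbitrary order (case counts are made up).
records = [((2017, 18), 12), ((2003, 1), 5), ((2017, 17), 7), ((2022, 53), 20)]
train, test = chronological_split(records)
```

Tuples compare element by element, so `(2017, 17) < (2017, 18) < (2018, 1)` holds without any date parsing. A time-ordered split of this kind evaluates a forecasting model only on weeks that come after everything it was trained on.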
