Machine learning for predicting severe dengue in Puerto Rico

Infect Dis Poverty. 2025 Feb 4;14(1):5. doi: 10.1186/s40249-025-01273-0.

Zachary J Madewell¹, Dania M Rodriguez², Maile B Thayer², Vanessa Rivera-Amill³, Gabriela Paz-Bailey², Laura E Adams², Joshua M Wong²

Affiliations

¹ Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, San Juan, Puerto Rico, USA. ock0@cdc.gov.
² Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, San Juan, Puerto Rico, USA.
³ Ponce Health Sciences University/Ponce Research Institute, Ponce, Puerto Rico, USA.

PMID: 39905498
PMCID: PMC11796212
DOI: 10.1186/s40249-025-01273-0

Zachary J Madewell et al. Infect Dis Poverty. 2025.


Abstract

Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer a practical approach for clinicians but have limited sensitivity and specificity. This study aims to evaluate machine learning (ML) model performance compared to WHO-recommended warning signs in predicting severe dengue among laboratory-confirmed cases in Puerto Rico.

Methods: We analyzed data from Puerto Rico's Sentinel Enhanced Dengue Surveillance System (May 2012-August 2024), using 40 clinical, demographic, and laboratory variables. Nine ML models, including Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost, were trained using fivefold cross-validation and evaluated with area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC value of 0.5 indicates no discriminative power, while values closer to 1.0 reflect better performance.
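The evaluation design described in the Methods can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names `stratified_folds` and `auc_roc`, the toy cohort, and the random seed are all assumptions made for illustration. The sketch assigns cases to five folds while keeping the share of severe cases similar in each fold, and computes AUC-ROC as the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each case index to one of k folds, class by class,
    so every fold keeps roughly the same proportion of each class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc_roc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 100 cases, 25% severe (not the study data).
labels = [1] * 25 + [0] * 75
folds = stratified_folds(labels, k=5)

# A perfectly separating score and a constant, uninformative score.
perfect = auc_roc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
constant = auc_roc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

In this toy setup each fold holds 5 severe and 15 non-severe cases, the perfectly separating score gives an AUC of 1.0, and the constant score gives 0.5, matching the interpretation of the endpoints stated above.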

Results: Among the 1708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance, with an AUC-ROC of 97.1% (95% CI: 96.0-98.3%) for CatBoost using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm³), and timing of presentation at 4-6 days post-symptom onset as key predictors. When excluding hemoconcentration and leukopenia, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5-98.0%), demonstrating minimal reduction in performance. Individual warning signs like abdominal pain and restlessness had sensitivities of 79.0% and 64.6%, but lower specificities of 48.4% and 59.1%, respectively. Combining ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), resulting in an AUC-ROC of 74.0%.
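The trade-off between sensitivity and specificity when a threshold is applied to the number of warning signs can be sketched as follows. The data, the function names `flag_by_warning_signs` and `sens_spec`, and the thresholds compared are hypothetical and chosen only to show the mechanics; they do not reproduce the figures reported above.

```python
def flag_by_warning_signs(sign_counts, threshold=3):
    """Flag a case as high risk when it has at least `threshold` warning signs."""
    return [1 if c >= threshold else 0 for c in sign_counts]


def sens_spec(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cases: 1 = severe dengue, and the number of warning signs each had.
severe = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
signs = [4, 3, 5, 2, 0, 1, 3, 2, 1, 0]

sens1, spec1 = sens_spec(severe, flag_by_warning_signs(signs, threshold=1))
sens3, spec3 = sens_spec(severe, flag_by_warning_signs(signs, threshold=3))
```

On this toy data, requiring any single warning sign catches every severe case but flags most non-severe cases too, while requiring three or more signs raises specificity at the cost of missing one severe case.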

Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians better identify high-risk patients, guiding timely interventions like hospitalization, closer monitoring, or the administration of intravenous fluids. The subanalysis excluding hemoconcentration and leukopenia confirmed the models' applicability in resource-limited settings, where access to laboratory data may be limited.

Keywords: Caribbean; Clinical decision support; Dengue; Ensemble learning; Feature importance; Gradient boosting.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The Institutional Review Boards at the Centers for Disease Control and Prevention (CDC), Auxilio Mutuo, and Ponce Medical School Foundation approved the SEDSS study protocols 6214 and 120308-VR/2311173707, respectively. Written consent to participate was obtained from all adult participants and emancipated minors. For minors aged 14 to 20 years, written consent was obtained, and for those aged 7 to 13 years, parental written consent and participant assent were obtained. Consent for publication: Not applicable. Competing interests: The authors declare no conflicts of interest.

Figures

**Fig. 1**
Euler plot of the proportion of severe dengue cases with each warning sign, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024.

**Fig. 2**
Pearson’s correlation of predictions between machine learning models, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. Pearson correlation coefficients measure the linear agreement between the predictions of different machine learning models. Higher values indicate similar prediction patterns across models, suggesting that models are identifying similar cases as severe dengue. Darker colors represent higher correlations.

**Fig. 3**
Forest plot of AUC values for Decision Trees (DT), K-Nearest Neighbors (KNN), Naïve Bayes, Support Vector Machines (SVM), Artificial Neural Networks (ANN), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and ensemble models for a 40-variable feature set and subsets excluding CBCs, IgG, and serotype results, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. The DeLong method was used to obtain the 95% confidence intervals for the AUC-ROC. *CBC* Complete blood count, *IgG* Immunoglobulin G, *AUC-ROC* Area under the receiver operating characteristic curve.
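A confidence interval of the kind named in this caption can be sketched with the DeLong variance estimate for a single AUC. This is a minimal standard-library illustration under stated assumptions, not the authors' code: the function name `delong_auc_ci`, the toy scores, the normal-approximation multiplier of 1.96, and the clipping of the interval to [0, 1] are all choices made here.

```python
import math


def delong_auc_ci(labels, scores, z=1.96):
    """AUC with a normal-approximation CI from the DeLong variance estimate."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    m, n = len(pos), len(neg)

    def psi(x, y):
        # Pairwise kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    # Structural components: per-positive and per-negative placement values.
    v10 = [sum(psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m

    # Sample variances of the components (divisors m - 1 and n - 1).
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)

    # Clipping to [0, 1] is a presentation choice made in this sketch.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical labels and predicted probabilities (not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc, low, high = delong_auc_ci(labels, scores)
```

With only six cases the interval is very wide, which is the expected behavior; intervals as narrow as those in the abstract require samples on the scale of the study cohort.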

**Fig. 4**
SHapley Additive exPlanations (SHAP) values for the 40 features in CatBoost, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. SHAP values measure each feature’s contribution to the prediction of severe dengue in the CatBoost model. Positive SHAP values indicate a higher likelihood of severe dengue, while negative values suggest a lower likelihood (or protective effect). Each dot represents a single case, with its horizontal position showing the SHAP value, reflecting the strength and direction of the feature’s impact. The color of the dots indicates the actual feature value for each case. For most features, values are binary (0 or 1), representing presence or absence (e.g., rash or no rash). For age group, the scale ranges from 0 to 7, with 0 indicating the youngest age group (< 1 year) and 7 indicating the oldest age group (≥ 50 years). An example interpretation: if ‘persistent vomiting’ has a positive SHAP value and the dot is green (value = 1), it indicates that the presence of persistent vomiting strongly increases the likelihood of severe dengue for that case. The mean SHAP values shown on the right represent the average absolute impact of each feature across all cases, indicating the overall importance of that feature in the model’s predictions.
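The "average absolute impact" summary described in this caption can be sketched as a column-wise mean of absolute SHAP values. The feature names and SHAP values below are hypothetical, and `mean_abs_shap` is an illustrative helper, not part of any SHAP library; in practice the per-case values would come from an explainer applied to the fitted model.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across cases."""
    n_cases = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [(name, totals[j] / n_cases) for j, name in enumerate(feature_names)]
    return sorted(means, key=lambda item: item[1], reverse=True)


# Hypothetical per-case SHAP values: one row per case, one column per feature.
feature_names = ["days_post_onset", "persistent_vomiting", "rash"]
shap_rows = [
    [0.8, 0.3, -0.1],
    [-0.6, 0.4, 0.0],
    [0.7, -0.2, 0.1],
    [-0.5, 0.3, -0.2],
]

ranking = mean_abs_shap(shap_rows, feature_names)
```

The absolute value matters: in this toy data the signed mean for `days_post_onset` is only 0.1 because positive and negative contributions cancel, while its mean absolute value of 0.65 correctly marks it as the most influential feature.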

**Fig. 5**
Iterative improvement in area under the curve (AUC) with additional variables in CatBoost model for severe dengue prediction, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. This figure shows the change in AUC as top-performing variables are sequentially added to the CatBoost model. Starting with the highest-impact feature, “Days post onset,” each subsequent model includes one additional variable in the order of their mean SHAP values. The combinations of variables and their AUC, along with 95% confidence intervals, are shown to demonstrate the predictive gain with each added variable.
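The sequential procedure in this caption, adding one variable at a time in importance order and recording the AUC, can be sketched as below. To stay self-contained, the sketch replaces the refitted CatBoost model with a deliberately simple stand-in score (the count of selected binary features present), so only the looping logic carries over. The feature names, their ordering, and the eight toy cases are hypothetical.

```python
def auc_roc(labels, scores):
    """Rank-based AUC; tied (positive, negative) pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def incremental_auc(rows, labels, ordered_features):
    """Add features one at a time, in the given order, and record the AUC.

    Stand-in model: the score is the number of selected features present.
    A real analysis would refit the classifier at every step instead.
    """
    path = []
    for k in range(1, len(ordered_features) + 1):
        subset = ordered_features[:k]
        scores = [sum(row[f] for f in subset) for row in rows]
        path.append((tuple(subset), auc_roc(labels, scores)))
    return path


# Hypothetical importance order and toy cases (first four are severe).
ordered = ["days_4_to_6", "hemoconcentration", "leukopenia"]
values = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 1),
          (0, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]
rows = [dict(zip(ordered, v)) for v in values]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

path = incremental_auc(rows, labels, ordered)
```

On this toy data the AUC rises from 0.75 to 0.875 to 1.0 as features are added, which mirrors the shape of the curve the figure describes but says nothing about the magnitudes in the study.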

Cited by

Jainonthee C, Sanwisate P, Sivapirunthep P, Chaosap C, Pichpol D, Mektrirat R, Chadsuthi S, Punyapornwithaya V. Predicting and explaining high dead-on-arrival outcomes in meat-type ducks using deep learning: A path to improved welfare management. Poult Sci. 2025 Jun 13;104(9):105439. doi: 10.1016/j.psj.2025.105439. Online ahead of print. PMID: 40541105.
