Infect Dis Poverty. 2025 Feb 4;14(1):5.
doi: 10.1186/s40249-025-01273-0.

Machine learning for predicting severe dengue in Puerto Rico

Zachary J Madewell et al. Infect Dis Poverty. 2025.

Abstract

Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer a practical approach for clinicians but have limited sensitivity and specificity. This study aims to evaluate machine learning (ML) model performance compared to WHO-recommended warning signs in predicting severe dengue among laboratory-confirmed cases in Puerto Rico.

Methods: We analyzed data from Puerto Rico's Sentinel Enhanced Dengue Surveillance System (May 2012-August 2024), using 40 clinical, demographic, and laboratory variables. Nine ML models, including Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost, were trained using fivefold cross-validation and evaluated with area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC value of 0.5 indicates no discriminative power, while values closer to 1.0 reflect better performance.
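To make the AUC-ROC scale concrete, the following is a minimal sketch, not the authors' code, that computes AUC-ROC as the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case. The function name `auc_roc` and the toy labels and scores are assumptions made purely for illustration.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    labels: 1 = severe, 0 = non-severe (hypothetical toy data below).
    scores: model-predicted risk, higher means more likely severe.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every severe case outranks every non-severe case
constant_scores = [0.5] * 6                        # no discriminative power
mixed_scores = [0.9, 0.4, 0.7, 0.3, 0.8, 0.1]      # partial separation

print(auc_roc(labels, perfect_scores))   # 1.0
print(auc_roc(labels, constant_scores))  # 0.5
print(auc_roc(labels, mixed_scores))     # 7 of 9 pairs ranked correctly
```

In a cross-validated setting, a function like this would be applied to held-out predictions from each fold rather than to the training data.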

Results: Among the 1708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance, with an AUC-ROC of 97.1% (95% CI: 96.0-98.3%) for CatBoost using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm³), and timing of presentation at 4-6 days post-symptom onset as key predictors. When excluding hemoconcentration and leukopenia, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5-98.0%), demonstrating minimal reduction in performance. Individual warning signs like abdominal pain and restlessness had sensitivities of 79.0% and 64.6%, but lower specificities of 48.4% and 59.1%, respectively. Combining ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), resulting in an AUC-ROC of 74.0%.
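The trade-off between sensitivity and specificity for a "≥ k warning signs" rule can be sketched as follows. This is an illustrative example under stated assumptions: the helper names `warning_sign_rule` and `sens_spec` and the ten toy cases are hypothetical and do not reproduce the study's data or figures.

```python
def warning_sign_rule(sign_counts, threshold=3):
    """Flag a case as predicted severe when it has at least `threshold` warning signs."""
    return [1 if count >= threshold else 0 for count in sign_counts]


def sens_spec(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: 1 = severe, 0 = non-severe, with a warning-sign count per case.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
sign_counts = [4, 3, 2, 5, 1, 3, 0, 2, 1, 4]

for k in (1, 3):
    sens, spec = sens_spec(labels, warning_sign_rule(sign_counts, threshold=k))
    print(f">= {k} signs: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy cohort, raising the threshold from 1 to 3 lowers sensitivity and raises specificity, which is the same direction of trade-off the results describe for combining warning signs.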

Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians better identify high-risk patients, guiding timely interventions like hospitalization, closer monitoring, or the administration of intravenous fluids. The subanalysis excluding hemoconcentration confirmed the models' applicability in resource-limited settings, where access to laboratory data may be limited.

Keywords: Caribbean; Clinical decision support; Dengue; Ensemble learning; Feature importance; Gradient boosting.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The Institutional Review Boards at the Centers for Disease Control and Prevention (CDC), Auxilio Mutuo, and Ponce Medical School Foundation approved the SEDSS study protocols 6214 and 120308-VR/2311173707, respectively. Written consent to participate was obtained from all adult participants and emancipated minors. For minors aged 14 to 20 years, written consent was obtained, and for those aged 7 to 13 years, parental written consent and participant assent were obtained. Consent for publication: Not applicable. Competing interests: The authors declare no conflicts of interest.

Figures

Fig. 1
Euler plot of proportion of severe dengue cases with each warning sign, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024
Fig. 2
Pearson’s correlation of predictions between machine learning models, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. Pearson correlation coefficients measure the linear agreement between the predictions of different machine learning models. Higher values indicate similar prediction patterns across models, suggesting that models are identifying similar cases as severe dengue. Darker colors represent higher correlations
Fig. 3
Forest plot of AUC values for Decision Trees (DT), K-Nearest Neighbors (KNN), Naïve Bayes, Support Vector Machines (SVM), Artificial Neural Networks (ANN), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and ensemble models for a 40-variable feature set and subsets excluding CBCs, IgG, and serotype results, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. DeLong method was used to obtain the 95% confidence intervals for the AUC-ROC. CBC Complete blood count, IgG Immunoglobulin G, AUC-ROC Area under the receiver operating characteristic curve.
Fig. 4
SHapley Additive exPlanations (SHAP) values for the 40 features in CatBoost, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. SHAP values measure each feature’s contribution to the prediction of severe dengue in the CatBoost model. Positive SHAP values indicate a higher likelihood of severe dengue, while negative values suggest a lower likelihood (or protective effect). Each dot represents a single case, with its horizontal position showing the SHAP value, reflecting the strength and direction of the feature’s impact. The color of the dots indicates the actual feature value for each case. For most features, values are binary (0 or 1), representing presence or absence (e.g., rash or no rash). For age group, the scale ranges from 0 to 7, with 0 indicating the youngest age group (< 1 year) and 7 indicating the oldest age group (≥ 50 years). An example interpretation: if ‘persistent vomiting’ has a positive SHAP value and the dot is green (value = 1), it indicates that the presence of persistent vomiting increases the likelihood of severe dengue for that case. The mean SHAP values shown on the right represent the average absolute impact of each feature across all cases, indicating the overall importance of that feature in the model’s predictions
Fig. 5
Iterative improvement in area under the curve (AUC) with additional variables in CatBoost model for severe dengue prediction, Sentinel Enhanced Dengue Surveillance System, Puerto Rico, 2012–2024. This figure shows the change in AUC as top-performing variables are sequentially added to the CatBoost model. Starting with the highest-impact feature, “Days post onset,” each subsequent model includes one additional variable in the order of their mean SHAP values. The combinations of variables and their AUC, along with 95% confidence intervals, are shown to demonstrate the predictive gain with each added variable
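The feature ordering described in the captions for Figs. 4 and 5 rests on the mean absolute SHAP value per feature. The sketch below shows only that averaging and ranking step. It assumes per-case SHAP values have already been computed by some explainer; the feature names, the numbers, and the helper `rank_by_mean_abs_shap` are hypothetical and are not taken from the study.

```python
def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Rank features by mean absolute SHAP value across cases.

    shap_rows: one list per case, with one SHAP value per feature,
    in the same order as feature_names.
    """
    n_cases = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n_cases
        for j, name in enumerate(feature_names)
    }
    # Highest mean |SHAP| first, i.e. most influential feature first.
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for four cases and three features (illustration only).
feature_names = ["days_post_onset", "persistent_vomiting", "rash"]
shap_rows = [
    [0.8, 0.3, -0.1],
    [-0.6, 0.5, 0.0],
    [0.7, -0.1, 0.2],
    [-0.5, 0.3, -0.1],
]

ranking = rank_by_mean_abs_shap(feature_names, shap_rows)
for name, value in ranking:
    print(f"{name}: mean |SHAP| = {value:.2f}")
```

Taking absolute values before averaging matters: a feature that pushes some cases strongly toward severe and others strongly away would otherwise average out near zero despite being influential. A sequential analysis like the one in Fig. 5 would then refit the model on the top 1, top 2, and so on features from such a ranking.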
