BMC Med Inform Decis Mak. 2025 Jan 15;25(1):28.
doi: 10.1186/s12911-025-02855-6.

Derivation and validation of a clinical predictive model for longer duration diarrhea among pediatric patients in Kenya using machine learning algorithms

Billy Ogwel et al. BMC Med Inform Decis Mak. 2025.

Abstract

Background: Despite the adverse health outcomes associated with longer duration diarrhea (LDD), there are currently no clinical decision tools for timely identification and better management of children with increased risk. This study utilizes machine learning (ML) to derive and validate a predictive model for LDD among children presenting with diarrhea to health facilities.

Methods: LDD was defined as a diarrhea episode lasting ≥ 7 days. We used 7 ML algorithms to build prognostic models for predicting LDD among children < 5 years, using de-identified data from the Vaccine Impact on Diarrhea in Africa (VIDA) study (N = 1,482) for model development and data from the Enterics for Global Health (EFGH) Shigella study (N = 682) for temporal validation of the champion model. Features included demographic, medical history and clinical examination data collected at enrolment in both studies. During model development, we conducted split-sampling and employed K-fold cross-validation with an over-sampling technique. Critical predictors of LDD and their impact on prediction were obtained using an explainable, model-agnostic approach. The champion model was selected based on the area under the curve (AUC). Model calibration was assessed using the Brier score and Spiegelhalter's z-test with its accompanying p-value.
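
As an illustration of K-fold cross-validation combined with over-sampling, the following is a minimal sketch, not the authors' pipeline. It assumes simple random over-sampling of the minority class, applied only to the training portion of each fold so that held-out rows are never duplicated into training data. The function names (`stratified_folds`, `oversample`, `cv_splits`) and the toy labels are hypothetical; the abstract does not state which over-sampling technique or value of K was used.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split row indices into k folds with roughly equal class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled rows of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def oversample(train_idx, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train_idx)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(train_idx) + extra

def cv_splits(labels, k=5, seed=0):
    """Yield (train_idx, test_idx); only the training indices are over-sampled."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield oversample(train_idx, labels, seed + f), test_idx

# Toy labels (assumption): 1 = LDD, 0 = no LDD, roughly one third positive.
toy_labels = [1] * 32 + [0] * 68
for train_idx, test_idx in cv_splits(toy_labels, k=5, seed=1):
    n_pos = sum(toy_labels[i] for i in train_idx)
    print(len(train_idx), n_pos, len(test_idx))
```

Over-sampling inside each fold, rather than before splitting, keeps the validation rows at their natural class balance, which matters when the reported metric is meant to reflect performance on unseen patients.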

Results: The prevalence of LDD differed significantly between the development and temporal validation cohorts (478 [32.3%] vs 69 [10.1%]; p < 0.001). The following variables were associated with LDD, in decreasing order: pre-enrolment diarrhea days (55.1%), modified Vesikari score (18.2%), age group (10.7%), vomit days (8.8%), respiratory rate (6.5%), vomiting (6.4%), vomit frequency (6.2%), rotavirus vaccination (6.1%), skin pinch (2.4%) and stool frequency (2.4%). While all models showed good predictive capability, the random forest model achieved the best performance (AUC [95% Confidence Interval]: 83.0 [78.6-87.5] and 71.0 [62.5-79.4] on the development and temporal validation datasets, respectively). The random forest model showed slight deviations from perfect calibration, but these deviations were not statistically significant (Brier score = 0.17, Spiegelhalter p-value = 0.219).
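
The two calibration measures reported above can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' code: the Brier score is the mean squared difference between predicted probability and observed outcome, and Spiegelhalter's z-statistic tests whether that difference is consistent with well-calibrated predictions, with a two-sided p-value taken from the standard normal distribution. The function names are hypothetical, and the statistic is undefined when every predicted probability is exactly 0.5 (zero variance).

```python
import math

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def spiegelhalter_z(y_true, y_prob):
    """Return (z, two-sided p-value) for Spiegelhalter's calibration test."""
    # Numerator: observed minus expected contribution to the Brier score.
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    # Variance of the numerator under the null of perfect calibration.
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    z = num / math.sqrt(var)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Toy predictions (assumption), not data from the study.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y_true, y_prob))
print(spiegelhalter_z(y_true, y_prob))
```

A large p-value, such as the 0.219 reported for the random forest model, means the test found no significant evidence of miscalibration; it does not by itself show that calibration is perfect.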

Conclusions: Our study suggests that ML-derived algorithms could be used to rapidly identify children at increased risk of LDD. Integrating ML-derived models into clinical decision-making may allow clinicians to target these children with closer observation and enhanced management.

Keywords: Longer duration diarrhea; Machine Learning; Pediatric; Prediction.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The VIDA protocol was approved by the Institutional Review Board of the University of Maryland School of Medicine, Baltimore, MD, USA (UMB Protocol #: HM-HP-00062472) and the Kenya Medical Research Institute (KEMRI) Scientific and Ethical Review Unit (SERU) (SERU#2996). The EFGH protocol was approved by the KEMRI SERU (SERU#4362). Written informed consent was sought from caregivers in both studies before initiation of study procedures. Additionally, ethical approval for undertaking the current study was sought from the Health Research Ethics Committee of the University of South Africa, College of Agricultural Sciences (2023/CAES_HREC/2192). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Model development and validation schematic diagram. RF: Random Forest; GBM: Gradient Boosting; NB: Naïve Bayes; LR: Logistic Regression; SVM: Support Vector Machine; KNN: K-Nearest Neighbors; ANN: Artificial Neural Networks; VIDA: Vaccine Impact on Diarrhea in Africa study; EFGH: Enterics for Global Health Shigella study.
Fig. 2
Enrolment flow diagram of diarrhea cases in VIDA (2015–2018) and EFGH (2022–2023). MSD: Moderate-to-severe diarrhea; MAD: Medically attended diarrhea; LDD: Longer duration diarrhea; VIDA: Vaccine Impact on Diarrhea in Africa study; EFGH: Enterics for Global Health Shigella study. β: Children enrolled and successfully followed up at week 4, or who have surpassed the upper limit for week-4 follow-up (≥ 67 days post enrolment). ¥: Diarrhea duration obtained from the follow-up form where the diarrhea diary was not returned but follow-up data were available.
Fig. 3
Feature selection for longer duration diarrhea among children aged < 5 years presenting with moderate to severe diarrhea in rural western Kenya, 2015–2023. Green, yellow, red and blue boxplots represent the Z scores of selected, tentative, rejected and shadow features, respectively. Selected and tentative features: Diarr_days; Vesikari; Agegroup; Vomit_days; breast_feed; resp_rate; Vomit; freq_vomit; Rotavirus vaccination; Rectal straining; Stool_count; Skin_turgor. The following additional features were rejected and are not included in the figure: No. of children < 5 years in households; Total assets; Animal ownership; improved water; improved sanitation; shared facility; stool type; Blood in stool; drinks poorly; unable to drink; fever; restless; lethargy; unconscious; rectal prolapse; difficulty breathing; convulsion; sunken eyes; home zinc use; capillary refill; chest indrawing; sunken eyes; Bipedal edema; Abnormal hair; Dehydration; ORS at facility; Zinc at facility; IV rehydration; any_antibiotic; Malaria diagnosis; Dysentery diagnosis; Stunting; Wasting.
Fig. 4
Calibration plot of the random forest champion model
Fig. 5
SHAP attributions for the top 4 longer duration diarrhea (LDD) models. SVM: Support Vector Machine; ANN: Artificial Neural Networks. “Diarr_days = 2”: pre-enrolment diarrhea days = 2; “Rotavirus_vacc = 0”: no dose of rotavirus vaccine administered; “Vomit = 0”: no vomiting; “Vesikari_score = 2”: moderate severity of diarrheal disease; “freq_vomit = 0”: maximum number of vomiting episodes = 0; “Skin turgor = 0”: normal skin turgor; “Stool_count”: ≥ 6 loose stools per day; “Vomit_days = 0”: 0 vomiting days; “Agegroup = 2”: 12–23 months.
Fig. 6
SHAP attributions for the top 4 post-enrolment duration (≥ 7 days) models. “Vesikari_score = 2”: moderate severity of diarrheal disease; “Diarr_days = 4”: pre-enrolment diarrhea days = 4; “Vomit = 0”: no vomiting; “Vomit_days = 0”: 0 vomiting days; “Skin turgor = 0”: normal skin turgor; “Agegroup = 1”: 0–11 months; “Rotavirus_vacc = 1”: at least 1 dose of rotavirus vaccine administered; “Dry mouth = 1”: somewhat dry mouth.
Fig. 7
Business value plots for the random forest (RF) model for longer duration diarrhea (LDD)
Fig. 8
Performance of champion model in development (2015–2018) and temporal validation (2022–2023) datasets. PPV- Positive Predictive Value; NPV- Negative Predictive Value; AUC- Area under the Curve; PRAUC- Precision Recall Area under the Curve
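
For readers unfamiliar with the metrics named in the Fig. 8 caption, the sketch below shows how PPV, NPV and AUC can be computed from predicted probabilities using their standard definitions. It is an illustration only: the function names, the 0.5 classification threshold and the toy data are assumptions, and PRAUC is not covered.

```python
def ppv_npv(y_true, y_prob, threshold=0.5):
    """Positive and negative predictive value at a probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    # Undefined (NaN) when no case is predicted positive or negative.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv

def roc_auc(y_true, y_prob):
    """AUC as the chance a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Count pairwise wins; ties between a positive and a negative count half.
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))

# Toy predictions (assumption), not data from the study.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(ppv_npv(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

Unlike AUC, PPV and NPV depend on outcome prevalence, so they are expected to shift between cohorts with different LDD prevalence even if the model's ranking ability were unchanged.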
