Data-driven machine learning algorithm model for pneumonia prediction and determinant factor stratification among children aged 6-23 months in Ethiopia
- PMID: 40316929
- PMCID: PMC12048943
- DOI: 10.1186/s12879-025-10916-4
Erratum in
- Correction: Data-driven machine learning algorithm model for pneumonia prediction and determinant factor stratification among children aged 6-23 months in Ethiopia. BMC Infect Dis. 2025 Jun 26;25(1):808. doi: 10.1186/s12879-025-11137-5. PMID: 40571921. Free PMC article.
Abstract
Introduction: Pneumonia is the leading cause of child morbidity and mortality and accounts for 5.6 million deaths among children under five. It has a significant impact on quality of life, the country's economy, and child survival. Therefore, this study aimed to develop a data-driven predictive model using machine learning algorithms to predict pneumonia and stratify its determinant factors among children aged 6-23 months in Ethiopia.
Methods: A total of 2035 children were sampled from the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset. Data management and analysis were carried out in Jupyter Notebook, launched from Anaconda Navigator, using Python libraries such as pandas, Seaborn, and NumPy. The data were pre-processed and split into training and testing sets at a 4:1 ratio, and tenfold cross-validation was used to reduce bias and enhance the models' performance. Six machine learning algorithms were built and compared, and metrics derived from the confusion matrix were used to evaluate the performance of each algorithm. Principal component analysis and a heatmap were used to detect correlation between features. Feature importance scores were used to identify and stratify the most important predictors of pneumonia.
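The splitting and evaluation steps described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_4_to_1`, `k_fold_indices`, and `confusion_metrics` are hypothetical, the rows are stand-in integers rather than EDHS records, and the labels in the last line are toy values. In practice a library such as scikit-learn would normally provide these steps.

```python
import random


def split_4_to_1(rows, seed=42):
    """Shuffle rows and split them 4:1 into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # four fifths of the rows go to training
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    start = 0
    for fold in range(k):
        # Spread the remainder over the first n % k folds.
        size = n // k + (1 if fold < n % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def confusion_metrics(y_true, y_pred):
    """Count confusion matrix cells for 0/1 labels and derive metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Avoid division by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Stand-in row identifiers; the study's 2035 records are not reproduced here.
train, test = split_4_to_1(range(2035))
folds = list(k_fold_indices(len(train), k=10))

# Toy labels only, to show how the metrics are derived.
example = confusion_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

With 2035 rows, a 4:1 split gives 1628 training rows and 407 testing rows, and each of the ten folds holds out 162 or 163 training rows for validation. Whether the authors applied cross-validation to the training portion only or to the full dataset is not stated in the abstract; the sketch assumes the former.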
Results: Of the 2035 children sampled, 16.6%, 20.1%, and 24.2% had short, rapid breathing, fever, and cough, respectively. The overall magnitude of pneumonia among children aged 6-23 months was 31.3% based on the 2016 EDHS report. Random forest was the best-performing model for predicting pneumonia and stratifying its determinants, with 91.3% accuracy. Health facility visits, child sex, initiation of breastfeeding, birth interval, birth weight, husbands' education, women's age, and region were the top eight predictors of pneumonia among children, with importance scores ranging from more than 5% to 20%.
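Stratifying predictors by importance score amounts to normalizing the scores, sorting them, and keeping those above a cut-off. The sketch below assumes a 5% threshold, matching the lower bound reported above. The function name `rank_features`, the feature names, and the scores are hypothetical placeholders, not the study's values.

```python
def rank_features(raw_scores, threshold=0.05):
    """Normalize scores to shares of the total and keep those above threshold.

    Returns (name, share) pairs sorted from most to least important.
    """
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    shares = {name: score / total for name, score in raw_scores.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [(name, share) for name, share in ranked if share > threshold]


# Placeholder scores for illustration only; NOT taken from the study.
placeholder_scores = {"feature_a": 30.0, "feature_b": 2.0, "feature_c": 68.0}
top = rank_features(placeholder_scores)
```

Here `feature_b` falls below the 5% cut-off and is dropped, leaving `feature_c` and `feature_a` in that order. Random forest implementations commonly report importances already normalized to sum to 1, in which case the normalization step leaves them unchanged.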
Conclusions: Random forest was the best model for predicting pneumonia and stratifying its determinant factors. The findings have implications for research methodology and for tailoring health interventions, such as lifestyle modification and behavioral interventions, to individuals' unique features, in particular so that stakeholders can take proactive childcare interventions. The study can serve as pioneering evidence for future research, and researchers are encouraged to use deep learning algorithms to enhance prediction accuracy.
Keywords: Children; Data-driven; Machine learning; Pneumonia; Prediction Model.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: Ethical approval and consent from study participants were not necessary for this study because it was based on a secondary data source that is publicly available from the Measure DHS program website (https://dhsprogram.com/Date/terms-of-use.cfm). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.