BMC Infect Dis. 2025 May 2;25(1):647.
doi: 10.1186/s12879-025-10916-4.

Data-driven machine learning algorithm model for pneumonia prediction and determinant factor stratification among children aged 6-23 months in Ethiopia

Addisalem Workie Demsash et al. BMC Infect Dis. 2025.

Abstract

Introduction: Pneumonia is the leading cause of child morbidity and mortality and accounts for 5.6 million under-five child deaths. It significantly affects children's quality of life and survival, as well as the country's economy. Therefore, this study aimed to develop a data-driven predictive model using machine learning algorithms to predict pneumonia and stratify its determinant factors among children aged 6-23 months in Ethiopia.

Methods: A total of 2035 samples of children were drawn from the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset. Jupyter Notebook, launched from Anaconda Navigator, was used for data management and analysis, together with Python libraries such as pandas, seaborn, and NumPy. The pre-processed data were split into training and testing sets at a 4:1 ratio, and tenfold cross-validation was used to reduce bias and enhance the models' performance. Six machine learning algorithms were built and compared, and confusion matrix elements were used to evaluate the performance of each algorithm. Principal component analysis and a heatmap were used to detect correlation between features. Feature importance scores were used to identify and stratify the most important predictors of pneumonia.
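The evaluation steps described in the Methods can be sketched in plain Python. This is only an illustrative sketch, not the authors' code: the function names (split_4_to_1, k_folds, confusion_counts, metrics), the random seed, and the toy labels are all assumptions introduced here, and the abstract does not state which confusion-matrix metrics were reported beyond accuracy.

```python
import random


def split_4_to_1(n_samples, seed=0):
    """Shuffle row indices and split them 4:1 into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    cut = (n_samples * 4) // 5
    return indices[:cut], indices[cut:]


def k_folds(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, 1 = pneumonia, 0 = none."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Derive common performance measures from the confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# 2035 is the sample size reported in the abstract.
train_idx, test_idx = split_4_to_1(2035)
print(len(train_idx), len(test_idx))  # 1628 407

# Toy labels, invented purely to exercise the functions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(metrics(*confusion_counts(y_true, y_pred)))
```

With 2035 samples, a 4:1 split leaves 1628 rows for training and 407 for testing; the ten folds are then drawn from the training rows only, so the test set stays untouched until the final evaluation.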

Results: Of the 2035 total samples, 16.6%, 20.1%, and 24.2% of children had short, rapid breathing, fever, and cough, respectively. The overall magnitude of pneumonia among children aged 6-23 months was 31.3% based on the 2016 EDHS report. The random forest algorithm was the best-performing model for predicting pneumonia and stratifying its determinants, with 91.3% accuracy. Health facility visits, child sex, initiation of breastfeeding, birth interval, birth weight, husband's education, women's age, and region were the top eight predictors of pneumonia among children, with importance scores ranging from more than 5% to 20%.
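A random forest's importance scores, as reported above, can be obtained along the following lines with scikit-learn. This is a hedged sketch under stated assumptions: the data are synthetic (generated by make_classification), the feature names are placeholders borrowed from the abstract's list of predictors, and the hyperparameters, stratified split, and seeds are choices made here, not the authors' settings. The printed ranking therefore carries no epidemiological meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder names taken from the abstract; the values below are synthetic.
FEATURES = [
    "health_facility_visit", "child_sex", "breastfeeding_initiation",
    "birth_interval", "birth_weight", "husband_education",
    "maternal_age", "region",
]

# Synthetic stand-in for the survey data: 2035 rows, roughly 31% positives.
X, y = make_classification(
    n_samples=2035, n_features=len(FEATURES), n_informative=5,
    n_redundant=0, weights=[0.69, 0.31], random_state=0,
)

# 4:1 train/test split (stratification is an assumption made here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Tenfold cross-validation on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")

# Fit on the full training set, then score once on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Impurity-based importances sum to 1; sort them from high to low.
ranking = sorted(
    zip(FEATURES, model.feature_importances_),
    key=lambda pair: pair[1], reverse=True,
)

print(f"mean CV accuracy: {cv_scores.mean():.3f}")
print(f"test accuracy:    {test_accuracy:.3f}")
for name, score in ranking:
    print(f"{name:26s}{score:.1%}")
```

Because the importances sum to 1, each score can be read as that feature's share of the model's total impurity reduction, which is how a list of percentages like the one in the Results would typically be produced.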

Conclusions: Random forest was the best-performing model for predicting pneumonia and stratifying its determinant factors. The findings have implications for research methodology and can help stakeholders design proactive childcare interventions, such as lifestyle modification and behavioral interventions, tailored to individuals' unique features. The study could serve as pioneering evidence for future research, and researchers are encouraged to use deep learning algorithms to enhance prediction accuracy.

Keywords: Children; Data-driven; Machine learning; Pneumonia; Prediction Model.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Ethical approval and consent from study participants were not necessary because this study was based on a secondary data source that is publicly available from the Measure DHS program website (https://dhsprogram.com/Date/terms-of-use.cfm). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. The workflow of the entire machine learning methodology in this study
Fig. 2. Magnitude of pneumonia, fever, cough, and short/rapid breath among children aged 6–23 months in Ethiopia using the 2016 EDHS dataset
Fig. 3. Correlation detection within the dataset, where BF = breastfeeding, BI = birth interval, ANC = antenatal care, BW = birth weight
Fig. 4. The principal component analysis before and after dimensionality reduction
Fig. 5. The data imbalance detection and management using SMOTE
Fig. 6. Data-driven model comparison of included machine learning algorithms
Fig. 7. Important feature stratification based on random forest algorithms
Fig. 8. The actual and predictive values of pneumonia among children aged 6–23 months in Ethiopia, using the 2019 EDHS dataset
