BioData Min. 2022 Sep 24;15(1):20.
doi: 10.1186/s13040-022-00308-8.

Machine Learning Algorithms for understanding the determinants of under-five Mortality

Rakesh Kumar Saroj et al. BioData Min.

Abstract

Background: Under-five mortality is a serious concern for child health and for the social development of any country. This study aimed to assess the accuracy of machine learning models in predicting under-five mortality and to identify the factors most strongly associated with it.

Method: Data were taken from the National Family Health Survey (NFHS-IV) for Uttar Pradesh. We first used multivariate logistic regression, chosen for its ability to identify important factors, and then applied machine learning techniques: decision tree, random forest, Naïve Bayes, K-nearest neighbor (KNN), logistic regression, support vector machine (SVM), neural network, and ridge classifier. Each model's performance was assessed using a confusion matrix, accuracy, precision, recall, F1 score, Cohen's Kappa, and area under the receiver operating characteristic curve (AUROC). Information gain rank was used to identify the important factors for under-five mortality. Data analysis was performed using STATA 16.0, Python 3.3, and IBM SPSS Statistics for Windows, Version 27.0.
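As a rough illustration of how the confusion-matrix metrics named above relate to one another, the following Python sketch derives accuracy, precision, recall, F1 score, and Cohen's Kappa from four cell counts. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the NFHS-IV data or from the authors' code. AUROC is omitted because it requires predicted scores rather than a single confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive
    and at least one predicted positive).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted deaths that were deaths
    recall = tp / (tp + fn)      # share of actual deaths that were predicted
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = (
        (tp + fp) * (tp + fn)    # predicted positive x actual positive
        + (fn + tn) * (fp + tn)  # predicted negative x actual negative
    ) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical, imbalanced counts: 100 deaths among 2000 children.
metrics = classification_metrics(tp=80, fp=100, fn=20, tn=1800)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, accuracy is 0.94 while precision is only about 0.44, which shows how a rare outcome can produce high accuracy alongside modest precision and Kappa.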

Result: The neural network was the best predictive model for under-five mortality among the models compared, with accuracy of 95.29% to 95.96%, recall of 71.51% to 81.03%, precision of 36.64% to 51.83%, F1 score of 50.46% to 62.68%, Cohen's Kappa of 0.48 to 0.60, AUROC of 93.51% to 96.22%, and a precision-recall curve range of 99.52% to 99.73%. Logistic regression also performed well, with accuracy of 94% to 95%, AUROC of 93.4% to 94.8%, and a precision-recall curve range of 99.5% to 99.6%. The number of living children, survival time, wealth index, child size at birth, births in the last five years, the total number of children ever born, mother's education level, and birth order were identified as important factors influencing under-five mortality.
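The factor ranking reported here rests on information gain, that is, the reduction in entropy of the outcome once a factor's value is known. The abstract does not say how the authors computed it, so the Python sketch below is only a minimal illustration for categorical factors. The toy records, the variable names `died` and `factors`, and the two factor names are hypothetical and not drawn from the survey.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = child died before age five, 0 = survived.
died = [1, 1, 1, 0, 0, 0, 0, 0]
factors = {
    "wealth_index": ["poor", "poor", "poor", "poor",
                     "rich", "rich", "rich", "rich"],
    "child_sex": ["m", "f", "m", "f", "m", "f", "m", "f"],
}

# Rank factors from most to least informative about the outcome.
gains = {name: information_gain(values, died) for name, values in factors.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
for name in ranking:
    print(f"{name}: {gains[name]:.4f}")
```

In this toy data the wealth factor separates the outcome far better than child sex, so it ranks first; on real survey data, continuous factors would first need to be discretized or handled with a different estimator.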

Conclusion: The neural network predicted under-five mortality better than the other machine learning models, although logistic regression also gave good results. These models may be helpful for analyzing high-dimensional data in health research.

Keywords: Accuracy; Machine learning; Neural Network; Random Forest; Under-five mortality.

Conflict of interest statement

The authors declared that they have no competing interests.

Figures

Fig. 1. Under-five mortality in Uttar Pradesh: state-wise comparison graph (NFHS-4)
Fig. 2. Overview of the proposed machine learning framework for under-five child mortality data
Fig. 3. ROC curves for machine learning models in predicting under-five mortality with all factors (70/30 Ratio)
Fig. 4. Precision-Recall curves for machine learning models in predicting under-five mortality with all factors (70/30 Ratio)
Fig. 5. ROC curves for machine learning models in predicting under-five mortality with all factors (80/20 Ratio)
Fig. 6. Precision-Recall curves for machine learning models in predicting under-five mortality with all factors (80/20 Ratio)
Fig. 7. Information gain rank values of the variables under study
Fig. 8. ROC curves for machine learning models in predicting under-five mortality with important factors (70/30 Ratio)
Fig. 9. Precision-Recall curves for machine learning models in predicting under-five mortality with important factors (70/30 Ratio)
Fig. 10. ROC curves for machine learning models in predicting under-five mortality with important factors (80/20 Ratio)
Fig. 11. Precision-Recall curves for machine learning models in predicting under-five mortality with important factors (80/20 Ratio)
