Predicting determinants of unimproved water supply in Ethiopia using machine learning analysis of EDHS-2019 data
- PMID: 40185896
- PMCID: PMC11971422
- DOI: 10.1038/s41598-025-96412-w
Abstract
More than 2 billion people worldwide are affected by inadequate access to clean and safe drinking water. The problem is most acute in low-income nations, where many people still rely on unimproved water sources such as exposed wells and surface water. These sources place a heavy burden on public health systems because they are closely associated with the spread of waterborne illnesses. As a result, many people still suffer from water-related health problems, especially in underdeveloped nations where access to healthcare is limited and sanitation is often inadequate. However, the conventional analytical techniques used to study these problems frequently fail to capture the intricate relationships among many variables, which can limit their capacity to forecast future patterns. This study aimed to provide more accurate predictions and data-driven insights that can inform policy-making, resource allocation, and interventions to address Ethiopia's water crisis. The data source was the Ethiopia Demographic and Health Survey (EDHS-2019), which offers thorough data on socioeconomic, demographic, and water access determinants. Six machine-learning models were used, including k-Nearest Neighbors, Random Forest, Support Vector Machines, Gradient Boosting Machines, and Artificial Neural Networks. To enhance model performance and prevent overfitting, hyperparameter tuning was performed via random search with 7-fold cross-validation. Model performance was evaluated using standard classification metrics (accuracy, precision, recall, F1-score, and AUC). To examine feature importance in tree-based models, permutation importance and SHAP values were used. The Random Forest model outperformed the other models on key measures: AUC (0.8915), F1 score (0.919), sensitivity (0.879), and specificity (0.967).
According to the feature importance analysis, "community-level poverty" was the most important predictor, followed by "household wealth index" and "age of household head." Spatial analysis revealed geographic differences in access to improved water sources, with rural areas the most affected. Machine-learning algorithms, particularly Random Forest, yielded significant insights into the factors influencing Ethiopia's unimproved water supply. The results highlight the need for targeted interventions in areas with high rates of poverty and insufficient infrastructure. These data-driven insights can help decision-makers address Ethiopia's water crisis more effectively.
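The classification metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the labels and scores are made-up toy values rather than EDHS-2019 data, treating "unimproved water source" as the positive class (1) is an assumption, and the helper names (`confusion_counts`, `classification_metrics`, `pairwise_auc`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c):
    """Accuracy, precision, recall (sensitivity), specificity and F1."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def pairwise_auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is quadratic in the sample
    size, which is fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data (assumed, not from the study): 1 = unimproved source.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(counts)
auc = pairwise_auc(y_true, scores)
```

Sensitivity is another name for recall, so the sensitivity and specificity figures reported for the Random Forest model correspond to the `recall` and `specificity` entries here. AUC is computed from predicted scores rather than hard labels, which is why it takes a separate input.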
Keywords: EDHS-2019; Feature importance; Machine learning; Random forest; Spatial analysis; Unimproved water supply.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests.