Sci Rep. 2025 Apr 4;15(1):11561.
doi: 10.1038/s41598-025-96412-w.

Predicting determinants of unimproved water supply in Ethiopia using machine learning analysis of EDHS-2019 data

Jember Azanaw et al. Sci Rep.

Abstract

Over 2 billion people worldwide are affected by inadequate access to clean and safe drinking water. The problem is most acute in low-income nations, where many people still rely on unimproved water sources such as exposed wells and surface water. These sources are closely associated with the spread of waterborne illnesses and place a heavy burden on public health systems. As a result, many people still suffer from water-related health problems, especially in underdeveloped nations where access to healthcare is limited and sanitation is often inadequate. However, the conventional analytical techniques used to study this problem frequently fail to capture the intricate relationships among many variables, which can limit their capacity to forecast future patterns. This study aimed to provide more accurate predictions and data-driven insights that can inform policy-making, resource allocation, and interventions to address Ethiopia's water crisis. The data source was the Ethiopia Demographic and Health Survey (EDHS-2019), which offers thorough data on socioeconomic, demographic, and water access determinants. Six machine-learning models were used, including k-Nearest Neighbors, Random Forest, Support Vector Machines, Gradient Boosting Machines, and Artificial Neural Networks. To enhance model performance and prevent overfitting, hyperparameters were tuned via random search with 7-fold cross-validation. Model performance was evaluated using standard classification metrics (accuracy, precision, recall, F1-score, and AUC). To examine feature importance in the tree-based models, permutation importance and SHAP values were used. The Random Forest model outperformed the other models on key measures, including AUC (0.8915), F1 score (0.919), sensitivity (0.879), and specificity (0.967).
According to the feature importance analysis, "community-level poverty" was the most important predictor, followed by "household wealth index" and "age of household head." Spatial analysis revealed geographic differences in access to improved water sources, with rural areas the most affected. Machine-learning algorithms, particularly Random Forest, yielded significant insights into the factors influencing unimproved water supply in Ethiopia. The results highlight the need for targeted interventions in areas with high poverty rates and insufficient infrastructure. These data-driven insights can help decision-makers address Ethiopia's water crisis more effectively.
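
The classification metrics named in the abstract (accuracy, precision, recall or sensitivity, specificity, and F1-score) can all be derived from the four cells of a binary confusion matrix. The sketch below shows those standard definitions in Python. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the names `ConfusionCounts` and `classification_metrics` are assumptions of this example rather than anything used by the authors.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive class = unimproved source)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = _safe_div(c.tp, c.tp + c.fn)   # also called recall
    specificity = _safe_div(c.tn, c.tn + c.fp)
    precision = _safe_div(c.tp, c.tp + c.fp)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = classification_metrics(ConfusionCounts(tp=80, fp=10, tn=90, fn=20))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it depends on the ranking of predicted scores across all thresholds and cannot be recovered from a single confusion matrix.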

Keywords: EDHS-2019; Feature importance; Machine learning; Random forest; Spatial analysis; Unimproved water supply.
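
Permutation importance, one of the two feature-importance techniques mentioned in the abstract, measures how much a model's score drops when the values of a single feature are shuffled across rows. The following is a minimal pure-Python sketch of that idea, assuming accuracy as the score. The toy predictor, the toy data, and the function names are hypothetical and serve only to illustrate the technique; this is not the authors' pipeline, which used Random Forest and other models on EDHS-2019 data.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows for which the prediction matches the label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row: Row) -> int:
    """Hypothetical stand-in for a trained model: uses feature 0, ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Toy data: feature 0 determines the label, feature 1 is irrelevant.
X = [[float(i % 2), float(i % 3)] for i in range(20)]
y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

Because the toy predictor never reads feature 1, shuffling that column leaves accuracy unchanged and its importance is exactly zero, while shuffling feature 0 breaks the link between inputs and labels.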

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. AUC scores for the six machine learning algorithms.
Fig. 2. Feature importance derived from the Random Forest algorithm. Where: CLP = Community-Level Poverty, CLE = Community-Level Education, HLE = Highest Education Level, CME = Community Media Exposure, AHHH = Age of Household Head, and HHWI = Household Wealth Index.
Fig. 3. Geographical differences in unimproved water sources in Ethiopia based on EDHS-2019.
