Sci Rep. 2025 Jul 1;15(1):21913.
doi: 10.1038/s41598-025-08005-2.

Predicting car accident severity in Northwest Ethiopia: a machine learning approach leveraging driver, environmental, and road conditions

Abraham Keffale Mengistu et al. Sci Rep.

Abstract

Road traffic accidents (RTAs) in Northwest Ethiopia, a region with a fatality rate of 32.2 per 100,000 residents, pose a critical public health challenge exacerbated by infrastructural deficits and environmental hazards. This study uses machine learning (ML) to predict car accident severity in the region, addressing the lack of localized predictive frameworks for low- and middle-income countries (LMICs). Using a dataset of 2,000 accidents (2018-2023) drawn from police reports, we integrated driver demographics, behavioral factors (e.g., alcohol use, seatbelt compliance), and environmental conditions (e.g., unpaved roads, weather). Ten ML models, including Random Forest, XGBoost, and LightGBM, were evaluated after addressing class imbalance via the Synthetic Minority Oversampling Technique (SMOTE). Hyperparameter tuning was used for model optimization, and SHapley Additive exPlanations (SHAP) for interpretability. Random Forest outperformed the other models, achieving 82% accuracy (AUC-ROC: 0.87) after tuning. Driver age (mean: 44 years) and environmental factors (e.g., nighttime on unlit roads, rainy conditions) were critical predictors, increasing fatal accident likelihood by 62%. SMOTE improved the accuracy of Random Forest, the best-performing model, from 78.6% to 82%. Random Forest also exhibited the highest recall (0.82) after optimization, and ensemble methods dominated the performance metrics overall. The study underscores the efficacy of ML in contextualizing accident severity in LMICs, with Random Forest emerging as a robust tool for policymakers. Prioritizing road paving, sobriety checkpoints, and motorcycle safety could mitigate risks, aligning with Sustainable Development Goal 3.6. Future work should address data limitations (underreporting, geospatial gaps) and expand model interpretability.
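To illustrate the class-imbalance step named in the abstract, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function names (`smote_oversample`, `balance`), the toy features (driver age, speed), and the labels are hypothetical and chosen only for illustration, and the sketch assumes purely continuous features.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors from one class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample minority_label until it matches the largest class count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    n_new = max(max(counts.values()) - len(minority), 0)
    new_points = smote_oversample(minority, n_new, k=k, seed=seed) if n_new else []
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): [driver age in years, speed in km/h].
X = [[20, 50.0], [25, 55.0], [30, 60.0], [35, 45.0], [40, 70.0], [45, 65.0],
     [60, 90.0], [62, 110.0]]
y = ["minor"] * 6 + ["fatal"] * 2

X_bal, y_bal = balance(X, y, minority_label="fatal")
print(y_bal.count("minor"), y_bal.count("fatal"))
```

In this toy run the two "fatal" records are augmented with four synthetic ones, so both classes end up with six samples. In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data contain no synthetic records.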

Keywords: Car accident severity; Ethiopia; Machine learning; Road safety; SHAP; SMOTE.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethical approval: The study received ethical clearance from the Institutional Review Board (IRB) of Debre Markos University, College of Medicine and Health Sciences (CMHS), Institutional Review and Ethics Review Committee (IRERC), in line with the Declaration of Helsinki. The IRB waived the need to obtain informed consent for this study. No personal identifiers were collected, to safeguard privacy, and all data were anonymized before analysis. Data integrity was maintained through secure protection systems compliant with international research standards. Because the data are recorded information, obtaining informed consent from participants was not feasible; consent was obtained from the authorities where the data were collected.

Figures

Fig. 1. Class distribution before and after SMOTE.
Fig. 2. Correlation matrix between variables.
Fig. 3. VIFs for predictor variables.
Fig. 4. Workflow diagram.
Fig. 5. ROC-AUC curve plot for trained models before SMOTE.
Fig. 6. ROC-AUC curve plot for trained models after SMOTE.
Fig. 7. Comparisons of model performance metrics.
Fig. 8. Results of the tuned random forest model.
Fig. 9. Feature importance by the random forest method.
Fig. 10. SHAP interaction value.
