Sci Rep. 2025 Jul 28;15(1):27468.
doi: 10.1038/s41598-025-10970-7.

AI-based prediction of traffic crash severity for improving road safety and transportation efficiency

Ayman Mohamed Mostafa et al. Sci Rep. 2025.

Abstract

Ensuring safe transportation requires a comprehensive understanding of driving behaviors and road safety to mitigate traffic crashes, reduce risks, and enhance mobility. This study introduces a machine learning (ML) framework for predicting traffic crash severity, using a large-scale dataset of over 2.26 million records. The model integrates human, crash-specific, and vehicle-related factors to improve predictive accuracy and reliability. The methodology combines feature engineering; clustering with K-Means and HDBSCAN; oversampling with RandomOverSampler, SMOTE, Borderline-SMOTE, and ADASYN to address class imbalance; and feature selection with Correlation-Based Feature Selection (CFS) and Recursive Feature Elimination (RFE). Among the evaluated classifiers, the Extra Trees (ET) ensemble classifier performed best, achieving 96.19% accuracy and a macro F1-score of 95.28%, suggesting balanced performance across classes. The proposed framework offers a scalable approach to traffic safety, with actionable insights for intelligent transportation systems (ITS) and accident prevention strategies, and supports traffic risk assessment and data-driven decision-making.

Keywords: Class imbalance; Feature engineering; Feature selection; ML models; Oversampling; Traffic crash prediction.
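
To illustrate why the abstract reports a macro F1-score alongside accuracy, and what random oversampling does to class counts, here is a minimal sketch in plain Python. It is not the authors' pipeline: the class labels "minor" and "severe", the toy data, and the helper names random_oversample and macro_f1 are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced data: 8 "minor" crashes and 2 "severe" crashes (made up).
y_true = ["minor"] * 8 + ["severe"] * 2
rows = list(range(len(y_true)))

# A degenerate model that always predicts the majority class.
y_pred = ["minor"] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                      # 0.8
print(round(macro_f1(y_true, y_pred), 4))  # 0.4444

# After oversampling, both classes have 8 rows.
_, balanced_labels = random_oversample(rows, y_true)
print(Counter(balanced_labels))
```

On this toy data the majority-class model scores 80% accuracy but a macro F1 of about 0.44, because the minority class contributes an F1 of zero. In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data; whether the study did so is not stated in the abstract.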

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Proposed traffic data analysis framework.
Fig. 2. Traffic dataset description.
Fig. 3. Confusion matrix and ROC curve for ET classifier in experiment 1.
Fig. 4. Confusion matrix and ROC curve for ET classifier in experiment 2.
Fig. 5. Pipeline for the triple merge dataset.
Fig. 6. Confusion matrix and ROC curve for ET classifier in experiment 3.
Fig. 7. Illinois map with crash locations with weather forecasting.
Fig. 8. Confusion matrix and ROC curve for ET classifier after feature generation.
Fig. 9. Confusion matrices for (a) K-Means clustering with ET and (b) HDBSCAN clustering with ET.
Fig. 10. Confusion matrices for oversampling techniques on triple merge dataset.
Fig. 11. Performance curve of feature selection sets with accuracy.
Fig. 12. Confusion matrix for the best performing feature set.
