Crash injury severity prediction considering data imbalance: A Wasserstein generative adversarial network with gradient penalty approach

Ye Li¹, Zhanhao Yang², Lu Xing³, Chen Yuan⁴, Fei Liu⁵, Dan Wu⁶, Haifei Yang⁷

Affiliations

¹ School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, China; Hunan Key Laboratory of Smart Roadway and Cooperative Vehicle-Infrastructure Systems, Changsha University of Science & Technology, Changsha, 410114 Hunan, China. Electronic address: yelicsu@csu.edu.cn.
² School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, China. Electronic address: yangzhanhao1229@163.com.
³ School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114, China. Electronic address: luxing@csust.edu.cn.
⁴ School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, China; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China. Electronic address: yuanchen@csu.edu.cn.
⁵ School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, China. Electronic address: LF1102@csu.edu.cn.
⁶ School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, China. Electronic address: danwucsu@163.com.
⁷ School of Civil and Transportation Engineering, Hohai University, Nanjing, Jiangsu 210098, China. Electronic address: yanghaifei@hhu.edu.cn.

PMID: 37659275
DOI: 10.1016/j.aap.2023.107271

Accid Anal Prev. 2023 Nov;192:107271. Epub 2023 Aug 31.

Abstract

Predicting the injury severity of each road crash is essential. However, classifiers trained on imbalanced crash data frequently perform poorly. Because severe injuries are rare in road traffic crashes, crash data are extremely imbalanced across injury severity classes, which makes training prediction models challenging. Data augmentation techniques can restore interclass balance by generating additional minority-class samples. To address the imbalance in crash injury severity data, this study applies a novel deep learning method, the Wasserstein generative adversarial network with gradient penalty (WGAN-GP), to a large crash dataset; the model generates synthetic crash records labelled with injury severity to rebalance the dataset. To evaluate the effectiveness of WGAN-GP, we systematically compare it with commonly used sampling techniques (random under-sampling, random over-sampling, the synthetic minority over-sampling technique and adaptive synthetic sampling) in terms of dataset balance and crash injury severity prediction. After rebalancing the dataset, this study classifies crash injury severity using logistic regression, a multilayer perceptron, random forest, AdaBoost and XGBoost. AUC, specificity and sensitivity are used as evaluation metrics to compare prediction performance. Results demonstrate that sampling techniques can considerably improve prediction performance for the minority classes of an imbalanced dataset, and the combination of XGBoost and WGAN-GP performs best, with an AUC of 0.794 and a sensitivity of 0.698. Finally, the explainable machine learning technique SHAP (SHapley Additive exPlanations) is used to improve model interpretability, allowing a deeper understanding of how each variable affects crash injury severity. The findings of this study shed light on data-driven prediction of crash injury severity under data imbalance.
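
As a minimal sketch of what distinguishes WGAN-GP from a plain GAN, the snippet below computes the critic loss in the standard formulation of Gulrajani et al. (2017): L = E[D(x̃)] − E[D(x)] + λ·E[(‖∇D(x̂)‖₂ − 1)²], where x is a real sample, x̃ a generated one, x̂ a random interpolation between them, and λ = 10 the commonly used default. The critic is a hypothetical one-hidden-layer tanh network with a hand-derived input gradient so that the example runs in plain Python; the names `TinyCritic` and `critic_loss_wgan_gp`, the weights and the feature vectors are all illustrative assumptions and do not reflect the authors' architecture, hyperparameters or feature encoding.

```python
import math
import random


class TinyCritic:
    """Hypothetical critic D(x) = v . tanh(W x + b) with one hidden layer.

    Stands in for the study's (unspecified) critic network; weights are random, not trained.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]

    def _hidden(self, x):
        # h_j = tanh(sum_i W_ji * x_i + b_j)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def score(self, x):
        # Unbounded real-valued critic score (no sigmoid output, as in WGAN)
        return sum(vj * hj for vj, hj in zip(self.v, self._hidden(x)))

    def input_grad(self, x):
        # dD/dx_i = sum_j v_j * (1 - h_j^2) * W_ji  (chain rule through tanh)
        coeff = [vj * (1.0 - hj * hj) for vj, hj in zip(self.v, self._hidden(x))]
        return [sum(c * row[i] for c, row in zip(coeff, self.W)) for i in range(len(x))]


def critic_loss_wgan_gp(critic, real_batch, fake_batch, lam=10.0, rng=None):
    """WGAN-GP critic loss: E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]."""
    rng = rng if rng is not None else random.Random(0)
    n = len(real_batch)
    # Mini-batch estimate of the (negated) Wasserstein term
    w_term = (sum(critic.score(f) for f in fake_batch)
              - sum(critic.score(r) for r in real_batch)) / n
    # Gradient penalty evaluated at random interpolates of paired real and fake samples
    penalty = 0.0
    for r, f in zip(real_batch, fake_batch):
        eps = rng.random()  # eps ~ U[0, 1)
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]
        grad_norm = math.sqrt(sum(g * g for g in critic.input_grad(x_hat)))
        penalty += (grad_norm - 1.0) ** 2
    return w_term + lam * penalty / n


# Usage with made-up, already-encoded feature vectors (not data from the study)
real = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.0]]  # stands in for minority-class crash records
fake = [[0.2, 0.5, 0.3], [0.1, 0.4, 0.6]]  # stands in for generator output
critic = TinyCritic(n_in=3, n_hidden=4)
print(critic_loss_wgan_gp(critic, real, fake))
```

In a practical implementation the input gradient would come from automatic differentiation (for example `torch.autograd.grad` with `create_graph=True` in PyTorch) rather than a hand-derived formula, and the generator would be trained to minimise −E[D(x̃)]. The penalty replaces the weight clipping of the original WGAN as the way of keeping the critic approximately 1-Lipschitz.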

Keywords: Crash injury severity; Generative adversarial network; Imbalanced data; Sampling technique.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Cited by

An interpretable dynamic ensemble selection multiclass imbalance approach with ensemble imbalance learning for predicting road crash injury severity.
Aziz K, Chen F, Ahmad M, Khan MS, Sabri Sabri MM, Almujibah H. Sci Rep. 2025 Jul 9;15(1):24666. doi: 10.1038/s41598-025-08935-x. PMID: 40634494.


Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Crash injury severity prediction considering data imbalance: A Wasserstein generative adversarial network with gradient penalty approach

Affiliations

Crash injury severity prediction considering data imbalance: A Wasserstein generative adversarial network with gradient penalty approach

Authors

Affiliations

Abstract

Conflict of interest statement

Similar articles

Cited by

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Abstract

Conflict of interest statement

Similar articles

Cited by

MeSH terms

Substances

Related information

LinkOut - more resources

Full Text Sources