J Safety Res. 2023 Feb:84:418-434.

doi: 10.1016/j.jsr.2022.12.005. Epub 2022 Dec 14.

Heterogeneous ensemble learning for enhanced crash forecasts - A frequentist and machine learning based stacking framework

Numan Ahmad¹, Behram Wali², Asad J Khattak³

Affiliations

¹ Department of Civil & Environmental Engineering, The University of Tennessee, Knoxville, TN 37996, USA. Electronic address: nahmad1@vols.utk.edu.
² Urban Design 4 Health, Inc., 24 Jackie Circle, East Rochester, NY 14612, USA. Electronic address: bwali@ud4h.com.
³ University of Tennessee, Knoxville, TN 37996, USA. Electronic address: akhattak@utk.edu.

PMID: 36868672
DOI: 10.1016/j.jsr.2022.12.005

Numan Ahmad et al. J Safety Res. 2023 Feb.

Abstract

Introduction: This study aims to improve the accuracy of crash frequency predictions on roadway segments, which are used to forecast future safety on roadway facilities. A variety of statistical and machine learning (ML) methods are used to model crash frequency, with ML methods generally achieving higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM), including "stacking," have emerged as more robust techniques that provide more reliable and accurate predictions.

Methods: This study applies "stacking" to model crash frequency on five-lane undivided (5T) segments of urban and suburban arterials. The prediction performance of stacking is compared with that of parametric statistical models (Poisson and negative binomial) and three state-of-the-art ML techniques (decision tree, random forest, and gradient boosting), each of which is termed a base-learner. Combining the individual base-learners through stacking with an optimal weight scheme avoids the biased predictions that arise in individual base-learners from differences in their specifications and prediction accuracies. Crash, traffic, and roadway inventory data from 2013 to 2017 were collected and integrated. The data are split into training (2013-2015), validation (2016), and testing (2017) datasets. The five base-learners are first trained on the training data; their predictions on the validation data are then used to train a meta-learner.
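The training, validation, and testing workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration in Python with scikit-learn, not the authors' implementation: the data are synthetic, the variable names and coefficients are hypothetical, the negative binomial base-learner is omitted because scikit-learn does not provide one, and non-negative least squares is assumed as one possible weight scheme for the meta-learner (the abstract does not specify it).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic segment-year records (hypothetical): 200 segments x 5 years.
n_segments = 200
years = np.repeat(np.arange(2013, 2018), n_segments)
driveway_density = rng.uniform(0, 40, years.size)  # commercial driveways per mile
offset_distance = rng.uniform(2, 30, years.size)   # offset to fixed objects
log_aadt = rng.normal(10, 0.3, years.size)         # log traffic volume
X = np.column_stack([driveway_density, offset_distance, log_aadt])
mu = np.exp(-9.0 + 0.02 * driveway_density - 0.02 * offset_distance + log_aadt)
y = rng.poisson(mu)  # simulated crash counts

# Temporal split mirroring the abstract: 2013-2015 / 2016 / 2017.
train, valid, test = years <= 2015, years == 2016, years == 2017

# Base-learners (negative binomial omitted; see the lead-in).
base_learners = {
    "poisson": make_pipeline(StandardScaler(), PoissonRegressor(alpha=1e-4, max_iter=1000)),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
}
for model in base_learners.values():
    model.fit(X[train], y[train])


def base_predictions(features):
    """Stack each base-learner's predictions as one column per learner."""
    return np.column_stack([m.predict(features) for m in base_learners.values()])


# Meta-learner: non-negative weights fitted on validation-year predictions
# (an assumed weight scheme, chosen here for simplicity).
meta = LinearRegression(positive=True, fit_intercept=False)
meta.fit(base_predictions(X[valid]), y[valid])

# Out-of-sample comparison on the held-out test year.
stacked_test = meta.predict(base_predictions(X[test]))


def mae(pred, obs):
    return float(np.mean(np.abs(pred - obs)))


scores = {name: mae(m.predict(X[test]), y[test]) for name, m in base_learners.items()}
scores["stacking"] = mae(stacked_test, y[test])

print("meta-learner weights:", dict(zip(base_learners, meta.coef_.round(3))))
print("test MAE:", {name: round(value, 3) for name, value in scores.items()})
```

Fitting the meta-learner on validation-year predictions rather than training-year predictions keeps the weights from rewarding base-learners that merely overfit the training data. On synthetic data such as this, stacking is not guaranteed to outperform every base-learner.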

Results: The statistical models reveal that crashes increase with the density (number per mile) of commercial driveways but decrease with the average offset distance to fixed objects. Individual ML methods show similar results in terms of variable importance. A comparison of out-of-sample predictions across the models confirms the superiority of stacking over the alternative methods considered.

Conclusions and practical applications: From a practical standpoint, "stacking" can enhance prediction accuracy (compared to only one base-learner with a particular specification). When applied systemically, stacking can help identify more appropriate countermeasures.

Keywords: Base-learners; Count data models; Crash frequency; Crash prediction; Machine learning; Meta-learner; Stacking.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Cited by

Predicting the incidence of infectious diarrhea with symptom surveillance data using a stacking-based ensembled model.
Wang P, Zhang W, Wang H, Shi C, Li Z, Wang D, Luo L, Du Z, Hao Y. BMC Infect Dis. 2024 Feb 26;24(1):265. doi: 10.1186/s12879-024-09138-x. PMID: 38408967. Free PMC article.
Crash severity analysis: A data-enhanced double layer stacking model using semantic understanding.
Yang D, Dong T, Wang P. Heliyon. 2024 Apr 29;10(9):e30117. doi: 10.1016/j.heliyon.2024.e30117. eCollection 2024 May 15. PMID: 38765089. Free PMC article.
