Predictive etiological classification of acute ischemic stroke through interpretable machine learning algorithms: a multicenter, prospective cohort study
- PMID: 39256656
- PMCID: PMC11384709
- DOI: 10.1186/s12874-024-02331-1
Abstract
Background: Prognosis, recurrence rates, and secondary prevention strategies vary significantly among subtypes of acute ischemic stroke (AIS). Machine learning (ML) techniques can uncover intricate, non-linear relationships within medical data, enabling the identification of factors associated with etiological classification. However, few studies have used ML algorithms to predict AIS etiology.
Objective: We aimed to use interpretable ML algorithms to develop AIS etiology prediction models, identify critical factors in etiology classification, and enhance existing clinical categorization.
Methods: This study included patients from the Third China National Stroke Registry (CNSR-III). Nine models were used to predict large artery atherosclerosis (LAA), small vessel occlusion (SVO), and cardioembolism (CE) with a randomly split 80:20 training and test set: Natural Gradient Boosting (NGBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Light Gradient Boosting Machine (LGBM), Gradient Boosting Decision Tree (GBDT), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), and logistic regression (LR). We designed an SFS-XGB method with 10-fold cross-validation for feature selection. The primary evaluation metrics were the area under the receiver operating characteristic curve (AUC) for discrimination and the Brier score (or calibration plots) for calibration.
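The two primary metrics named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names, the toy labels and probabilities, and the treatment of each subtype as a binary one-vs-rest target are assumptions made for illustration only.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of `rows` and split it 80:20 (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(y_true, y_prob):
    """AUC as the probability that a random positive case receives a higher
    predicted probability than a random negative case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome;
    lower values indicate better calibrated, more accurate probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (assumed data): 1 = target subtype (e.g. CE), 0 = any other subtype.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1]

print(round(auc(y_true, y_prob), 3))          # 0.833
print(round(brier_score(y_true, y_prob), 3))  # 0.142
```

AUC measures discrimination only (how well cases are ranked), whereas the Brier score also penalizes probabilities that are far from the observed outcome, which is why the two are reported together.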
Results: A total of 5,213 patients were included, comprising 2,471 (47.4%) with LAA, 2,153 (41.3%) with SVO, and 589 (11.3%) with CE. In both the LAA and SVO models, the AUC values of the ML models were significantly higher than that of the LR model (P < 0.001). The optimal model for predicting SVO (AUC [RF model] = 0.932) outperformed the optimal LAA model (AUC [NGBoost model] = 0.917) and the optimal CE model (AUC [LGBM model] = 0.846). Each model displayed relatively satisfactory calibration. Further analysis showed that the optimal CE model could identify potential CE patients in the stroke of undetermined etiology (SUE) group: 1,900 of 4,156 SUE patients (45.7%).
Conclusions: The ML algorithms effectively classified patients with LAA, SVO, and CE, demonstrating superior classification performance compared with the LR model. The optimal ML model can identify potential CE patients among SUE patients. These newly identified predictive factors may complement the existing etiological classification system, enabling clinicians to promptly categorize the etiology of stroke patients and initiate optimal strategies for secondary prevention.
Keywords: Acute ischemic stroke; Clinical prediction; Etiological classification; Machine learning; Prospective cohort study.
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.