J Clin Med. 2021 Mar 20;10(6):1286.

doi: 10.3390/jcm10061286.

Prediction of Long-Term Stroke Recurrence Using Machine Learning Models

Vida Abedi¹,², Venkatesh Avula¹, Durgesh Chaudhary³, Shima Shahjouei³, Ayesha Khan³, Christoph J Griessenauer³,⁴, Jiang Li¹, Ramin Zand³

Affiliations

¹ Department of Molecular and Functional Genomics, Geisinger Health System, Danville, PA 17822, USA.
² Biocomplexity Institute, Virginia Tech, Blacksburg, VA 24061, USA.
³ Geisinger Neuroscience Institute, Geisinger Health System, Danville, PA 17822, USA.
⁴ Research Institute of Neurointervention, Paracelsus Medical University, 5020 Salzburg, Austria.

PMID: 33804724
PMCID: PMC8003970
DOI: 10.3390/jcm10061286

Vida Abedi et al. J Clin Med. 2021.


Abstract

Background: The long-term risk of recurrent ischemic stroke, estimated at 17% to 30%, cannot be reliably assessed at the individual level. Our goal was to study whether machine learning models can be trained to predict stroke recurrence and to identify key clinical variables, and to assess whether performance metrics can be optimized.

Methods: We used patient-level data from electronic health records, six interpretable algorithms (Logistic Regression, Extreme Gradient Boosting, Gradient Boosting Machine, Random Forest, Support Vector Machine, and Decision Tree), four feature selection strategies, five prediction windows, and two sampling strategies to develop 288 models for predicting stroke recurrence up to 5 years. We further identified important clinical features and evaluated different optimization strategies.

Results: We included 2091 ischemic stroke patients. The area under the receiver operating characteristic curve (AUROC) was stable across the 1-, 2-, 3-, 4-, and 5-year prediction windows, with the highest score for the 1-year window (0.79) and the lowest for the 5-year window (0.69). A total of 21 (7%) models reached an AUROC above 0.73, and 110 (38%) models reached an AUROC greater than 0.7. Among the 53 features analyzed, age, body mass index, and laboratory-based features (such as high-density lipoprotein, hemoglobin A1c, and creatinine) had the highest overall importance scores. Sampling strategies improved the balance between specificity and sensitivity.
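An AUROC value can be read as the probability that a model ranks a randomly chosen recurrence case above a randomly chosen non-recurrence patient. The minimal pure-Python sketch below computes AUROC directly from that definition (the Mann-Whitney formulation); the function name `auroc` and the toy labels and scores are illustrative assumptions, not study data, and in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative),
    with ties counted as one half.

    Brute-force pairwise comparison, O(P * N); for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy example: made-up labels (1 = recurrence) and risk scores.
y = [1, 0, 1, 0, 0, 1]
s = [0.9, 0.3, 0.4, 0.5, 0.1, 0.8]
print(round(auroc(y, s), 3))  # 0.889
```

Here 8 of the 9 positive/negative pairs are ranked correctly, giving 8/9 ≈ 0.889; an AUROC of 0.5 corresponds to random ranking.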

Conclusion: All six selected algorithms could be trained to predict long-term stroke recurrence, and laboratory-based variables were highly associated with stroke recurrence. The latter could be targeted for personalized interventions. Model performance metrics could be optimized, and the models could be implemented within the same healthcare system as intelligent decision support for targeted intervention.

Keywords: artificial intelligence; clinical decision support system; electronic health record; explainable machine learning; healthcare; interpretable machine learning; ischemic stroke; machine learning; outcome prediction; recurrent stroke.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
(A) Flowchart of the inclusion and exclusion of subjects in the case and control groups. Patients in the control group had records available in the electronic health record for at least 5 years and no documented stroke recurrence within 5 years. The distribution panel shows the number of recurrences over time; at 24 days, the number of recurrent cases approaches a plateau. (B) The design strategy for predicting stroke recurrence using electronic health records (EHR), the Geisinger Quality database, and the Social Security Death database.
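Figure 1 defines controls as patients with at least 5 years of records and no documented recurrence within 5 years. A minimal sketch of how a single patient might be labeled for one prediction window follows. The function `label_patient`, the day-based inputs, the 365.25-day year, and the choice to exclude patients whose recurrence falls after the window but within 5 years are all assumptions for illustration, not the study's documented procedure.

```python
from typing import Optional

DAYS_PER_YEAR = 365.25     # assumption: windows measured in calendar years
CONTROL_HORIZON_YEARS = 5  # controls need >= 5 years of records (Figure 1)


def label_patient(
    days_to_recurrence: Optional[float],
    follow_up_days: float,
    window_years: int,
) -> Optional[int]:
    """Hypothetical case/control labeling for one prediction window.

    Returns 1 (case) if a recurrence occurs within the window, 0 (control)
    if the patient has >= 5 years of records and no recurrence within
    5 years, and None (excluded) otherwise.
    """
    window_days = window_years * DAYS_PER_YEAR
    horizon_days = CONTROL_HORIZON_YEARS * DAYS_PER_YEAR
    if days_to_recurrence is not None and days_to_recurrence <= window_days:
        return 1
    no_recurrence_in_horizon = (
        days_to_recurrence is None or days_to_recurrence > horizon_days
    )
    if no_recurrence_in_horizon and follow_up_days >= horizon_days:
        return 0
    # Assumption: recurrences after the window but within 5 years, and
    # patients with too little follow-up, are left out of this window.
    return None
```

Under this sketch, `label_patient(800, 3000, 1)` returns `None` (recurrence in year 3 is outside a 1-year window), while `label_patient(800, 3000, 3)` returns `1`.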

**Figure 2**
Model performance summaries for the five different prediction windows, six different classifiers, and four feature selection approaches. Performance metrics for (A–F) Decision tree, (G–L) Gradient Boost, (M–R) Logistic Regression, (S–X) Random Forest, (Y–AD) SVM, and (AE–AJ) XGBoost.

**Figure 3**
Area under the receiver operating characteristic curve (AUROC) for six classifiers in the 1-year prediction window, using feature Set 3. (A) Model without sampling; (B) model with up-sampling at a 1:2 ratio; (C) model with up-sampling at a 1:1 ratio. The best-performing model (AUROC of 0.79) is Random Forest with up-sampling (panel B).
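The caption does not state how up-sampling was done for this figure, so the sketch below uses plain random duplication of minority rows (a swapped-in technique, not necessarily the authors') purely to show what a 1:2 versus a 1:1 target ratio means. It assumes label 1 (recurrence) is the minority class and that "1:2" means one recurrence row per two non-recurrence rows; the function `random_upsample` is hypothetical. Resampling is normally applied only to the training split so that test metrics reflect the original class balance.

```python
import random
from typing import List, Sequence, Tuple


def random_upsample(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    ratio: float,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Randomly duplicate minority rows until n_minority / n_majority ~= ratio.

    Assumptions: label 1 is the minority class, label 0 the majority, and
    "1:2" corresponds to ratio=0.5 while "1:1" corresponds to ratio=1.0.
    """
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == 1]
    majority = [list(x) for x, label in zip(X, y) if label == 0]
    target = round(ratio * len(majority))
    extra = max(0, target - len(minority))
    if extra and not minority:
        raise ValueError("cannot up-sample an empty minority class")
    # Sample existing minority rows with replacement and copy them.
    duplicates = [list(row) for row in rng.choices(minority, k=extra)]
    X_out = majority + minority + duplicates
    y_out = [0] * len(majority) + [1] * (len(minority) + len(duplicates))
    return X_out, y_out
```

For instance, with 10 non-recurrence rows and 2 recurrence rows, `ratio=0.5` yields 5 recurrence rows and `ratio=1.0` yields 10.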

**Figure 4**
Feature importance based on the different trained models. (A–E) Six different classifiers (Gradient Boost, Random Forest, Extreme Gradient Boosting (XGBoost), Decision Trees, Support Vector Machine (SVM), and Logistic Regression) and five different prediction windows were used. (F) Average feature importance score across the different models and prediction windows.
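Panel F reports an average importance score across models and prediction windows. The caption does not say how scores from different algorithms were put on a common scale, so the sketch below assumes each model's absolute scores are normalized to sum to 1 before averaging, and that a feature absent from a model counts as 0. The function `average_importance` and the toy scores in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def average_importance(per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Average feature importance across models, sorted high to low.

    Assumption: each model's absolute scores are normalized to sum to 1 so
    that, e.g., tree-based gains and regression coefficients are comparable.
    Features missing from a model contribute 0 for that model.
    """
    totals: Dict[str, float] = defaultdict(float)
    for scores in per_model:
        norm = sum(abs(v) for v in scores.values())
        if norm == 0:
            continue  # an all-zero model contributes nothing
        for feature, value in scores.items():
            totals[feature] += abs(value) / norm
    n_models = len(per_model)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {feature: total / n_models for feature, total in ranked}
```

With toy inputs `[{"age": 3.0, "bmi": 1.0}, {"age": 1.0, "hdl": 1.0}]`, the sketch returns `age` 0.625, `hdl` 0.25, and `bmi` 0.125, in that order.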

**Figure 5**
Model performance summaries with sampling-based optimization for the 1- and 3-year prediction windows. Up-sampling was performed using the Synthetic Minority Over-sampling Technique (SMOTE). Feature Set 3 is used for this figure. (A–F) Model without sampling; (G–L) model with down-sampling; (M–R) model with up-sampling.
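SMOTE creates new minority-class (recurrence) examples by interpolating between a minority sample and one of its nearest minority neighbors, rather than duplicating rows. The pure-Python sketch below illustrates that idea together with random down-sampling of the majority class. The caption does not describe the down-sampling method, so random sampling without replacement is an assumption, as are the function names and the default `k=5`; a production pipeline would more likely use an established implementation such as `SMOTE` from the imbalanced-learn package.

```python
import math
import random
from typing import List, Sequence


def smote(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """SMOTE-style oversampling: pick a minority point, pick one of its k
    nearest minority neighbors, and interpolate at a random position on
    the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x among the other minority samples (Euclidean).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform position in [0, 1) along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def random_downsample(
    majority: Sequence[Sequence[float]], n_keep: int, seed: int = 0
) -> List[List[float]]:
    """Keep a random subset of n_keep majority-class rows (no replacement)."""
    rng = random.Random(seed)
    return [list(row) for row in rng.sample(list(majority), n_keep)]
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE fills in the minority region of feature space instead of repeating identical rows.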


Cited by

OptiSelect and EnShap: Integrating machine learning and game theory for ischemic stroke prediction.
Chakraborty P, Bandyopadhyay A, Parui S, Swain S, Banerjee PS, Si T, Qin H, Mallik S. PLoS One. 2025 Aug 13;20(8):e0328967. doi: 10.1371/journal.pone.0328967. PMID: 40802707.
Longitudinal Data to Enhance Dynamic Stroke Risk Prediction.
Zheng W, Chen YH, Sawan M. Healthcare (Basel). 2022 Oct 27;10(11):2134. doi: 10.3390/healthcare10112134. PMID: 36360476.
Imputation of missing values for electronic health record laboratory data.
Li J, Yan XS, Chaudhary D, Avula V, Mudiganti S, Husby H, Shahjouei S, Afshar A, Stewart WF, Yeasin M, Zand R, Abedi V. NPJ Digit Med. 2021 Oct 11;4(1):147. doi: 10.1038/s41746-021-00518-0. PMID: 34635760.
A machine learning model for visualization and dynamic clinical prediction of stroke recurrence in acute ischemic stroke patients: A real-world retrospective study.
Wang K, Shi Q, Sun C, Liu W, Yau V, Xu C, Liu H, Sun C, Yin C, Wei X, Li W, Rong L. Front Neurosci. 2023 Mar 27;17:1130831. doi: 10.3389/fnins.2023.1130831. PMID: 37051146.
An integrated pipeline for prediction of Clostridioides difficile infection.
Li J, Chaudhary D, Sharma V, Sharma V, Avula V, Ssentongo P, Wolk DM, Zand R, Abedi V. Sci Rep. 2023 Oct 2;13(1):16532. doi: 10.1038/s41598-023-41753-7. PMID: 37783691.


