Machine learning to predict penumbra core mismatch in acute ischemic stroke using clinical note data

Shaun Kohli¹, Parul Agarwal²,³, Andy Ho Wing Chan², Asala Erekat²,⁴, Girish Nadkarni⁵,⁶, Benjamin Kummer⁷,⁸,⁹

Affiliations

¹ Icahn School of Medicine at Mount Sinai, New York, NY, USA.
² Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
³ Institute for Health Care Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁴ Clinical Neuro-informatics Center, Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁵ Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁶ Division of Data and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁷ Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA. benjamin.kummer@mountsinai.org.
⁸ Clinical Neuro-informatics Center, Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA. benjamin.kummer@mountsinai.org.
⁹ Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA. benjamin.kummer@mountsinai.org.

PMID: 40481318
PMCID: PMC12144192
DOI: 10.1038/s41746-025-01703-1

Shaun Kohli et al. NPJ Digit Med. 2025 Jun 6;8(1):340. doi: 10.1038/s41746-025-01703-1.

Abstract

In acute ischemic stroke due to large-vessel occlusion (AIS-LVO), late-window endovascular thrombectomy (EVT) decisions depend on penumbra-to-core (P:C) mismatch from computed tomographic perfusion (CTP). We developed multiple machine learning (ML) models to predict P:C ratios from a retrospectively identified cohort of AIS-LVO patients who underwent CTP within 30 min of initial neuroimaging, using non-imaging electronic health record (EHR) data available prior to CTP evaluation. We extracted structured data and free-text clinical notes from the EHR, generating document embeddings as sums of BioWordVec vectors weighted by term-frequency-inverse-document-frequency scores. We identified 120 patients; an extreme-gradient-boosting model classified P:C ratios as ≥1.8 or <1.8, achieving an AUROC of 0.80 (95% CI 0.57-0.92) with optimal performance using text limited to 500 characters. Sensitivity was 0.80, specificity 0.66, and F1 score 0.86. Our findings suggest that ML models leveraging real-world non-imaging data can potentially aid AIS-LVO triage, though further validation is needed.
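As a rough illustration of how an AUROC and an accompanying 95% interval can be estimated for a binary classifier on a small cohort, the sketch below computes the AUROC as the fraction of correctly ranked positive/negative pairs and derives a percentile bootstrap interval. The abstract does not state how its confidence interval was obtained, so the bootstrap here is an assumption, and the labels and scores are made-up toy values rather than study data.

```python
import random

def auroc(y_true, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy labels (1 means P:C >= 1.8) and hypothetical model scores.
labels = [0, 0, 1, 1, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.30, 0.90, 0.55]
auc = auroc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
```

With only a handful of cases the interval is wide, which is consistent with the broad interval reported for the 120-patient cohort.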

Conflict of interest statement

Competing interests: G.N. serves as an Associate Editor of NPJ Digital Medicine. The remaining authors declare no competing interests.

Figures

**Fig. 1. Performance of XGBoost models in predicting penumbra-to-core ratio ≥ 1.8 across different text-cutoff thresholds.**
Receiver-operating characteristic (ROC) curves for models trained using structured features only (red), document embeddings only (green), and both structured features and document embeddings (blue). Panels (a), (b), and (c) correspond to models trained with text data generated with cutoffs of 500, 1000, and 5000 characters, respectively. The dashed line represents the performance of a random classifier (AUROC = 0.5).

**Fig. 2. Average confusion matrices for the full model using decision thresholds that maximize Youden’s index across different text-cutoff thresholds.**
Confusion matrices for the full model are presented for three different text-cutoff thresholds (500, 1000, and 5000 characters) in panels (a), (b), and (c), respectively. For each cutoff, the optimal classification threshold was determined by maximizing Youden’s index (i.e., maximizing the sum of sensitivity and specificity, which corresponds to the sum of the row-normalized diagonal elements). Each cell in the matrices represents the proportion of cases for the true class (expressed as a percentage), with the axes labeled “P:C < 1.8” and “P:C ≥ 1.8” indicating the binary classification outcome for the penumbra-to-core ratio.
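The threshold selection described in this caption can be sketched as follows. This is a minimal illustration that assumes a case is called positive when its score is at or above the threshold; the function name, labels, and scores are hypothetical and are not taken from the study.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate threshold
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    _, t, sens, spec = best
    return t, sens, spec

# Toy labels (1 means P:C >= 1.8) and hypothetical model scores.
labels = [0, 0, 1, 1, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.30, 0.90, 0.55]
threshold, sensitivity, specificity = youden_threshold(labels, scores)
```

Because sensitivity and specificity are the diagonal entries of the row-normalized confusion matrix, maximizing their sum is the same as maximizing J.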

**Fig. 3. Text processing pipeline for generating document embeddings.**
Flowchart illustrating the five-step pipeline for constructing document embeddings. The process begins with selecting clinical notes based on a predefined character cutoff threshold, followed by text preprocessing. Next, term frequency-inverse document frequency (TF-IDF) weighting is applied to each participant’s text corpus (“patient-level corpora”). Preprocessed text is then mapped to word embeddings using BioWordVec, and a final document-level embedding is obtained by matrix-multiplying the TF-IDF matrix with the word embedding matrix.
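The final step of the pipeline, multiplying a documents-by-terms TF-IDF matrix with a terms-by-dimensions word embedding matrix, can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the smoothed IDF formula is one common variant, the tokens are invented, and the two-dimensional vectors are toy stand-ins for real BioWordVec vectors.

```python
import math
from collections import Counter

def tfidf_matrix(docs, vocab):
    """Rows are documents, columns are vocab terms; entries are tf * smoothed idf."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return rows

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x d)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical preprocessed notes, one token list per patient.
docs = [["aphasia", "onset"], ["hemiparesis", "onset", "onset"]]
vocab = ["aphasia", "hemiparesis", "onset"]

# Toy 2-dimensional vectors in vocab order (stand-ins for pretrained embeddings).
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each row is one patient's document embedding: a TF-IDF-weighted sum of word vectors.
embeddings = matmul(tfidf_matrix(docs, vocab), word_vectors)
```

Each resulting row has the same dimensionality as the word vectors, so notes of any length map to a fixed-size feature vector that can be concatenated with structured features.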
