NPJ Digit Med. 2025 Jun 6;8(1):340.
doi: 10.1038/s41746-025-01703-1.

Machine learning to predict penumbra core mismatch in acute ischemic stroke using clinical note data

Shaun Kohli et al. NPJ Digit Med. 2025.

Abstract

In acute ischemic stroke due to large-vessel occlusion (AIS-LVO), late-window endovascular thrombectomy (EVT) decisions depend on penumbra-to-core (P:C) mismatch measured by computed tomographic perfusion (CTP). We developed multiple machine learning (ML) models to predict P:C ratios in a retrospectively identified cohort of AIS-LVO patients who underwent CTP within 30 min of initial neuroimaging, using only non-imaging electronic health record (EHR) data available before CTP evaluation. We extracted structured data and free-text clinical notes from the EHR and generated document embeddings as sums of BioWordVec vectors weighted by term frequency-inverse document frequency (TF-IDF) scores. We identified 120 patients. An extreme-gradient-boosting (XGBoost) model classified P:C ratios as ≥1.8 or <1.8 with an AUROC of 0.80 (95% CI 0.57-0.92); performance was best when text was limited to 500 characters. Sensitivity was 0.80, specificity 0.66, and F1 score 0.86. Our findings suggest that ML models leveraging real-world non-imaging data can potentially aid AIS-LVO triage, though further validation is needed.

Conflict of interest statement

Competing interests: G.N. serves as an Associate Editor of NPJ Digital Medicine. The remaining authors declare no competing interests.

Figures

Fig. 1. Performance of XGBoost models in predicting penumbra-to-core ratio ≥ 1.8 across different text-cutoff thresholds.
Receiver-operating characteristic (ROC) curves for models trained using structured features only (red), document embeddings only (green), and both structured features and document embeddings (blue). Panels (a), (b), and (c) correspond to models trained with text data generated with cutoffs of 500, 1000, and 5000 characters, respectively. The dashed line represents the performance of a random classifier (AUROC = 0.5).
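
The AUROC summarized by these curves can be read as the probability that a randomly chosen positive case (P:C ≥ 1.8) receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auroc_pairwise` and the toy labels and scores are hypothetical.

```python
def auroc_pairwise(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are assumed to be 1 (P:C >= 1.8) or 0 (P:C < 1.8).
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one case of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up scores, not study data).
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.70]
print(round(auroc_pairwise(labels, scores), 3))
```

A classifier that scores every case identically wins half of all pairs by the tie rule, which is the 0.5 baseline drawn as the dashed line.
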
Fig. 2. Average confusion matrices for the full model using decision thresholds that maximize Youden’s index across different text-cutoff thresholds.
Confusion matrices for the full model are presented for three different text-cutoff thresholds (500, 1000, and 5000 characters) in panels (a), (b), and (c), respectively. For each cutoff, the optimal classification threshold was determined by maximizing Youden’s index (i.e., maximizing the sum of sensitivity and specificity, which corresponds to the sum of the row-normalized diagonal elements). Each cell in the matrices represents the proportion of cases for the true class (expressed as a percentage), with the axes labeled “P:C < 1.8” and “P:C ≥ 1.8” indicating the binary classification outcome for the penumbra-to-core ratio.
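
As an illustration of the threshold selection described in this caption, the sketch below scans candidate thresholds, keeps the one that maximizes Youden's index (sensitivity + specificity - 1), and builds a row-normalized confusion matrix in percent. It is a sketch under stated assumptions: the function names and toy data are hypothetical, it uses a single set of predictions rather than averaging over repeated evaluations as the figure does, and it assumes both classes are present.

```python
def confusion_counts(y_true, y_score, threshold):
    """Return (tp, fp, tn, fn), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if truth == 1 and predicted_positive:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(y_score)):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # on ties, keep the lowest threshold seen first
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def row_normalized_matrix(y_true, y_score, threshold):
    """Rows are true classes (P:C < 1.8, then P:C >= 1.8); cells are row percentages."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return [
        [100.0 * tn / (tn + fp), 100.0 * fp / (tn + fp)],
        [100.0 * fn / (tp + fn), 100.0 * tp / (tp + fn)],
    ]


# Toy example (made-up scores, not study data).
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.70]
threshold, j = youden_threshold(labels, scores)
print(threshold, round(j, 3))
print(row_normalized_matrix(labels, scores, threshold))
```

The diagonal of the row-normalized matrix holds specificity and sensitivity as percentages, which is why maximizing their sum is equivalent to maximizing Youden's index.
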
Fig. 3. Text processing pipeline for generating document embeddings.
Flowchart illustrating the five-step pipeline for constructing document embeddings. The process begins with selecting clinical notes based on a predefined character cutoff threshold, followed by text preprocessing. Next, term frequency-inverse document frequency (TF-IDF) weighting is applied to each participant’s text corpus (“patient-level corpora”). Preprocessed text is then mapped to word embeddings using BioWordVec, and a final document-level embedding is obtained by matrix-multiplying the TF-IDF matrix with the word embedding matrix.
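
The final step of this pipeline, multiplying a TF-IDF matrix by a word embedding matrix, can be sketched as follows. This is an illustration only, not the authors' implementation: the three-dimensional `TOY_WORD_VECTORS` stand in for pretrained BioWordVec vectors, the smoothed IDF formula is an assumed choice, and the sketch treats each patient's preprocessed notes as one tokenized document with IDF computed across patients, which may differ from how the study built its patient-level corpora.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for pretrained BioWordVec embeddings.
TOY_WORD_VECTORS = {
    "aphasia": [0.0, 1.0, 0.0],
    "stroke": [1.0, 0.0, 0.0],
    "weakness": [0.0, 0.0, 1.0],
}


def tfidf_matrix(docs, vocab):
    """Return a len(docs) x len(vocab) matrix of raw term counts times smoothed IDF."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: idf = ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] * idf[t] for t in vocab])
    return rows


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x d)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def document_embeddings(docs):
    """One embedding per document: the TF-IDF-weighted sum of its word vectors.

    Tokens without a word vector are ignored.
    """
    vocab = sorted(TOY_WORD_VECTORS)
    weights = tfidf_matrix(docs, vocab)                  # documents x vocabulary
    embedding_matrix = [TOY_WORD_VECTORS[t] for t in vocab]  # vocabulary x dimensions
    return matmul(weights, embedding_matrix)             # documents x dimensions


# Toy example (made-up tokens, not study data).
patient_docs = [
    ["stroke", "aphasia", "stroke"],
    ["weakness", "stroke", "unknown"],
]
for embedding in document_embeddings(patient_docs):
    print([round(x, 3) for x in embedding])
```

Because each row of the TF-IDF matrix holds one weight per vocabulary term, the matrix product is the same weighted sum of word vectors described in the abstract, computed for all documents at once.
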
