Rapid triage for ischemic stroke: a machine learning-driven approach in the context of predictive, preventive and personalised medicine

Yulu Zheng^#¹, Zheng Guo^#¹, Yanbo Zhang^#², Jianjing Shang³, Leilei Yu⁴, Ping Fu⁵, Yizhi Liu⁶, Xingang Li¹, Hao Wang^{7,8}, Ling Ren⁹, Wei Zhang¹⁰, Haifeng Hou^{1,2,6}, Xuerui Tan¹¹, Wei Wang^{1,6,8,11,12}; Global Health Epidemiology Reference Group (GHERG)

Affiliations

¹ Centre for Precision Health, Edith Cowan University, 270 Joondalup Drive, Joondalup, 6027 Western Australia Australia.
² The Second Affiliated Hospital of Shandong First Medical University, Tai'an, Shandong China.
³ Dongping People's Hospital, Tai'an, Shandong China.
⁴ Tai'an City Central Hospital, Tai'an, Shandong China.
⁵ Ti'men Township Central Hospital, Tai'an, Shandong China.
⁶ School of Public Health, Shandong First Medical University & Shandong Academy of Medical Sciences, 619 Changcheng Road, Tai'an, 271016 Shandong China.
⁷ Department of Clinical Epidemiology and Evidence-Based Medicine, National Clinical Research Centre for Digestive Disease, Beijing Friendship Hospital, Capital Medical University, Beijing, China.
⁸ Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, Beijing, China.
⁹ Beijing United Family Hospital, No.2 Jiangtai Road, Chaoyang District, Beijing, China.
¹⁰ Centre for Cognitive Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
¹¹ The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong China.
¹² Institute for Nutrition Research, Edith Cowan University, Joondalup, WA Australia.

^# Contributed equally.

PMID: 35719136
PMCID: PMC9203613
DOI: 10.1007/s13167-022-00283-4

Yulu Zheng et al. EPMA J. 2022 May 27;13(2):285-298. doi: 10.1007/s13167-022-00283-4. eCollection 2022 Jun.

Abstract

Background: Recognising the early signs of ischemic stroke (IS) in emergency settings has been challenging. Machine learning (ML), a robust tool for predictive, preventive and personalised medicine (PPPM/3PM), offers a possible solution to this problem and can produce accurate predictions from data processed in real time.

Methods: This investigation evaluated 4999 IS patients among a total of 10,476 adults included in the initial dataset, and 1076 IS subjects among 3935 participants in the external validation dataset. Six ML-based models for the prediction of IS were trained on the initial dataset of 10,476 participants, which was split into a training set (80%) and an internal validation set (20%). Selected clinical laboratory features routinely assessed at admission were used to inform the models. Model performance was mainly evaluated by the area under the receiver operating characteristic curve (AUC). Three additional techniques, permutation feature importance (PFI), local interpretable model-agnostic explanations (LIME), and SHapley Additive exPlanations (SHAP), were applied to explain the black-box ML models.
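The evaluation steps named in the Methods (an 80/20 split, AUC, and permutation feature importance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the two features (`age`, `noise`) and the fixed scoring rule `stand_in_score` are hypothetical and are not the study's trained models, and importance is measured here as the drop in AUC, which may differ from the metric the authors used.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def split_80_20(records, seed=0):
    """Shuffle a copy of the records and cut it into 80% / 20% parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def permutation_importance(score_fn, features, labels, seed=0):
    """Return the baseline AUC and the AUC drop after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = auc(labels, [score_fn(x) for x in features])
    drops = []
    for j in range(len(features[0])):
        column = [x[j] for x in features]
        rng.shuffle(column)
        permuted = [x[:j] + (v,) + x[j + 1:] for x, v in zip(features, column)]
        drops.append(baseline - auc(labels, [score_fn(x) for x in permuted]))
    return baseline, drops

# Synthetic records (hypothetical): the label depends on age, not on the noise feature.
rng = random.Random(42)
records = []
for _ in range(500):
    age = rng.uniform(30, 90)
    noise = rng.gauss(0, 1)
    label = 1 if age + rng.gauss(0, 10) > 60 else 0
    records.append(((age, noise), label))

train, valid = split_80_20(records)

def stand_in_score(x):
    """Hypothetical fixed rule standing in for a trained model: score by age only."""
    return x[0]

valid_x = [x for x, _ in valid]
valid_y = [y for _, y in valid]
baseline_auc, drops = permutation_importance(stand_in_score, valid_x, valid_y)
print(f"validation AUC = {baseline_auc:.3f}")
print("AUC drop per permuted feature (age, noise):", [round(d, 3) for d in drops])
```

Because the stand-in rule ignores the noise feature, shuffling that column leaves the AUC unchanged, whereas shuffling age degrades it. A real pipeline would fit a model on `train` before scoring `valid`.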

Results: Fifteen routine haematological and biochemical features were selected to establish ML-based models for the prediction of IS. The XGBoost-based model achieved the highest predictive performance, reaching AUCs of 0.91 (0.90-0.92) and 0.92 (0.91-0.93) in the internal and external validation datasets, respectively. At the global level, PFI revealed that the demographic feature age, the routine haematological parameters haemoglobin and neutrophil count, and the biochemical analytes total protein and high-density lipoprotein cholesterol were more influential on the model's prediction. LIME and SHAP gave similar local feature attribution explanations.
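The AUCs above are reported with intervals in parentheses. The abstract does not state how those intervals were computed; a percentile bootstrap is one common choice and is assumed in the sketch below. The labels and scores are synthetic, and the function name `bootstrap_auc_ci` is hypothetical.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # AUC needs both classes present
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(labels, scores), lo, hi

# Synthetic labels and scores (hypothetical): positives score higher on average.
rng = random.Random(7)
labels = [rng.randint(0, 1) for _ in range(150)]
scores = [y + rng.gauss(0, 1) for y in labels]

point, lo, hi = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.2f} ({lo:.2f}-{hi:.2f})")
```

The printed line follows the same "estimate (lower-upper)" layout as the Results; the numbers themselves come from the synthetic data and say nothing about the study.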

Conclusion: In the context of PPPM/3PM, we used selected predictors obtained from the results of common blood tests to develop and validate ML-based models for the diagnosis of IS. The XGBoost-based model offers the most accurate prediction. Because it draws on the individualised patient profile, this prediction tool is simple and quick to administer. It shows promise for supporting subjective decision making in resource-limited settings or primary care, thereby shortening the time to treatment and improving outcomes after IS.

Supplementary information: The online version contains supplementary material available at 10.1007/s13167-022-00283-4.

Keywords: Disease prediction; Improved individual outcomes; Ischemic stroke; Machine learning; Objective clinical data; Patients stratification; Predictive preventive and personalised medicine (PPPM/3PM); Targeted prevention.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Schematic diagram overview of the study. The overview illustrates five primary processes: data acquisition, feature selection, model development, model validation, and model explanation. SAH-SFMU, Second Affiliated Hospital of Shandong First Medical University; LASSO, least absolute shrinkage and selection operator; RFECV, recursive feature elimination with fivefold cross-validation; DPH, Dongping People’s Hospital; PFI, permutation feature importance; LIME, local interpretable model-agnostic explanations; SHAP, SHapley Additive exPlanations

**Fig. 2**
Comparison of the ROC curves of six machine learning-based models. **A-C** Performances for the training, internal validation, and external validation sets. Each AUC value is obtained from the corresponding ML-based model; 95% confidence intervals for the AUC are presented in parentheses. Abbreviations: ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve; XGBoost, extreme gradient boosting; RF, random forest; NN, neural network; LR, logistic regression; GaussianNB, Gaussian naive Bayes; k-NN, k-nearest neighbours

**Fig. 3**
PFI of each machine learning-based model in the derivation dataset. Each histogram describes the PFI (also known as mean decrease accuracy) for a given ML-based model. The PFI assigns a relative importance score to every independent input feature, indicating how much each feature matters when making a prediction. Features ranked at the top are the most important, while those ranked at the bottom matter least. Abbreviations: PFI, permutation feature importance; NeuP, neutrophil percentage; NeuC, neutrophil count; MonP, macrophage percentage; MCHC, mean corpuscular haemoglobin concentration; LymP, lymphocyte percentage; RDW-CV, red blood cell distribution width-CV; MCV, mean corpuscular volume; Hgb, haemoglobin; TC, total cholesterol; HDL-C, high-density lipoprotein cholesterol; UA, uric acid; TP, total protein; CG, calculated globulin; AKP, alkaline phosphatase

**Fig. 4**
Interpretation of real-time sample prediction by LIME and SHAP. Explanations are based on the XGBoost model trained on the derivation dataset. (a–d) True positive, true negative, false positive, and false negative observations, respectively. **A** Four individual prediction scenarios through the LIME algorithm; orange features push the IS risk higher whereas blue features push the IS risk lower. **B** The four individual prediction scenarios through the XGBoost Tree SHAP algorithm. “Base value” marks the mean of the model output (log odds ratio) over the IS dataset; f(x) is the output value for a given observation; red arrows push the prediction towards high IS risk whereas blue arrows push towards low IS risk; the size of each arrow marks the magnitude of the corresponding feature’s effect. **C** SHAP summary plot. Each dot represents a person in this study; the position of the dot on the *x-axis* indicates the feature's impact on the model’s prediction for that person. The features listed on the *y-axis* are ordered by importance. Abbreviations: IS, ischemic stroke; LIME, local interpretable model-agnostic explanations; SHAP, SHapley Additive exPlanations; NeuP, neutrophil percentage; NeuC, neutrophil count; MonP, macrophage percentage; MCHC, mean corpuscular haemoglobin concentration; LymP, lymphocyte percentage; RDW-CV, red blood cell distribution width-CV; MCV, mean corpuscular volume; Hgb, haemoglobin; TC, total cholesterol; HDL-C, high-density lipoprotein cholesterol; UA, uric acid; TP, total protein; CG, calculated globulin; AKP, alkaline phosphatase
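The relation between the “base value” and f(x) in panel B follows from the additivity property of SHAP values. As a brief sketch, with notation assumed here rather than taken from the paper (φ₀ for the base value, φⱼ(x) for the attribution of feature j among the 15 inputs):

```latex
f(x) = \phi_0 + \sum_{j=1}^{15} \phi_j(x),
\qquad
\hat{p}(x) = \frac{1}{1 + e^{-f(x)}}
```

The second expression converts the output to a probability and applies only if f(x) is on the log-odds scale, as the caption suggests; that conversion is an assumption about the model's objective, not something the caption states.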

**Fig. 5**
Website-Automatic System for the Triage of Ischemic Stroke. By entering example values for the 15 clinical laboratory features and selecting the intended machine learning-based model, a user can obtain a patient’s risk of ischemic stroke
