Front Cardiovasc Med. 2022 Jul 29:9:941237.

doi: 10.3389/fcvm.2022.941237. eCollection 2022.

Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing

Sheng-Feng Sung¹ ², Kuan-Lin Sung³, Ru-Chiou Pan⁴, Pei-Ju Lee⁵, Ya-Han Hu⁶

Affiliations

¹ Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi City, Taiwan.
² Department of Nursing, Min-Hwei Junior College of Health Care Management, Tainan, Taiwan.
³ School of Medicine, National Taiwan University, Taipei, Taiwan.
⁴ Clinical Data Center, Department of Medical Research, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi City, Taiwan.
⁵ Department of Information Management and Institute of Healthcare Information Management, National Chung Cheng University, Chiayi County, Taiwan.
⁶ Department of Information Management, National Central University, Taoyuan, Taiwan.

PMID: 35966534
PMCID: PMC9372298
DOI: 10.3389/fcvm.2022.941237

Sheng-Feng Sung et al. Front Cardiovasc Med. 2022.

Abstract

Background: Timely detection of atrial fibrillation (AF) after stroke is clinically important because it guides the choice of strategy for secondary stroke prevention. When medical resources are limited, extended heart rhythm monitoring should be prioritized by stratifying patients according to their risk of newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF at an early stage after stroke.

Methods: Data linked between a hospital stroke registry and a deidentified research-based database containing EHRs and administrative claims data were used. Demographic features, physiological measurements, routine laboratory results, and clinical free text were extracted from EHRs. The extreme gradient boosting algorithm was used to build the prediction model. Prediction performance was evaluated by the C-index and compared with that of the AS5F and CHASE-LESS scores.
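
As an illustration of the evaluation metric named in the Methods, the following is a minimal sketch of a concordance index, assuming Harrell's definition for right-censored follow-up data. The abstract does not state which C-index variant or software the authors used, so the function name `harrell_c_index`, the handling of ties, and the toy inputs are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C for right-censored data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. NDAF) was observed, 0 if censored
    risks  : model risk score, higher means earlier event expected
    Pairs with tied times are skipped (a simplification); tied risk
    scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier patient had the event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [2, 5, 7, 10]
toy_events = [1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.05, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

In this toy example four pairs are comparable and three of them are ranked correctly, so `toy_c` is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.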

Results: The study population consisted of a training set of 4,064 patients and a temporal test set of 1,492 patients. During a median follow-up of 10.2 months, the incidence rate of NDAF was 87.0 per 1,000 person-years in the test set. On the test set, the model based on both structured and unstructured data achieved a C-index of 0.840, which was significantly higher than those of the AS5F (0.779, p = 0.023) and CHASE-LESS (0.768, p = 0.005) scores.
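
For readers unfamiliar with the unit, an incidence rate per 1,000 person-years is the number of events divided by the total follow-up time, scaled by 1,000. The sketch below shows only that arithmetic. The function name and the follow-up values are hypothetical and are not the study's data.

```python
def incidence_rate_per_1000_person_years(n_events, follow_up_months):
    """Events per 1,000 person-years, given each patient's follow-up in months."""
    person_years = sum(follow_up_months) / 12.0
    if person_years <= 0:
        raise ValueError("total follow-up must be positive")
    return 1000.0 * n_events / person_years


# Hypothetical cohort: four patients followed for 60 months in total
# (5 person-years), with one event observed.
example_rate = incidence_rate_per_1000_person_years(1, [12, 6, 24, 18])
```

Here `example_rate` is 200.0, that is, one event per 5 person-years expressed per 1,000 person-years.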

Conclusions: It is feasible to build a machine learning model to assess the risk of NDAF based on EHR data available at the time of hospital admission. Including information derived from clinical free text can significantly improve model performance, and the resulting model may outperform risk scores developed using traditional statistical methods. Further studies are needed to assess the clinical usefulness of the prediction model.

Keywords: atrial fibrillation; electronic health records; ischemic stroke; natural language processing; prediction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Definition of AF categories according to the time sequence between AF detection and the index stroke. AF, atrial fibrillation.

**Figure 2**
The process of machine learning model construction. BOW, bag-of-words; BR, binary representation; CV, cross validation; TF, term frequency; TF-IDF, term frequency with inverse document frequency.
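
The three bag-of-words weightings named in this caption (BR, TF, TF-IDF) can be sketched as follows. This is a minimal pure-Python illustration, assuming whitespace tokenization and the unsmoothed weight `tf * log(N / df)`. The authors' actual preprocessing and library are not described here, and common libraries apply smoothed or normalized variants. The function `vectorize` and the sample notes are hypothetical.

```python
import math
from collections import Counter


def vectorize(docs, scheme="tf"):
    """Return (vocabulary, rows) for a bag-of-words representation.

    scheme: "br"    -> 1 if the term occurs in the document, else 0
            "tf"    -> raw term count
            "tfidf" -> term count * log(n_docs / document frequency)
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        if scheme == "br":
            row = [1 if counts[t] else 0 for t in vocab]
        elif scheme == "tf":
            row = [counts[t] for t in vocab]
        elif scheme == "tfidf":
            row = [counts[t] * math.log(n_docs / df[t]) for t in vocab]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        rows.append(row)
    return vocab, rows


# Hypothetical clinical snippets, for illustration only.
notes = ["irregular pulse irregular rhythm", "regular rhythm"]
vocab, br_rows = vectorize(notes, "br")
_, tf_rows = vectorize(notes, "tf")
_, tfidf_rows = vectorize(notes, "tfidf")
```

With these two notes, a term such as "rhythm" that appears in every document receives a TF-IDF weight of zero, while "irregular" keeps a positive weight in the first note.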

**Figure 3**
Heat map showing AUC values across machine learning models with different combinations of text vectorization techniques and resampling methods. AUC, area under the receiver operating characteristic curve; BR, binary representation; TF, term frequency; TF-IDF, term frequency with inverse document frequency.

**Figure 4**
The top 20 most important features identified by the model based on both structured data and unstructured textual data. The mean absolute Shapley values that indicate the average impact on model output are shown in a bar chart **(A)**. The individual Shapley values for these features for each patient are depicted in a beeswarm plot **(B)**, where a dot's position on the x-axis denotes each feature's contribution to the model prediction for that patient. The color of the dot specifies the relative value of the corresponding feature.
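
To make the quantities in this caption concrete, the sketch below computes exact Shapley values for a tiny model by enumerating feature subsets, then averages their absolute values across patients, as in panel (A). It assumes that "missing" features are replaced by a fixed baseline value, which is only one of several conventions. The study's own explainer and settings are not stated here, and tree-specific algorithms are normally used in practice instead of brute force. The model `toy_model`, the baseline, and the patient rows are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features outside the coalition take the baseline value.
                without_i = [x[k] if k in subset else baseline[k] for k in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    # Hypothetical linear risk model used only for this example.
    return 2.0 * features[0] + 3.0 * features[1] + 1.0


baseline = [0.0, 0.0]
patients = [[1.0, 2.0], [3.0, -1.0]]
per_patient = [shapley_values(toy_model, p, baseline) for p in patients]
# Mean absolute Shapley value per feature, as plotted in a bar chart.
mean_abs = [
    sum(abs(row[j]) for row in per_patient) / len(per_patient)
    for j in range(len(baseline))
]
```

For each patient the values sum to the difference between the model output for that patient and the output at the baseline, which is the property that lets a beeswarm plot be read as per-feature contributions to a prediction.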

Cited by

Clinical applications of large language models in medicine and surgery: A scoping review.
Liang EN, Pei S, Staibano P, van der Woerd B. J Int Med Res. 2025 Jul;53(7):3000605251347556. doi: 10.1177/03000605251347556. Epub 2025 Jul 4. PMID: 40615349.
Evaluating Machine Learning Models for Stroke Prognosis and Prediction in Atrial Fibrillation Patients: A Comprehensive Meta-Analysis.
Goh B, Bhaskar SMM. Diagnostics (Basel). 2024 Oct 26;14(21):2391. doi: 10.3390/diagnostics14212391. PMID: 39518359.
Clinical applications of artificial intelligence and machine learning in neurocardiology: a comprehensive review.
Basem J, Mani R, Sun S, Gilotra K, Dianati-Maleki N, Dashti R. Front Cardiovasc Med. 2025 Apr 3;12:1525966. doi: 10.3389/fcvm.2025.1525966. eCollection 2025. PMID: 40248254. Review.
