Identifying Asthma-Related Symptoms From Electronic Health Records Using a Hybrid Natural Language Processing Approach Within a Large Integrated Health Care System: Retrospective Study
- PMID: 40611521
- PMCID: PMC12231518
- DOI: 10.2196/69132
Abstract
Background: Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented in clinical notes in a free-text format, and effective methods for capturing asthma-related symptoms from unstructured data are lacking.
Objective: The study aims to develop a natural language processing (NLP) algorithm for identifying symptoms associated with asthma from clinical notes within a large integrated health care system.
Methods: We analyzed unstructured clinical notes within 2 years before a visit with asthma diagnosis in 2013-2018 and 2021-2022 to identify 4 common asthma-related symptoms. Related terms and phrases were initially compiled from publicly available resources and then refined through clinician input and chart review. A rule-based NLP algorithm was iteratively developed and refined via multiple rounds of chart review followed by adjudication. Subsequently, transformer-based deep learning algorithms were trained using the same manually annotated datasets. A hybrid NLP algorithm was then generated by combining rule-based and transformer-based algorithms. The hybrid NLP algorithm was finally applied to the implementation notes.
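As an illustration of the pipeline described in the Methods, the following minimal Python sketch shows one way a rule-based matcher and a model-based classifier could be combined at the sentence level. It is not the authors' implementation. The term patterns, the negation cue list, the names `rule_based` and `hybrid`, and the choice to combine the two outputs by set union are all assumptions made for this example; the abstract does not specify the lexicon or the combination rule, and the transformer is represented only by a placeholder callable.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical term patterns for the 4 symptoms named in the Results.
# The study's actual lexicon is not given in the abstract.
SYMPTOM_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "cough": re.compile(r"\bcough(s|ing|ed)?\b", re.I),
    "wheezing": re.compile(r"\bwheez(e|es|ed|ing|y)\b", re.I),
    "dyspnea": re.compile(r"\b(dyspnea|shortness of breath|sob)\b", re.I),
    "chest tightness": re.compile(r"\bchest tight(ness)?\b", re.I),
}

# Crude negation check (assumption): a cue word earlier in the same sentence,
# with no period or semicolon between the cue and the symptom mention.
NEGATION = re.compile(r"\b(no|denies|denied|without|negative for)\b[^.;]*$", re.I)


def rule_based(sentence: str) -> Set[str]:
    """Return the symptoms with at least one non-negated mention."""
    found: Set[str] = set()
    for symptom, pattern in SYMPTOM_PATTERNS.items():
        for match in pattern.finditer(sentence):
            if not NEGATION.search(sentence[: match.start()]):
                found.add(symptom)
                break
    return found


def hybrid(sentence: str, model: Callable[[str], Set[str]]) -> Set[str]:
    """Combine rule-based and model predictions by union (assumed rule)."""
    return rule_based(sentence) | model(sentence)


def stub_model(sentence: str) -> Set[str]:
    """Placeholder standing in for a trained transformer classifier."""
    return {"chest tightness"} if "pressure" in sentence.lower() else set()


print(rule_based("Patient reports coughing and wheezing at night."))
print(rule_based("Denies cough or shortness of breath."))
print(hybrid("Reports cough and chest pressure.", stub_model))
```

A union favors sensitivity over positive predictive value; an intersection or a voting scheme would trade in the other direction, and the abstract does not say which the study used.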
Results: A total of 11,374,552 eligible clinical notes with 128,211,793 sentences were analyzed. After applying the hybrid algorithm to implementation notes, at least 1 asthma-related symptom was identified in 1,663,450 out of 127,763,086 (1.3%) sentences and 858,350 out of 11,364,952 (7.55%) notes, respectively. Cough was the most frequently identified at both the sentence (1,363,713/127,763,086, 1.07%) and note (660,685/11,364,952, 5.81%) levels, while chest tightness was the least frequent at both the sentence (141,733/127,763,086, 0.11%) and note (64,251/11,364,952, 0.57%) levels. The frequency of multiple symptoms ranged from 0.03% (36,057/127,763,086) to 0.38% (484,050/127,763,086) at the sentence level and 0.10% (10,954/11,364,952) to 1.85% (209,805/11,364,952) at the note level. Validation against 1600 manually annotated clinical notes yielded a positive predictive value ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and 96.76% (wheezing) to 97.42% (chest tightness) at the note level. Sensitivity ranged from 93.9% (dyspnea) to 95.95% (cough) at the sentence level and 96% (chest tightness) to 99.07% (cough) at the note level. All 4 symptoms had F1-scores greater than 0.95 at both the sentence and note levels, regardless of NLP algorithms.
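For reference, the validation measures reported above follow the standard definitions: positive predictive value is TP/(TP+FP), sensitivity is TP/(TP+FN), and the F1-score is their harmonic mean. The short Python sketch below computes them from confusion counts. The function name `validation_metrics` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from typing import Dict


def validation_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute PPV, sensitivity, and F1 from true/false positive and false negative counts."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "f1": f1}


# Hypothetical counts for one symptom at the sentence level (not study data).
metrics = validation_metrics(tp=95, fp=3, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it stays above 0.95 only when both PPV and sensitivity are themselves close to or above that level.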
Conclusions: The developed NLP algorithms could effectively capture asthma-related symptoms from unstructured clinical notes. These algorithms could be used to facilitate early asthma detection and predict exacerbation risk.
Keywords: asthma; electronic health record; natural language processing; rule-based algorithm; symptom extraction; transformer-based algorithm.
© Fagen Xie, Robert S Zeiger, Mary Marycania Saparudin, Sahar Al-Salman, Eric Puttock, William Crawford, Michael Schatz, Stanley Xu, William M Vollmer, Wansu Chen. Originally published in JMIR AI (https://ai.jmir.org).