JMIR AI. 2025 May 2;4:e69132.
doi: 10.2196/69132.

Identifying Asthma-Related Symptoms From Electronic Health Records Using a Hybrid Natural Language Processing Approach Within a Large Integrated Health Care System: Retrospective Study


Fagen Xie et al. JMIR AI.

Abstract

Background: Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented in clinical notes in a free-text format, and effective methods for capturing asthma-related symptoms from unstructured data are lacking.

Objective: This study aimed to develop a natural language processing (NLP) algorithm for identifying symptoms associated with asthma from clinical notes within a large integrated health care system.

Methods: We analyzed unstructured clinical notes recorded within the 2 years before a visit with an asthma diagnosis during 2013-2018 and 2021-2022 to identify 4 common asthma-related symptoms (cough, wheezing, dyspnea, and chest tightness). Related terms and phrases were initially compiled from publicly available resources and then refined through clinician input and chart review. A rule-based NLP algorithm was iteratively developed and refined via multiple rounds of chart review followed by adjudication. Subsequently, transformer-based deep learning algorithms were trained using the same manually annotated datasets. A hybrid NLP algorithm was then generated by combining the rule-based and transformer-based algorithms. Finally, the hybrid NLP algorithm was applied to the implementation notes.
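The rule-based and hybrid steps can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the term lexicon, the negation rule, the `Model` stand-in for a fine-tuned transformer, and the use of a union (OR) to combine the two outputs are hypothetical, since the abstract does not give the study's term list or its combination rule. The note-level flag mirrors the abstract's reporting of symptoms "identified in ... notes" by marking a note positive when any of its sentences is.

```python
import re
from typing import Callable, Dict, List, Set

# HYPOTHETICAL seed lexicon for the 4 symptoms; the study's real term list was
# compiled from public resources and refined by clinicians and chart review.
SYMPTOM_TERMS: Dict[str, List[str]] = {
    "cough": [r"\bcough(?:ing|ed|s)?\b"],
    "wheezing": [r"\bwheez(?:e|es|ed|ing|y)\b"],
    "dyspnea": [r"\bdyspnea\b", r"\bshort(?:ness)? of breath\b", r"\bsob\b"],
    "chest tightness": [r"\bchest tight(?:ness)?\b"],
}

# HYPOTHETICAL negation rule: a cue such as "denies" within 40 characters
# before the match, with no "." or ";" in between, negates the mention.
NEGATION = re.compile(
    r"\b(?:no|denies|denied|without|negative for)\b[^.;]{0,40}$", re.IGNORECASE
)

# Stand-in type for a fine-tuned transformer classifier (hypothetical).
Model = Callable[[str], Set[str]]


def rule_based(sentence: str) -> Set[str]:
    """Symptoms mentioned and not negated in one sentence."""
    found: Set[str] = set()
    for symptom, patterns in SYMPTOM_TERMS.items():
        for pattern in patterns:
            for m in re.finditer(pattern, sentence, re.IGNORECASE):
                if not NEGATION.search(sentence[: m.start()]):
                    found.add(symptom)
    return found


def hybrid(sentence: str, model: Model) -> Set[str]:
    # ASSUMPTION: union (logical OR) of both outputs; the abstract does not
    # state how the rule-based and transformer outputs were combined.
    return rule_based(sentence) | model(sentence)


def note_symptoms(sentences: List[str], model: Model) -> Set[str]:
    """A note is flagged for a symptom if any of its sentences is."""
    flagged: Set[str] = set()
    for sentence in sentences:
        flagged |= hybrid(sentence, model)
    return flagged


def no_model(sentence: str) -> Set[str]:
    """Placeholder transformer that predicts nothing."""
    return set()


note = [
    "Patient reports persistent cough and wheezing at night.",
    "Denies shortness of breath.",
    "Chest tightness after exercise.",
]
print(sorted(note_symptoms(note, no_model)))
# ['chest tightness', 'cough', 'wheezing']
```

The negation window here is deliberately crude (for example, "no fever, cough present" would wrongly negate cough); a fuller system would use NegEx-style trigger and scope rules.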

Results: A total of 11,374,552 eligible clinical notes with 128,211,793 sentences were analyzed. After the hybrid algorithm was applied to the implementation notes, at least 1 asthma-related symptom was identified in 1,663,450 of 127,763,086 (1.30%) sentences and 858,350 of 11,364,952 (7.55%) notes. Cough was the most frequently identified symptom at both the sentence (1,363,713/127,763,086, 1.07%) and note (660,685/11,364,952, 5.81%) levels, while chest tightness was the least frequent at both the sentence (141,733/127,763,086, 0.11%) and note (64,251/11,364,952, 0.57%) levels. The frequency of co-occurring symptoms ranged from 0.03% (36,057/127,763,086) to 0.38% (484,050/127,763,086) at the sentence level and from 0.10% (10,954/11,364,952) to 1.85% (209,805/11,364,952) at the note level. Validation against 1600 manually annotated clinical notes yielded a positive predictive value (PPV) ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and from 96.76% (wheezing) to 97.42% (chest tightness) at the note level. Sensitivity ranged from 93.9% (dyspnea) to 95.95% (cough) at the sentence level and from 96% (chest tightness) to 99.07% (cough) at the note level. All 4 symptoms had F1-scores greater than 0.95 at both the sentence and note levels, regardless of the NLP algorithm used (rule-based, transformer-based, or hybrid).
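The reported percentages and validation metrics follow from standard formulas, sketched below. PPV is TP/(TP+FP), sensitivity is TP/(TP+FN), and F1 is their harmonic mean. Because F1 increases with both PPV and sensitivity, pairing the smallest reported PPV with the smallest reported sensitivity gives a lower bound on F1 for every symptom, assuming both ranges describe the same algorithm. The helper names (`pct`, `f1`, `metrics`) are illustrative, not from the study.

```python
from typing import Tuple


def pct(k: int, n: int) -> float:
    """Percentage of k out of n, rounded to 2 decimals."""
    return round(100 * k / n, 2)


def f1(ppv: float, sensitivity: float) -> float:
    """F1-score: harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """PPV, sensitivity, and F1 from true positive, false positive, false negative counts."""
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return ppv, sensitivity, f1(ppv, sensitivity)


# Implementation set sizes reported in the abstract.
SENTENCES, NOTES = 127_763_086, 11_364_952
print(pct(1_663_450, SENTENCES))  # any symptom, sentence level -> 1.3
print(pct(858_350, NOTES))        # any symptom, note level -> 7.55

# Floor on F1 from the smallest reported PPV and sensitivity at each level.
print(round(f1(0.9653, 0.939), 3))  # sentence level -> 0.952
print(round(f1(0.9676, 0.96), 3))   # note level -> 0.964
```

Both floors exceed 0.95, which is consistent with the reported F1-scores under that assumption.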

Conclusions: The developed NLP algorithms could effectively capture asthma-related symptoms from unstructured clinical notes. These algorithms could be used to facilitate early asthma detection and predict exacerbation risk.

Keywords: asthma; electronic health record; natural language processing; rule-based algorithm; symptom extraction; transformer-based algorithm.


Conflict of interest statement

Conflicts of Interest: RSZ has received grants from the National Heart, Lung, and Blood Institute, ALK-Abelló A/S, and Merck & Co. to Kaiser Permanente Southern California (KPSC); personal fees from the American Academy of Allergy, Asthma, and Immunology (AAAAI) as deputy editor of the Journal of Allergy and Clinical Immunology: In Practice, and from AstraZeneca, Merck & Co., and Bayer; royalties from UpToDate; and warrants from DBV Technologies. MS has received research support from Sanofi, a stipend from the AAAAI as editor in chief of the Journal of Allergy and Clinical Immunology: In Practice, and royalties from UpToDate. All other authors have no relevant conflicts of interest.

Figures

Figure 1. Schematic diagram describing the process for identifying asthma-related symptoms from electronic health records. BERT: Bidirectional Encoder Representations from Transformers; EHR: electronic health record; NLP: natural language processing; PPV: positive predictive value.


