2025 Sep 22:ocaf141.
doi: 10.1093/jamia/ocaf141. Online ahead of print.

Large language models accurately identify immunosuppression in intensive care unit patients

Vijeeth Guggilla et al. J Am Med Inform Assoc.

Abstract

Objective: Rule-based structured data algorithms and natural language processing (NLP) approaches applied to unstructured clinical notes have limited accuracy and poor generalizability for identifying immunosuppression. Large language models (LLMs) may effectively identify patients with heterogeneous types of immunosuppression from unstructured clinical notes. We compared the performance of LLMs applied to unstructured notes for identifying patients with immunosuppressive conditions or immunosuppressive medication use against 2 baselines: (1) structured data algorithms using diagnosis codes and medication orders and (2) NLP approaches applied to unstructured notes.

Materials and methods: We used hospital admission notes from a primary cohort of 827 intensive care unit (ICU) patients at Northwestern Memorial Hospital and a validation cohort of 200 ICU patients at Beth Israel Deaconess Medical Center, along with diagnosis codes and medication orders from the primary cohort. We evaluated the performance of structured data algorithms, NLP approaches, and LLMs in identifying 7 immunosuppressive conditions and 6 immunosuppressive medications.

Results: In the primary cohort, structured data algorithms achieved peak F1 scores ranging from 0.30 to 0.97 for identifying immunosuppressive conditions and medications. NLP approaches achieved peak F1 scores ranging from 0 to 1. GPT-4o outperformed or matched structured data algorithms and NLP approaches across all conditions and medications, with F1 scores ranging from 0.51 to 1. GPT-4o also performed well in our validation cohort, reaching F1 = 1 for 8 of 13 variables.
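
For readers unfamiliar with the metrics reported here, the following is a minimal sketch of how precision, recall, and F-beta scores (F1 with beta = 1, F2 with beta = 2) can be derived from confusion-matrix counts. The function name, the example counts, and the convention of returning 0 when a ratio is undefined are assumptions made for illustration; they are not taken from the study.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical counts for one variable (not study data).
precision, recall, f1 = precision_recall_fbeta(tp=6, fp=2, fn=6)
_, _, f2 = precision_recall_fbeta(tp=6, fp=2, fn=6, beta=2.0)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f} F2={f2:.2f}")
```

F2 weights recall more heavily than precision, so with these hypothetical counts, where recall is lower than precision, F2 comes out below F1.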

Discussion: LLMs, particularly GPT-4o, outperformed structured data algorithms and NLP approaches in identifying immunosuppressive conditions and medications with robust external validation.

Conclusion: LLMs can be applied for improved cohort identification for research purposes.

Keywords: clinical notes; diagnosis codes; immunosuppression; large language model.

Conflict of interest statement

T.L.W. has received research funding from Gilead Sciences to support investigation of the relationship between immunosuppressive conditions and COVID-19 outcomes. Gilead personnel had no involvement in this research. All other authors declare no financial or non-financial competing interests.

Figures

Figure 1. Flow diagram of study design comparing the performance of structured data algorithms, NLP approaches, and LLMs in identifying immunosuppression followed by external validation of LLM performance. Structured data and admission notes were extracted for 827 SCRIPT admissions which were adjudicated for immunosuppressive conditions and medications. The performance of structured data algorithms, NLP approaches, and LLMs in predicting the adjudicated labels was assessed. External validation of using LLMs to predict adjudicated immunosuppression labels was performed in 200 MIMIC-III admission notes.

Figure 2. Performance of structured data algorithm baseline in identifying immunosuppressive conditions and medications in the SCRIPT cohort. F1 scores, F2 scores, precision, and recall were computed by comparing structured data algorithm predictions to adjudicated labels. (A) Performance metrics across an increasing minimum number of dates with a diagnosis code required to count as a case. (B) Confusion matrices for predicting medication use by the presence of a valid medication order in the 6 months prior to admission.

Figure 3. Prompts used for immunosuppression attribute extraction and estimation by LLMs from admission notes. Prompts for extracting (A) immunosuppressive conditions and (B) immunosuppressive medications where the highlighted portion introduces either a list of (A) conditions or (B) medications.

Figure 4. Performance of GPT-4o in identifying immunosuppressive conditions and medications in the SCRIPT cohort. F1 scores, F2 scores, precision, and recall were computed by comparing GPT-4o predictions to adjudicated labels. Confusion matrices for GPT-4o predicting (A) immunosuppressive conditions and (B) immunosuppressive medication use from admission notes.

Figure 5. Performance of GPT-4o in identifying immunosuppressive conditions and medications in the MIMIC-III cohort. F1 scores, F2 scores, precision, and recall were computed by comparing GPT-4o predictions to adjudicated labels. Confusion matrices for GPT-4o predicting (A) immunosuppressive conditions and (B) immunosuppressive medication use from admission notes.
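
The two structured data rules described in the Figure 2 caption (a minimum number of dates with a diagnosis code, and a medication order in the 6 months before admission) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, "6 months" is approximated as 183 days, orders dated on the admission date itself are excluded, and what makes an order "valid" is not modeled.

```python
from datetime import date, timedelta


def is_case_by_diagnosis(code_dates, min_dates):
    """Flag a case when a diagnosis code appears on at least
    `min_dates` distinct dates (repeat codes on one date count once)."""
    return len(set(code_dates)) >= min_dates


def has_recent_medication_order(order_dates, admission_date, lookback_days=183):
    """Flag medication use when any order falls in the lookback window
    before admission. Assumption: 183 days stands in for "6 months"."""
    window_start = admission_date - timedelta(days=lookback_days)
    return any(window_start <= d < admission_date for d in order_dates)


# Hypothetical patient record (not study data).
dx_dates = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 2, 1)]
print(is_case_by_diagnosis(dx_dates, min_dates=2))  # two distinct dates
print(has_recent_medication_order([date(2024, 1, 15)], date(2024, 6, 1)))
```

Raising `min_dates` makes the diagnosis rule stricter, which would typically trade recall for precision; that is the kind of trade-off panel (A) of Figure 2 reports across thresholds.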
