Comparative Study
Prog Cardiovasc Dis. 2024 Nov-Dec;87:44-49. doi: 10.1016/j.pcad.2024.10.010. Epub 2024 Oct 21.

ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records

T Elizabeth Workman et al. Prog Cardiovasc Dis. 2024 Nov-Dec.

Abstract

Background: Natural language processing (NLP) can facilitate research utilizing data from electronic health records (EHRs). Large language models can potentially improve NLP applications leveraging EHR notes. The objective of this study was to assess the performance of zero-shot learning using Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for extracting symptoms and signs, and to compare its performance to baseline machine learning and rule-based methods developed using annotated data.

Methods and results: From unstructured clinical notes in the national EHR data of the Veterans healthcare system, we extracted 1999 text snippets containing relevant keywords for heart failure symptoms and signs, which were then annotated by two clinicians. We also created 102 synthetic snippets that were semantically similar to snippets randomly selected from the original 1999 snippets. We applied zero-shot learning, using two different forms of prompt engineering, in a symptom and sign extraction task with ChatGPT-4, utilizing the synthetic snippets. For comparison, baseline models using machine learning and rule-based methods were trained on the original 1999 annotated text snippets and then used to classify the 102 synthetic snippets. The best zero-shot learning application achieved 90.6% precision, 100% recall, and 95% F1 score, outperforming the best baseline method, which achieved 54.9% precision, 82.4% recall, and 65.5% F1 score. Prompt style and temperature settings influenced zero-shot learning performance.
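The F1 scores reported above are the standard harmonic mean of precision and recall; as a quick sanity check on the best zero-shot result (a sketch using only the percentages stated in the abstract):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as percentages."""
    return 2 * precision * recall / (precision + recall)

# Best zero-shot learning result from the abstract: 90.6% precision, 100% recall.
best_zsl_f1 = f1_score(90.6, 100.0)  # ~95, matching the reported F1 score
```

The baseline's reported 65.5% F1 differs slightly from the value recomputed from the rounded 54.9%/82.4% figures, as expected when metrics are derived from unrounded counts.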

Conclusions: Zero-shot learning utilizing ChatGPT-4 significantly outperformed traditional machine learning and rule-based NLP. Prompt type and temperature settings affected zero-shot learning performance. These findings suggest a more efficient means of symptom and sign extraction than traditional machine learning and rule-based methods.

Keywords: Heart failure; Information extraction; Large language models; Zero-shot learning.


Conflict of interest statement

Declaration of competing interest: None.

Figures

Figure 1. Annotation classification distributions by symptom/sign group.
Figures 2a and 2b. Classification without and with rule-based (RB) enhancement. In Figure 2a (top), all input synthetic snippets are classified by the trained machine learning (ML) model. In Figure 2b, for synthetic snippets that are classified by both the trained ML model and the RB enhancement, the RB classification is used in the final output. Note that only synthetic snippets containing matching regular expression patterns can be classified by the RB enhancement.
Figure 3. ZSL using the OpenAI API. An instructional prompt is sent to the OpenAI API, followed by the text to be analyzed; the API returns a "yes" or "no" answer indicating a positive or negative classification for the text.
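The prompt-then-yes/no workflow described for Figure 3 can be sketched as follows. The prompt wording, model name, and helper functions below are illustrative assumptions, not the authors' actual prompts; the live API call is shown only in comments since it requires an API key:

```python
# Sketch of the zero-shot classification loop described in Figure 3.
# The prompt text and function names are illustrative assumptions;
# the study's exact prompts are not reproduced here.

def build_prompt(snippet: str) -> str:
    """Instructional prompt followed by the text snippet to analyze."""
    return (
        "Does the following clinical note snippet indicate that the patient "
        "has a heart failure symptom or sign? Answer yes or no.\n\n"
        f"Snippet: {snippet}"
    )

def parse_answer(reply: str) -> bool:
    """Map the model's 'yes'/'no' reply to a positive/negative label."""
    return reply.strip().lower().startswith("yes")

# A live call would look roughly like this (OpenAI Python SDK, key required):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4",
#     temperature=0.0,  # the study found temperature settings affect performance
#     messages=[{"role": "user", "content": build_prompt(snippet)}],
# )
# label = parse_answer(resp.choices[0].message.content)
```

Parsing the reply down to a boolean mirrors the binary positive/negative classification used to score the 102 synthetic snippets.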

