J Am Med Inform Assoc. 2024 Oct 1;31(10):2284-2293.
doi: 10.1093/jamia/ocae132.

Can GPT-3.5 generate and code discharge summaries?

Matúš Falis et al. J Am Med Inform Assoc.

Abstract

Objectives: The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels.

Materials and methods: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (or generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents.
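To make the two reported metrics concrete, the following is a minimal sketch of micro- and macro-F1 for multi-label coding. It is not the authors' evaluation code. The function names, the toy codes, and the choice to average macro-F1 over every code that appears in the gold or predicted sets are all assumptions made for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: lists of sets of codes, one set per document (hypothetical format)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:
            tp[code] += 1  # code correctly assigned
        for code in p - g:
            fp[code] += 1  # code assigned but not in gold
        for code in g - p:
            fn[code] += 1  # gold code that was missed
    labels = set(tp) | set(fp) | set(fn)
    # Micro-F1 pools counts over all codes, so frequent codes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-code F1, so rare codes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy example with three documents (illustrative data only).
gold = [{"I10", "E119"}, {"I10"}, {"J189"}]
pred = [{"I10"}, {"I10", "E119"}, {"I10"}]
micro, macro = micro_macro_f1(gold, pred)
```

Because macro-F1 gives each code equal weight, it is the metric more sensitive to changes on infrequent codes, which is the setting the augmentation targets.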

Results: Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including one code absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of generated concepts, but the documents suffer in variety, supporting information, and narrative.
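The distinction between within-family and out-of-family errors can be illustrated with a simplified sketch. This is not the Weak Hierarchical Confusion Matrix used in the study. It assumes that a code's family is its three-character ICD-10 category, and that a wrongly predicted code counts as within-family when it shares a category with any gold code of the same document. The function names and example codes are hypothetical.

```python
def family(code):
    """Three-character ICD-10 category, e.g. 'E11.9' -> 'E11' (assumed family definition)."""
    return code.replace(".", "")[:3]


def split_errors(gold, pred):
    """Split wrongly predicted codes into within-family and out-of-family sets."""
    gold_families = {family(c) for c in gold}
    wrong = pred - gold
    within = {c for c in wrong if family(c) in gold_families}
    return within, wrong - within


# 'E11.65' misses the exact gold code but stays in the E11 family;
# 'J18.9' has no gold code in its family at all.
within, outside = split_errors({"E11.9", "I10"}, {"E11.65", "J18.9", "I10"})
```

Under this reading, a within-family error still points to the right clinical category, whereas an out-of-family error does not.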

Discussion and conclusion: While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in narratives.

Keywords: ICD coding; clinical text generation; data augmentation; evaluation by clinicians; large language model.

Conflict of interest statement

We have identified no competing interests.

Figures

Figure 1.
An example generation of a synthetic discharge summary via GPT-3.5.
Figure 2.
A comparison between the discharge summary data in MIMIC-IV, seed MIMIC discharge summaries for generation (the source data), and the generated discharge summaries. Subfigures 2A and 2B focus on the number of words in documents, indicating that the GPT-generated data generally contains fewer words overall and per assigned label compared to the real data from MIMIC-IV. Subfigure 2C demonstrates that, although there’s a variance in document size, the distribution of the number of labels per document remains relatively similar across the datasets.
Figure 3.
The workflow of the GPT-3.5 prediction. We used the Azure AI Services API to query GPT-3.5 and employed a postprocessing step to extract the predicted diagnoses and ICD-10 codes for each clinical note.
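A postprocessing step of this kind might look like the sketch below. The actual response format and extraction rules are not described here, so the free-text response layout, the regular expression, and the function name are assumptions. The pattern only checks that a token has the general shape of an ICD-10 code and does not validate it against the code set.

```python
import re

# General ICD-10 shape: a letter, a digit, one alphanumeric character,
# then an optional dot followed by one to four alphanumeric characters.
ICD10_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_codes(response_text):
    """Return the distinct code-shaped tokens in a model response, in order of appearance."""
    matches = ICD10_PATTERN.findall(response_text)
    return list(dict.fromkeys(matches))  # drop repeats while keeping order


# Hypothetical model response; the real output format may differ.
response = (
    "1. Essential hypertension - I10\n"
    "2. Type 2 diabetes mellitus without complications (E11.9)\n"
    "3. Pneumonia, unspecified organism: J18.9"
)
codes = extract_codes(response)  # ['I10', 'E11.9', 'J18.9']
```

A shape-based pattern can also match strings that are not valid codes, so a real pipeline would likely check the extracted tokens against the ICD-10 code list before scoring.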
Figure 4.
An example evaluation of a synthetic discharge summary by a clinical expert.
