J Am Med Inform Assoc. 2024 Oct 1;31(10):2284-2293.

doi: 10.1093/jamia/ocae132.

Can GPT-3.5 generate and code discharge summaries?

Matúš Falis¹, Aryo Pradipta Gema¹, Hang Dong², Luke Daines³, Siddharth Basetti⁴, Michael Holder⁵, Rose S Penfold⁶ ⁷, Alexandra Birch¹, Beatrice Alex⁸ ⁹

Affiliations

¹ School of Informatics, The University of Edinburgh, Edinburgh EH8 9AB, United Kingdom.
² Department of Computer Science, University of Exeter, Exeter EX4 4QF, United Kingdom.
³ Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
⁴ Department of Research, Development and Innovation, National Health Service Highland, Inverness IV2 3JH, United Kingdom.
⁵ Centre for Population Health Sciences, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
⁶ Ageing and Health, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
⁷ Advanced Care Research Centre, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
⁸ Edinburgh Futures Institute, The University of Edinburgh, Edinburgh EH3 9EF, United Kingdom.
⁹ School of Literatures, Languages and Cultures, The University of Edinburgh, Edinburgh EH8 9LH, United Kingdom.

PMID: 39271171
PMCID: PMC11413433
DOI: 10.1093/jamia/ocae132

Matúš Falis et al. J Am Med Inform Assoc. 2024.

Abstract

Objectives: The aim of this study was to investigate the use of GPT-3.5 for generating and coding medical documents with International Classification of Diseases (ICD)-10 codes, for data augmentation on low-resource labels.

Materials and methods: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent codes (referred to as generation codes) within the MIMIC-IV dataset. Combined with the baseline training set, these formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, the generation codes, and their families. Weak Hierarchical Confusion Matrices were used to determine within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and on real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents.
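The two averages reported above weight codes differently: micro-F1 pools counts over all codes, so frequent codes dominate, while macro-F1 averages per-code scores, so rare generation codes count equally. A minimal sketch of both for multi-label ICD-10 predictions, restricted to a chosen codeset, is given below. It is not the authors' evaluation code; the function names and the convention of scoring a code with no gold and no predicted instances as 0 are assumptions.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when tp, fp and fn are all zero (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], codeset: Set[str]
) -> Tuple[float, float]:
    """Micro- and macro-F1 over per-document code sets, ignoring codes outside `codeset`."""
    # Per-code [true positive, false positive, false negative] counts.
    counts: Dict[str, List[int]] = {code: [0, 0, 0] for code in codeset}
    for g, p in zip(gold, pred):
        for code in codeset:
            in_gold, in_pred = code in g, code in p
            if in_gold and in_pred:
                counts[code][0] += 1
            elif in_pred:
                counts[code][1] += 1
            elif in_gold:
                counts[code][2] += 1
    # Micro-F1: pool the counts across all codes, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro-F1: compute F1 per code, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(codeset) if codeset else 0.0
    return micro, macro
```

Passing only the generation codes as `codeset` gives subset scores; family-level scores could be obtained the same way after mapping each code to its family, though the exact family definition used in the study is not stated in this abstract.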

Results: Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including 1 code absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of the generated concepts, while noting that the documents suffer in variety, supporting information, and narrative.
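The Weak Hierarchical Confusion Matrices themselves are not specified in this abstract. The simplified tally below only illustrates the within-family versus out-of-family distinction, assuming that a code's family is its three-character ICD-10 category (e.g. E11 for E11.9) and that an error counts as within-family when a code from the same category appears on the other side (gold versus predicted) for the same document. The helper names are hypothetical.

```python
from typing import Dict, List, Set


def icd10_family(code: str) -> str:
    """Assumed family definition: the 3-character ICD-10 category, ignoring any dot."""
    return code.replace(".", "")[:3]


def tally_family_errors(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, int]:
    """Split coding errors into within-family and out-of-family counts (simplified heuristic)."""
    tally = {"within_family": 0, "out_of_family": 0}
    for g, p in zip(gold, pred):
        gold_families = {icd10_family(c) for c in g}
        pred_families = {icd10_family(c) for c in p}
        # False positives: a predicted code is a within-family error if a gold
        # code in the same document shares its category.
        for code in p - g:
            key = "within_family" if icd10_family(code) in gold_families else "out_of_family"
            tally[key] += 1
        # False negatives: a missed gold code is a within-family error if the
        # model predicted some code from the same category.
        for code in g - p:
            key = "within_family" if icd10_family(code) in pred_families else "out_of_family"
            tally[key] += 1
    return tally
```

For example, predicting E11.65 where the gold code is E11.9 yields two within-family errors (one spurious, one missed), whereas predicting J18.9 where the gold code is I10 yields two out-of-family errors.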

Discussion and conclusion: While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in narratives.

Keywords: ICD coding; clinical text generation; data augmentation; evaluation by clinicians; large language model.

Conflict of interest statement

We have identified no competing interests.

Figures

**Figure 1.**
An example generation of a synthetic discharge summary via GPT-3.5.

**Figure 2.**
A comparison between the discharge summary data in MIMIC-IV, the seed MIMIC discharge summaries for generation (the source data), and the generated discharge summaries. Subfigures 2A and 2B focus on the number of words in documents, indicating that the GPT-generated data generally contains fewer words overall and per assigned label compared to the real data from MIMIC-IV. Subfigure 2C demonstrates that, although document length varies, the distribution of the number of labels per document remains relatively similar across the datasets.

**Figure 3.**
The workflow of the GPT-3.5 prediction. We used the Azure AI Services API to query GPT-3.5 and employed a postprocessing step to extract the predicted diagnoses and ICD-10 codes for each clinical note.
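The postprocessing step in Figure 3 is not detailed in this record. A minimal sketch, assuming the model's reply lists each diagnosis followed by its ICD-10 code in free text, could pull codes out with a loose regular expression. The reply format, the pattern `ICD10_PATTERN` and the function name `extract_icd10_codes` are hypothetical; the pattern is a shape filter, not a validator of real ICD-10 codes.

```python
import re
from typing import List

# Loose ICD-10 shape (an assumption for this sketch): a letter, a digit, an
# alphanumeric character, then an optional extension of 1-4 alphanumerics,
# with or without a dot. It will accept strings that are not real codes.
ICD10_PATTERN = re.compile(r"\b([A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?)\b")


def extract_icd10_codes(response_text: str) -> List[str]:
    """Return unique code-like strings in order of first appearance, with dots removed."""
    seen = set()
    codes = []
    for match in ICD10_PATTERN.finditer(response_text):
        # Strip the dot so dotted and undotted spellings compare equal.
        code = match.group(1).replace(".", "")
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes
```

Applied to a reply such as "- Type 2 diabetes mellitus without complications: E11.9", this sketch yields `["E119"]`; duplicates later in the reply are dropped.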

**Figure 4.**
An example evaluation of a synthetic discharge summary by a clinical expert.
