Can GPT-3.5 generate and code discharge summaries?
- PMID: 39271171
- PMCID: PMC11413433
- DOI: 10.1093/jamia/ocae132
Abstract
Objectives: The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels.
Materials and methods: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents.
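The distinction between micro- and macro-F1 matters here: micro-averaging pools true/false positives across all codes (so frequent codes dominate), while macro-averaging weights every code equally (so rare "generation" codes count as much as common ones). A minimal sketch, not the authors' code, with hypothetical ICD-10 codes:

```python
# Minimal sketch of micro- vs macro-averaged F1 for multi-label ICD coding.
# Gold and predicted labels are sets of (hypothetical) codes per document.
from collections import Counter

def f1_scores(gold, pred, labelset):
    """Return (micro_f1, macro_f1) over the given label set."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in labelset:
            if c in p and c in g:
                tp[c] += 1      # correctly assigned code
            elif c in p:
                fp[c] += 1      # spurious code
            elif c in g:
                fn[c] += 1      # missed code
    # Micro: pool counts across all codes, then compute F1 once.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * TP / (2 * TP + FP + FN) if TP else 0.0
    # Macro: per-code F1, then unweighted mean -- rare codes count equally.
    per_code = []
    for c in labelset:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_code.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_code) / len(per_code)
    return micro, macro
```

Because macro-F1 is sensitive to rare labels, it is the natural metric for judging whether augmentation helps the infrequent generation codes.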
Results: Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including 1 absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of the generated concepts but note that the documents suffer in variety, supporting information, and narrative.
Discussion and conclusion: While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in their narratives.
Keywords: ICD coding; clinical text generation; data augmentation; evaluation by clinicians; large language model.
© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Conflict of interest statement
We have identified no competing interests.
Grants and funding
- 223499/Z/21/Z/WT_/Wellcome Trust/United Kingdom
- UKRI
- National Institute for Health Research
- NIHR202639/Artificial Intelligence and Multimorbidity: Clustering in Individuals, Space and Clinical Context
- EP/S02431X/1/United Kingdom Research and Innovation
- Multimorbidity Doctoral Training Programme for Health Professionals
- WT_/Wellcome Trust/United Kingdom
- Legal and General PLC
- Advanced Care Research Centre
- UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics
- EP/V050869/1/Engineering and Physical Sciences Research Council