Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception
- PMID: 38133918
- PMCID: PMC10770784
- DOI: 10.2196/50865
Abstract
Exploring the generative capabilities of the multimodal GPT-4, our study uncovered significant differences between radiological assessments and automatic evaluation metrics for chest x-ray impression generation and revealed radiological bias.
Keywords: AI; GPT; artificial intelligence; chest; diagnostic; generative; generative model; image; images; imaging; impression; impressions; medical imaging; multimodal; radiography; radiological; radiology; x-ray; x-rays.
©Sebastian Ziegelmayer, Alexander W Marka, Nicolas Lenhart, Nadja Nehls, Stefan Reischl, Felix Harder, Andreas Sauter, Marcus Makowski, Markus Graf, Joshua Gawlitza. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.12.2023.
Conflict of interest statement
Conflicts of Interest: None declared.