Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception

Affiliation

¹ Department of Diagnostic and Interventional Radiology, School of Medicine & Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.

# Contributed equally.

PMID: 38133918
PMCID: PMC10770784
DOI: 10.2196/50865

Sebastian Ziegelmayer et al. J Med Internet Res. 2023 Dec 22;25:e50865. doi: 10.2196/50865.

Abstract

Exploring the generative capabilities of multimodal GPT-4, our study uncovered significant differences between radiological assessments and automated evaluation metrics for chest x-ray impression generation, and it revealed radiological bias.

Keywords: AI; GPT; artificial intelligence; chest; diagnostic; generative; generative model; image; images; imaging; impression; impressions; medical imaging; multimodal; radiography; radiological; radiology; x-ray; x-rays.
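BLEU, one of the automated metrics named in Figure 1, scores a generated impression by its n-gram overlap with a reference impression. This page does not say which BLEU variant, tokenizer, or n-gram order the authors used. The sketch below assumes the classic unsmoothed BLEU of Papineni et al. with one reference, uniform weights, and naive lowercase whitespace tokenization. The function names are hypothetical and are meant only to show how the metric works.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with a single reference (a sketch).

    Assumption: tokenization is lowercase whitespace splitting, so
    punctuation stays attached to words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified (clipped) precision: a candidate n-gram is credited at
        # most as many times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: one empty n-gram order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "No acute cardiopulmonary abnormality."
    cand = "No acute cardiopulmonary process."
    print(sentence_bleu(cand, ref))           # BLEU-4
    print(sentence_bleu(cand, ref, max_n=2))  # BLEU-2
```

With the default `max_n=4`, the candidate "No acute cardiopulmonary process." scores 0.0 against the reference "No acute cardiopulmonary abnormality." because no 4-gram matches. A radiologist would likely read the two impressions as clinically equivalent. With `max_n=2` the same pair scores about 0.71. This toy pair illustrates one way n-gram overlap can diverge from expert judgment. It is not a result from the study.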

©Sebastian Ziegelmayer, Alexander W Marka, Nicolas Lenhart, Nadja Nehls, Stefan Reischl, Felix Harder, Andreas Sauter, Marcus Makowski, Markus Graf, Joshua Gawlitza. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.12.2023.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Scatterplots for each automated metric (BERT=blue; BLEU=yellow; CheXbert vector similarity=gray; RadGraph=light blue; RadCliQ=red) depending on the input: (A) image, (B) text, or (C) text and image. For the image input, all metrics except CheXbert vector similarity showed a significant correlation. However, for the text input and the combined text-and-image input, the correlations were divergent or opposing. All correlation coefficients and their P values are shown in the lower section of the figure. BERT: Bidirectional Encoder Representations From Transformers; BLEU: bilingual evaluation understudy.
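Figure 1 reports a correlation coefficient and a P value for each metric. The reference point is presumably the readers' ratings, but this page does not say which correlation coefficient was used. The minimal sketch below assumes a Spearman rank correlation, a common choice when one variable is an ordinal reader rating. The function names and the toy scores are hypothetical and are not study data. The sketch computes only the coefficient. In practice the P value would come from a library routine such as `scipy.stats.spearmanr` or from a permutation test.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy numbers for illustration only (hypothetical, not from the study).
    metric_scores = [0.12, 0.30, 0.25, 0.41, 0.18]
    reader_ratings = [2, 4, 3, 5, 3]
    print(f"Spearman rho = {spearman(metric_scores, reader_ratings):.3f}")
```

Averaging ranks over ties matters here because ratings on a short ordinal scale produce many ties. Spearman's rho is then Pearson's r computed on those ranks, so it measures monotonic rather than strictly linear agreement.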
