Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports
- PMID: 37713022
- PMCID: PMC10811038
- DOI: 10.1007/s11604-023-01487-y
Abstract
Purpose: In this preliminary study, we aimed to evaluate the potential of the generative pre-trained transformer (GPT) series for generating radiology reports from concise imaging findings and compare its performance with radiologist-generated reports.
Methods: This retrospective study involved 28 patients who underwent computed tomography (CT) scans and had a diagnosed disease with typical imaging findings. Radiology reports were generated using GPT-2, GPT-3.5, and GPT-4 based on the patient's age, gender, disease site, and imaging findings. We calculated the top-1, top-5 accuracy, and mean average precision (MAP) of differential diagnoses for GPT-2, GPT-3.5, GPT-4, and radiologists. Two board-certified radiologists evaluated the grammar and readability, image findings, impression, differential diagnosis, and overall quality of all reports using a 4-point scale.
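The evaluation metrics in the Methods can be illustrated with a minimal sketch (not the authors' code): top-k accuracy checks whether the true diagnosis appears in the top k of a model's ranked differential list, and with a single correct diagnosis per case, average precision reduces to the reciprocal rank of that diagnosis. The diagnosis names below are hypothetical toy data.

```python
# Hypothetical illustration of the study's metrics: top-k accuracy and
# mean average precision (MAP) over ranked differential-diagnosis lists,
# assuming one ground-truth diagnosis per case.

def top_k_accuracy(predictions, truths, k):
    """Fraction of cases whose true diagnosis appears in the top-k list."""
    hits = sum(1 for preds, truth in zip(predictions, truths)
               if truth in preds[:k])
    return hits / len(truths)

def mean_average_precision(predictions, truths):
    """With a single relevant item per case, average precision reduces to
    the reciprocal rank of the true diagnosis (0 if it is absent)."""
    total = 0.0
    for preds, truth in zip(predictions, truths):
        if truth in preds:
            total += 1.0 / (preds.index(truth) + 1)
    return total / len(truths)

# Toy example: two cases, each with a ranked differential-diagnosis list.
preds = [["pancreatitis", "cholecystitis"],
         ["appendicitis", "diverticulitis"]]
truths = ["cholecystitis", "appendicitis"]

print(top_k_accuracy(preds, truths, 1))        # 0.5 (only case 2 hits at rank 1)
print(mean_average_precision(preds, truths))   # (1/2 + 1/1) / 2 = 0.75
```

In the study, such per-case rankings were produced by each GPT model and by the radiologists, and the metrics were averaged over the 28 cases.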
Results: Top-1 and top-5 accuracies for the differential diagnoses were highest for radiologists, followed by GPT-4, GPT-3.5, and GPT-2, in that order (top-1: 1.00, 0.54, 0.54, and 0.21, respectively; top-5: 1.00, 0.96, 0.89, and 0.54, respectively). There were no significant differences in qualitative scores for grammar and readability, image findings, or overall quality between radiologists and GPT-3.5 or GPT-4 (p > 0.05). However, the GPT series scored significantly lower than radiologists on impression and differential diagnosis (p < 0.05).
Conclusions: Our preliminary study suggests that GPT-3.5 and GPT-4 can generate radiology reports with high readability and reasonable image findings from very short keywords; however, concerns persist regarding the accuracy of impressions and differential diagnoses, so verification by radiologists remains necessary.
Keywords: Computed tomography; Deep learning; Generative pre-trained transformer; Large language model; Radiology report.
© 2023. The Author(s).
Conflict of interest statement
Toshinori Hirai has received research support from Canon Medical Systems.
Similar articles
- Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases. Clin Neuroradiol. 2024 Dec;34(4):779-787. doi: 10.1007/s00062-024-01426-y. Epub 2024 May 28. PMID: 38806794
- ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology. Eur Radiol. 2025 Jan;35(1):506-516. doi: 10.1007/s00330-024-10902-5. Epub 2024 Jul 12. PMID: 38995378. Free PMC article.
- Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports. Eur Radiol. 2024 Jun;34(6):3566-3574. doi: 10.1007/s00330-023-10384-x. Epub 2023 Nov 8. PMID: 37938381
- Advancing radiology with GPT-4: Innovations in clinical applications, patient engagement, research, and learning. Eur J Radiol Open. 2024 Jul 26;13:100589. doi: 10.1016/j.ejro.2024.100589. eCollection 2024 Dec. PMID: 39170856. Free PMC article. Review.
- Capacity for paediatric radiology in Nigeria: a survey of radiologists. Pediatr Radiol. 2021 Apr;51(4):587-591. doi: 10.1007/s00247-019-04610-2. Epub 2020 Jan 29. PMID: 31996937. Review.
Cited by
- Multi-modal transformer architecture for medical image analysis and automated report generation. Sci Rep. 2024 Aug 20;14(1):19281. doi: 10.1038/s41598-024-69981-5. PMID: 39164302. Free PMC article.
- Integrating AI in radiology: insights from GPT-generated reports and multimodal LLM performance on European Board of Radiology examinations. Jpn J Radiol. 2024 Sep;42(9):1083-1084. doi: 10.1007/s11604-024-01576-6. Epub 2024 Apr 22. PMID: 38647884. No abstract available.
- Toward Improved Radiologic Diagnostics: Investigating the Utility and Limitations of GPT-3.5 Turbo and GPT-4 with Quiz Cases. AJNR Am J Neuroradiol. 2024 Oct 3;45(10):1506-1511. doi: 10.3174/ajnr.A8332. PMID: 38719605
- Comparative analysis of GPT-4-based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors. Eur Radiol. 2025 Apr;35(4):1938-1947. doi: 10.1007/s00330-024-11032-8. Epub 2024 Aug 28. PMID: 39198333. Free PMC article.
- Recent topics in musculoskeletal imaging focused on clinical applications of AI: How should radiologists approach and use AI? Radiol Med. 2025 May;130(5):587-597. doi: 10.1007/s11547-024-01947-z. Epub 2025 Feb 24. PMID: 39992330. Review.