Comparative performance of large language models in structuring head CT radiology reports: multi-institutional validation study in Japan
- PMID: 40366571
- PMCID: PMC12396994
- DOI: 10.1007/s11604-025-01799-1
Comparative performance of large language models in structuring head CT radiology reports: multi-institutional validation study in Japan
Abstract
Purpose: To compare the diagnostic performance of three proprietary large language models (LLMs)-Claude, GPT, and Gemini-in structuring free-text Japanese radiology reports for intracranial hemorrhage and skull fractures, and to assess the impact of three different prompting approaches on model accuracy.
Materials and methods: In this retrospective study, head CT reports from the Japan Medical Imaging Database between 2018 and 2023 were collected. Two board-certified radiologists established the ground truth regarding intracranial hemorrhage and skull fractures through independent review and consensus. Each radiology report was analyzed by three LLMs using three prompting strategies-Standard, Chain of Thought, and Self Consistency prompting. Diagnostic performance (accuracy, precision, recall, and F1-score) was calculated for each LLM-prompt combination and compared using McNemar's tests with Bonferroni correction. Misclassified cases underwent qualitative error analysis.
Results: A total of 3949 head CT reports from 3949 patients (mean age 59 ± 25 years, 56.2% male) were enrolled. Across all institutions, 856 patients (21.6%) had intracranial hemorrhage and 264 patients (6.6%) had skull fractures. All nine LLM-prompt combinations achieved very high accuracy. Claude demonstrated significantly higher accuracy for intracranial hemorrhage than GPT and Gemini, and also outperformed Gemini for skull fractures (p < 0.0001). Gemini's performance improved notably with Chain of Thought prompting. Error analysis revealed common challenges including ambiguous phrases and findings unrelated to intracranial hemorrhage or skull fractures, underscoring the importance of careful prompt design.
Conclusion: All three proprietary LLMs exhibited strong performance in structuring free-text head CT reports for intracranial hemorrhage and skull fractures. While the choice of prompting method influenced accuracy, all models demonstrated robust potential for clinical and research applications. Future work should refine the prompts and validate these approaches in prospective, multilingual settings.
Keywords: Free-text radiology report; Intracranial hemorrhage; Japan medical imaging database; Large language model; Skull fracture; Structured radiology report.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Conflict of interest: The authors have no relevant financial or non-financial interests to disclose. Ethical approval: The study protocol was approved by the Ethical Committee of Juntendo University Graduate School of Medicine and Osaka Metropolitan Universtiy, and the study was conducted in accordance with the Declaration of Helsinki. Informed consent: The requirement for informed consent was waived because, in the Japanese Medical Imaging Database, radiology reports are completely anonymized and there is no concern about identifying personal information.
Figures


References
-
- Carney N, Totten AM, O’Reilly C, Ullman JS, Hawryluk GWJ, Bell MJ, et al. Guidelines for the management of severe traumatic brain injury, Fourth Edition. Neurosurgery. 2017;80:6–15. - PubMed
-
- Stiell IG, Wells GA, Vandemheen K, Clement C, Lesiuk H, Laupacis A, et al. The Canadian CT head rule for patients with minor head injury. Lancet. 2001;357:1391–6. - PubMed
-
- Haydel MJ, Preston CA, Mills TJ, Luber S, Blaudeau E, DeBlieux PM. Indications for computed tomography in patients with minor head injury. N Engl J Med. 2000;343:100–5. - PubMed
-
- Kahn CE Jr, Langlotz CP, Burnside ES, Carrino JA, Channin DS, Hovsepian DM, et al. Toward best practices in radiology reporting. Radiology. 2009;252:852–6. - PubMed
-
- Larson DB, Towbin AJ, Pryor RM, Donnelly LF. Improving consistency in radiology reporting through the use of department-wide standardized structured reporting. Radiology. 2013;267:240–50. - PubMed
Publication types
MeSH terms
LinkOut - more resources
Full Text Sources
Medical