Automated Prediction of Radiological Protocols Using Retrieval Augmented Generation
- PMID: 41840134
- DOI: 10.1007/s10278-025-01822-x
Abstract
Radiological protocol selection is a critical but time-consuming step in the clinical workflow, requiring radiologists to match patient indications with an appropriate MRI or CT protocol. Manual selection is prone to delays and errors, and automated approaches must contend with substantial class imbalance, site-specific variation, and evolving nomenclature. We investigated whether a large language model (LLM) can support reliable protocol selection at scale and whether retrieval-augmented generation (RAG) offers operational advantages over direct fine-tuning. Using patient reports collected across three Mayo Clinic sites (Arizona, Florida, and Rochester) spanning six radiological divisions, we trained site-specific Llama 3.2 3B models for use with and without retrieval augmentation. Division-scoped Facebook AI Similarity Search (FAISS) indexes constructed from procedure and diagnosis text were used to supply contextual evidence in the RAG framework. Both fine-tuned non-RAG and RAG-augmented models achieved strong baseline performance across sites. Paired bootstrap analyses revealed that RAG improved macro F1 at two of three sites (Arizona: Δ=0.0306, p=0.0074; Florida: Δ=0.0245, p=0.0217) while maintaining equivalent weighted F1. However, at Rochester, RAG showed no macro F1 improvement and significantly degraded weighted F1 (Δ=-0.0180, p=1.0000), indicating site-specific heterogeneity in RAG effectiveness. RAG introduced an interpretable abstention mechanism with low baseline rates (1-2.5%) [...] protocol classification without sacrificing common protocol accuracy at most sites, though site-specific tuning may be necessary. Retrieval indexes can be refreshed far more easily than LLMs can be retrained, enabling continual adaptation to evolving clinical workflows. Future prospective deployment should evaluate real-time accuracy, investigate site-specific performance drivers, and assess abstention as a safety mechanism in clinical decision support.
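The abstract reports paired bootstrap comparisons of macro F1 and weighted F1 but does not spell out the procedure. The following is a minimal sketch of one common one-sided formulation, not the authors' implementation: cases are resampled with replacement, both models are scored on the same resample, and the p-value is the fraction of resamples in which the RAG model fails to beat the baseline. The function names, the number of resamples, and the toy protocol labels are all hypothetical.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def f1_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (macro F1, weighted F1) over the classes present in y_true."""
    support = Counter(y_true)
    macro, weighted = 0.0, 0.0
    for label, n_label in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_label - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n_label > 0
        macro += f1 / len(support)             # every class counts equally
        weighted += f1 * n_label / len(y_true)  # classes count by frequency
    return macro, weighted


def paired_bootstrap_macro_f1(
    y_true: Sequence[str],
    pred_base: Sequence[str],
    pred_rag: Sequence[str],
    n_boot: int = 2000,  # assumed value; the abstract does not give one
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed macro-F1 difference, one-sided p-value).

    The difference is RAG minus baseline. The p-value is the share of
    resamples in which RAG does not improve on the baseline.
    """
    rng = random.Random(seed)
    n = len(y_true)
    observed = f1_scores(y_true, pred_rag)[0] - f1_scores(y_true, pred_base)[0]
    not_improved = 0
    for _ in range(n_boot):
        # Paired: the same resampled case indices are used for both models.
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        delta = (
            f1_scores(truth, [pred_rag[i] for i in idx])[0]
            - f1_scores(truth, [pred_base[i] for i in idx])[0]
        )
        if delta <= 0:
            not_improved += 1
    return observed, not_improved / n_boot


# Toy data with made-up labels: the baseline never predicts the rare protocol.
y_true = ["MRI_BRAIN"] * 8 + ["CT_HEAD"] * 2
pred_base = ["MRI_BRAIN"] * 10
pred_rag = list(y_true)

delta, p_value = paired_bootstrap_macro_f1(y_true, pred_base, pred_rag, n_boot=500)
print(f"macro F1 difference = {delta:.4f}, one-sided p = {p_value:.4f}")
```

Under this one-sided convention, a model that is consistently worse than the baseline yields a p-value near 1, which would be consistent with the p=1.0000 reported for Rochester. Macro F1 is the metric most sensitive to rare protocols because each class contributes equally regardless of how often it occurs.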
Keywords: Deep learning; Large language models; Radiology; Retrieval augmented generation.
© 2026. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.
Conflict of interest statement
Declarations. Conflict of Interest: The authors declare no competing interests. Ethical Approval: This study was approved by the Mayo Clinic Institutional Review Board (Name: Semi-automated Protocoling for Access to Radiology Care (SPARC); ID: 24-008846) with exemption from informed consent for retrospective analysis of de-identified clinical data.