Automated Prediction of Radiological Protocols Using Retrieval Augmented Generation

Affiliations

¹ Center for Augmented Intelligence in Imaging, Mayo Clinic, Jacksonville, FL, 32224, USA. Testagrose.Conrad@mayo.edu.
² Radiology, Mayo Clinic, Rochester, MN, 55905, USA.
³ Center for Augmented Intelligence in Imaging, Mayo Clinic, Jacksonville, FL, 32224, USA.
⁴ Center for Augmented Intelligence in Imaging, Mayo Clinic, Jacksonville, FL, 32224, USA. Erdal.Barbaros@mayo.edu.

PMID: 41840134
DOI: 10.1007/s10278-025-01822-x

Conrad Testagrose et al. J Imaging Inform Med. 2026 Mar 16. doi: 10.1007/s10278-025-01822-x. Online ahead of print.

Abstract

Radiological protocol selection is a critical but time-consuming step in the clinical workflow, requiring radiologists to match patient indications with an appropriate MRI or CT protocol. Manual selection can be prone to delays and errors, and automated approaches must contend with substantial class imbalance, site-specific variation, and evolving nomenclature. We investigated whether a large language model (LLM) can support reliable protocol selection at scale and whether retrieval-augmented generation (RAG) offers operational advantages over direct fine-tuning. Using patient reports collected across three Mayo Clinic sites (Arizona, Florida, and Rochester) spanning six radiological divisions, we trained site-specific Llama 3.2 3B models for use with and without retrieval augmentation. Division-scoped Facebook AI Similarity Search (FAISS) indexes constructed from procedure and diagnosis text were used to supply contextual evidence in the RAG framework. Both fine-tuned non-RAG and RAG-augmented models achieved strong baseline performance across sites. Paired bootstrap analyses revealed that RAG improved macro F1 at two of three sites (Arizona: Δ = 0.0306, p = 0.0074; Florida: Δ = 0.0245, p = 0.0217) while maintaining equivalent weighted F1. However, at Rochester, RAG showed no macro F1 improvement and significantly degraded weighted F1 (Δ = -0.0180, p = 1.0000), indicating site-specific heterogeneity in RAG effectiveness. RAG introduced an interpretable abstention mechanism with low baseline rates (1-2.5% [...] protocol classification without sacrificing common protocol accuracy at most sites, though site-specific tuning may be necessary. Retrieval indexes can be refreshed far more easily than LLMs can be retrained, enabling continual adaptation to evolving clinical workflows. Future prospective deployment should evaluate real-time accuracy, investigate site-specific performance drivers, and assess abstention as a safety mechanism in clinical decision support.
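
The abstract compares models by macro F1 and weighted F1 using paired bootstrap analyses. The following is a minimal sketch of how such a comparison could be set up, not the authors' implementation. The function names, the toy labels, the number of resamples, and the one-sided p-value definition (the share of resamples in which the second model is not better) are all assumptions made for illustration.

```python
import random
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return ({class: F1}, {class: support}) over classes present in y_true."""
    support = Counter(y_true)
    scores = {}
    for c in support:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores, support


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: rare protocols count as much as common ones."""
    scores, _ = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1: dominated by common protocols."""
    scores, support = per_class_f1(y_true, y_pred)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


def paired_bootstrap(y_true, pred_a, pred_b, metric, n_boot=1000, seed=0):
    """Resample cases with replacement, scoring both models on the same resample.

    Returns the observed delta (model B minus model A) and an assumed
    one-sided p-value: the fraction of resamples where B is not better than A.
    """
    rng = random.Random(seed)
    n = len(y_true)
    observed = metric(y_true, pred_b) - metric(y_true, pred_a)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        if metric(t, b) - metric(t, a) <= 0:
            not_better += 1
    return observed, not_better / n_boot


# Toy, made-up labels: one common protocol "A" and one rare protocol "B".
y_true = ["A"] * 8 + ["B"] * 2
baseline = ["A"] * 10      # hypothetical model that never predicts the rare class
with_rag = list(y_true)    # hypothetical model that also gets the rare class right

delta, p_value = paired_bootstrap(y_true, baseline, with_rag, macro_f1)
print(f"macro F1 delta = {delta:.4f}, one-sided p = {p_value:.4f}")
```

Because the same resampled indices are applied to both prediction lists, each bootstrap replicate compares the two models on identical cases, which is what makes the comparison paired. The toy example also shows why the two metrics can diverge: missing the rare class halves macro F1 but costs far less in weighted F1.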

Keywords: Deep learning; Large language models; Radiology; Retrieval augmented generation.

Conflict of interest statement

Declarations. Conflict of Interest: The authors declare no competing interests. Ethical Approval: This study was approved by the Mayo Clinic Institutional Review Board (Name: Semi-automated Protocoling for Access to Radiology Care (SPARC); ID: 24-008846) with exemption from informed consent for retrospective analysis of de-identified clinical data.
