Eur J Nucl Med Mol Imaging. 2025 Jun;52(7):2452-2462. doi: 10.1007/s00259-025-07101-9. Epub 2025 Jan 23.

Empowering PET imaging reporting with retrieval-augmented large language models and reading reports database: a pilot single center study


Hongyoon Choi et al. Eur J Nucl Med Mol Imaging. 2025 Jun.

Abstract

Purpose: Large Language Models (LLMs) have shown potential to enhance a variety of natural language tasks in clinical fields, including medical imaging reporting. This pilot study examines the efficacy of a retrieval-augmented generation (RAG) system that combines the zero-shot learning capability of LLMs with a comprehensive database of PET reading reports, with the aim of improving reference to prior reports and decision making.

Methods: We developed a custom LLM framework with retrieval capabilities, leveraging a database of over 10 years of PET imaging reports from a single center. The system uses vector space embeddings to enable similarity-based retrieval. Queries prompt the system to generate context-based answers and to identify similar cases or differential diagnoses. On routine clinical PET readings, experienced nuclear medicine physicians evaluated the system's performance in terms of the relevance of the retrieved similar cases and the appropriateness score of the suggested potential diagnoses.
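As a rough illustration of the retrieval step described in the Methods, the sketch below embeds report texts and ranks them by cosine similarity. This is a minimal sketch, not the authors' implementation: the embedding model (all-MiniLM-L6-v2 via the sentence-transformers package) and the toy reports are assumptions.

```python
# Minimal sketch of similarity-based report retrieval (illustrative, not the study's code).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

# Toy stand-ins for a PET reading-report database.
reports = [
    "FDG PET/CT: hypermetabolic lesion in the right upper lobe, suspicious for lung cancer.",
    "FDG PET/CT: multiple hypermetabolic axillary nodes in a patient with breast cancer.",
    "Ga-68 PSMA-11 PET: intense uptake in the prostate bed, consistent with recurrence.",
]
report_vecs = model.encode(reports, normalize_embeddings=True)

def retrieve(query: str, k: int = 2):
    """Return the k reports most similar to the query by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = report_vecs @ q  # dot product of unit vectors = cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), reports[i]) for i in top]

for score, text in retrieve("hypermetabolic pulmonary nodule on FDG PET"):
    print(f"{score:.3f}  {text}")
```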

Results: The system efficiently organized the embedded vectors from PET reports, showing that reports clustered accurately within the embedded vector space according to diagnosis or PET study type. Building on this system, a proof-of-concept chatbot demonstrated the framework's potential for referencing reports of previous similar cases and identifying exemplary cases for various purposes. In routine clinical PET readings, relevant similar cases were retrieved for 84.1% of cases, as agreed upon by all three readers. With the RAG system, the appropriateness score of the suggested potential diagnoses was significantly better than that of the LLM without RAG. The system also demonstrated the capability to offer differential diagnoses, leveraging the vast database to enhance the completeness and precision of generated reports.
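The paper does not reproduce its prompting code, but the fragment below sketches how retrieved prior reports might be stitched into a context-grounded prompt before calling the LLM. The function name and prompt wording are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of assembling a RAG prompt from retrieved prior reports.
def build_rag_prompt(finding_text: str, retrieved_reports: list[str]) -> str:
    """Combine the current finding with retrieved prior reports as grounding context."""
    context = "\n\n".join(
        f"[Prior report {i + 1}]\n{r}" for i, r in enumerate(retrieved_reports)
    )
    return (
        "You are assisting with PET imaging report reading.\n\n"
        f"Context from similar prior reports:\n{context}\n\n"
        f"Current finding:\n{finding_text}\n\n"
        "Suggest potential diagnoses, citing the prior reports where relevant."
    )
```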

Conclusion: Integrating a RAG LLM with a large database of PET imaging reports shows potential to support the clinical practice of nuclear medicine image reading through AI tasks such as finding similar cases and deriving potential diagnoses from them. This study underscores the potential of advanced AI tools to transform medical imaging reporting practices.

Keywords: Artificial intelligence; Large language model; PET reports; Retrieval-augmented generation.


Conflict of interest statement

Declarations.
Competing interests: H.C. is a co-founder of Portrai.
Ethics approval: The retrospective analysis of human data and the waiver of informed consent were approved by the Institutional Review Board of Seoul National University Hospital (No. 2401-090-1501).
Consent to participate: Written informed consent was acquired from all patients.
Consent to publish: Not applicable.

Figures

Fig. 1
Workflow of the Chatbot System for Querying PET Imaging Reading Reports. The overall workflow of the proof-of-concept system, designed for efficient querying of reading reports from a substantial dataset, is illustrated. The system integrates the Retrieval-Augmented Generation (RAG) model with advanced language model technologies, natural language processing, and information retrieval techniques. The workflow traces the process from user query input to delivery of the relevant reading report, showcasing the operational framework and the interaction with different sources of reading reports
Fig. 2
Visualization of PET Imaging Report Embeddings Using t-SNE. (A) A t-SNE plot illustrates PET imaging report embeddings from 118,107 patients, totaling 211,813 cases. Each point represents a unique report, with a selected case highlighted in red to show an example of an original report. (B) t-SNE plots showcase the clustering efficacy of the embeddings, highlighting how reports containing key diagnostic terms like ‘lung cancer’, ‘breast cancer’, and ‘lymphoma’, and specific exam types such as ‘C-11 methionine PET’ and ‘Ga-68 PSMA-11 PET’, form distinct clusters. These clusters indicate the embeddings’ capability to reflect similarity among cases, demonstrating the method’s potential for identifying and visualizing related PET imaging reports
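A projection like the one in Fig. 2 can be reproduced in spirit with scikit-learn's t-SNE. The snippet below is a generic sketch: it assumes report_vecs is a large matrix of report embeddings (as in the retrieval sketch above, but with many more reports, since t-SNE's perplexity must be below the sample count), and the parameter values are illustrative.

```python
# Sketch of a 2-D t-SNE projection of report embeddings (parameters are illustrative).
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# report_vecs: (n_reports, dim) embedding matrix; assumes n_reports >> perplexity.
xy = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(report_vecs)

plt.scatter(xy[:, 0], xy[:, 1], s=2)
plt.title("t-SNE of PET report embeddings")
plt.show()
```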
Fig. 3
Examples of Chatbot Responses to Queries. (A) An example case displays the chatbot’s capability to accurately identify and present relevant cases in response to a user query about breast cancer with metastasis to internal mammary lymph nodes, highlighting its capacity to navigate a vast database of previous reading reports. (B) An example of the system’s utility in generating differential diagnoses is displayed: in response to a query, the chatbot offers a detailed list of potential diagnoses along with reference identifiers. By employing these identifiers within the PACS system (deidentified information was used in this example), prior imaging cases could be referenced to support case understanding and decision making
Fig. 4
Evaluation of Appropriateness Scores by Nuclear Medicine Physicians. (A) The appropriateness of querying similar cases was assessed. Using a conclusion text to generate the prompt “find similar reports and summarize it,” the system retrieved results. For specific reports, 16 out of 19 (84.2%) were appropriately identified, with all three readers rating these as better than ‘Fair’ in relevance. (B) The appropriateness of potential diagnoses for specific findings was evaluated. Using specific finding texts to generate prompts for suggesting potential diagnoses, the system’s responses were assessed. Readers evaluated the medical relevance and appropriateness of the suggested potential diagnoses. The system without RAG was also assessed, and the performance of the LLM with and without RAG was represented as a heatmap. The LLM with RAG showed significantly better appropriateness scores (p < 0.05). (C) The ROUGE-L F-score was used to quantitatively evaluate the alignment between conclusions generated from finding descriptions and the reference conclusion texts. The RAG framework achieved significantly higher scores than the LLM without RAG (0.16 ± 0.08 vs. 0.07 ± 0.03, p < 0.001)
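The ROUGE-L F-score reported in panel (C) can be computed with the rouge-score package; the snippet below is a generic illustration with toy texts, not the study's evaluation script.

```python
# Computing a ROUGE-L F-score between a generated and a reference conclusion.
# Uses the rouge-score package (pip install rouge-score); texts are toy examples.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "Hypermetabolic right upper lobe nodule, suspicious for primary lung cancer."
generated = "Suspicious hypermetabolic nodule in the right upper lobe; likely lung cancer."
print(scorer.score(reference, generated)["rougeL"].fmeasure)
```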

