GPT-Driven Radiology Report Generation with Fine-Tuned Llama 3

Ștefan-Vlad Voinea¹, Mădălin Mămuleanu¹, Rossy Vlăduț Teică², Lucian Mihai Florescu³, Dan Selișteanu¹, Ioana Andreea Gheonea³

Affiliations

¹ Department of Automatic Control and Electronics, University of Craiova, 200585 Craiova, Romania.
² Doctoral School, University of Medicine and Pharmacy of Craiova, 200349 Craiova, Romania.
³ Department of Radiology and Medical Imaging, University of Medicine and Pharmacy of Craiova, 200349 Craiova, Romania.

PMID: 39451418
PMCID: PMC11504957
DOI: 10.3390/bioengineering11101043

Ștefan-Vlad Voinea et al. Bioengineering (Basel). 2024 Oct 18;11(10):1043. doi: 10.3390/bioengineering11101043.

Abstract

The integration of deep learning into radiology has the potential to enhance diagnostic processes, yet its acceptance in clinical practice remains limited due to various challenges. This study aimed to develop and evaluate a fine-tuned large language model (LLM), based on Llama 3-8B, to automate the generation of accurate and concise conclusions in magnetic resonance imaging (MRI) and computed tomography (CT) radiology reports, thereby assisting radiologists and improving reporting efficiency. A dataset comprising 15,000 radiology reports was collected from the University of Medicine and Pharmacy of Craiova's Imaging Center, covering a diverse range of MRI and CT examinations reported by four experienced radiologists. The Llama 3-8B model was fine-tuned using transfer-learning techniques, incorporating parameter quantization to 4-bit precision and low-rank adaptation (LoRA) with a rank of 16 to optimize computational efficiency on consumer-grade GPUs. The model was trained over five epochs using an NVIDIA RTX 3090 GPU, with intermediary checkpoints saved for monitoring. Performance was evaluated quantitatively using Bidirectional Encoder Representations from Transformers Score (BERTScore), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), and Metric for Evaluation of Translation with Explicit Ordering (METEOR) metrics on a held-out test set. Additionally, a qualitative assessment was conducted, involving 13 independent radiologists who participated in a Turing-like test and provided ratings for the AI-generated conclusions. The fine-tuned model demonstrated strong quantitative performance, achieving a BERTScore F1 of 0.8054, a ROUGE-1 F1 of 0.4998, a ROUGE-L F1 of 0.4628, and a METEOR score of 0.4282.
In the human evaluation, the artificial intelligence (AI)-generated conclusions were preferred over human-written ones in approximately 21.8% of cases, indicating that the model's outputs were competitive with those of experienced radiologists. The average rating of the AI-generated conclusions was 3.65 out of 5, reflecting a generally favorable assessment. Notably, the model maintained its consistency across various types of reports and demonstrated the ability to generalize to unseen data. The fine-tuned Llama 3-8B model effectively generates accurate and coherent conclusions for MRI and CT radiology reports. By automating the conclusion-writing process, this approach can assist radiologists in reducing their workload and enhancing report consistency, potentially addressing some barriers to the adoption of deep learning in clinical practice. The positive evaluations from independent radiologists underscore the model's potential utility. Despite this strong performance, limitations such as dataset bias, limited sample diversity, a lack of clinical judgment, and the need for large computational resources call for further refinement and real-world validation. Future work should explore the integration of such models into clinical workflows, address ethical and legal considerations, and extend this approach to generate complete radiology reports.
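
The ROUGE-1 and ROUGE-L F1 figures reported above measure word overlap between a generated conclusion and the radiologist-written one. The following is a minimal sketch of how such scores can be computed, not the authors' evaluation code: it assumes lowercase whitespace tokenization with no stemming, and the two example sentences are hypothetical, so its numbers would not match those of a standard ROUGE toolkit exactly.

```python
from collections import Counter


def _f1(overlap: int, n_candidate: int, n_reference: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap, ignoring word order."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: overlap measured by the longest in-order shared word sequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical conclusions (not taken from the study's dataset).
reference = "no acute intracranial hemorrhage or mass effect"
candidate = "no mass effect or acute intracranial hemorrhage"

print(f"ROUGE-1 F1: {rouge1_f1(candidate, reference):.4f}")
print(f"ROUGE-L F1: {rougeL_f1(candidate, reference):.4f}")
```

In this example the two sentences use exactly the same words in a different order, so ROUGE-1 F1 is 1.0 while ROUGE-L F1 drops to 4/7. Both metrics are purely lexical, which is one reason to pair them with an embedding-based score such as BERTScore and with ratings from radiologists.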

Keywords: AI in healthcare; CT scans; Llama 3; MRI reports; automated report generation; convolutional neural networks; deep learning; diagnostic imaging; large language models; radiology.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 3**
Loss function during fine-tuning.

**Figure 7**
Turing-like evaluation: all questions.

**Figure 8**
Turing-like evaluation: common questions.

**Figure 9**
Rating evaluation for generated reports.

**Figure 10**
Model-generated conclusion.
