Evaluation of Context-Aware Prompting Techniques for Classification of Tumor Response Categories in Radiology Reports Using Large Language Model
- PMID: 41023521
- DOI: 10.1007/s10278-025-01685-2
Abstract
Radiology reports are essential for medical decision-making, providing crucial data for diagnosing diseases, devising treatment plans, and monitoring disease progression. While large language models (LLMs) have shown promise in processing free-text reports, research on effective prompting techniques for radiologic applications remains limited. This study aimed to evaluate how effectively an LLM classifies radiology reports by tumor response category (TRC), and to optimize the model for clinical use by comparing four prompt engineering techniques on this classification task. We included 3062 whole-spine contrast-enhanced magnetic resonance imaging (MRI) radiology reports for prompt engineering and validation. TRCs were labeled by two radiologists based on criteria modified from the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines. The Llama3 instruct model was used to classify TRCs with four different prompts: General, In-Context Learning (ICL), Chain-of-Thought (CoT), and ICL with CoT. AUROC, accuracy, precision, recall, and F1-score were calculated for each prompt and model size (8B, 70B) on the test report dataset. The ICL prompt (average AUROC 0.96 internally, 0.93 externally) and the ICL with CoT prompt (0.97 internally, 0.94 externally) outperformed the other prompts. Errors increased with prompt complexity and included incomplete sentence errors (0.8%) and probability-classification inconsistencies (11.3%). This study demonstrates that context-aware LLM prompts substantially improved the efficiency and effectiveness of classifying TRCs from radiology reports, despite potential intrinsic hallucinations. While further improvements are required for real-world application, our findings suggest that context-aware prompts have significant potential for segmenting complex radiology reports and enhancing oncology clinical workflows.
Keywords: Artificial intelligence; Disease progression; Large language model; Natural language processing; Radiologic report.
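As an illustration of how the four prompt types compared in the abstract differ, the following minimal Python sketch assembles General, ICL, CoT, and ICL with CoT variants for one report. It is not the authors' implementation: the abstract does not give the prompt wording, the category names, or the example reports, so `CATEGORIES`, `TASK`, `COT_HINT`, `build_prompt`, and `build_all` are all hypothetical names and placeholder text chosen for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical label set: the abstract says only that the categories were
# modified from RECIST, so these names are placeholders.
CATEGORIES = ["progression", "stable", "improvement"]

# Hypothetical instruction text; the study's actual wording is not given.
TASK = (
    "Classify the tumor response category of the radiology report below. "
    "Answer with exactly one of: " + ", ".join(CATEGORIES) + "."
)
COT_HINT = "Reason step by step about the findings before giving the final answer."


def build_prompt(
    report: str,
    examples: Sequence[Tuple[str, str]] = (),
    use_cot: bool = False,
) -> str:
    """Assemble one prompt; labeled examples add ICL, use_cot adds a CoT instruction."""
    parts = [TASK]
    if use_cot:
        parts.append(COT_HINT)
    # In-context learning: show labeled reports before the target report.
    for ex_report, ex_label in examples:
        parts.append(f"Report: {ex_report}\nCategory: {ex_label}")
    # The target report is left unlabeled for the model to complete.
    parts.append(f"Report: {report}\nCategory:")
    return "\n\n".join(parts)


def build_all(report: str, examples: Sequence[Tuple[str, str]]) -> Dict[str, str]:
    """Return the four prompt variants named in the abstract for one report."""
    return {
        "General": build_prompt(report),
        "ICL": build_prompt(report, examples),
        "CoT": build_prompt(report, use_cot=True),
        "ICL with CoT": build_prompt(report, examples, use_cot=True),
    }
```

Calling `build_all` with a report and a short list of `(report, label)` pairs returns a dictionary keyed by prompt type, which makes it straightforward to send the same report through each variant and compare the outputs.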
© 2025. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.
Conflict of interest statement
Declarations. Ethics Approval: This retrospective study used data obtained for clinical purposes and was approved by the IRB of Severance Hospital, Yonsei University College of Medicine (IRB 4-2024-0354). Consent to Participate: The requirement for informed consent from participants was waived by the IRB for this retrospective study. Consent for Publication: All authors consent to the publication of the manuscript in the Journal of Imaging Informatics in Medicine. Competing interests: The authors declare no competing interests.