Assessing Completeness of Clinical Histories Accompanying Imaging Orders Using Adapted Open-Source and Closed-Source Large Language Models
- PMID: 39998369
- PMCID: PMC11868845
- DOI: 10.1148/radiol.241051
Abstract
Background: Incomplete clinical histories are a well-known problem in radiology. Previous dedicated quality improvement efforts focusing on reproducible assessments of the completeness of free-text clinical histories have relied on tedious manual analysis.

Purpose: To adapt and evaluate open-source and closed-source large language models (LLMs) for their ability to automatically extract clinical history elements within imaging orders, and to use the best-performing adapted open-source model to assess the completeness of a large sample of clinical histories as a benchmark for clinical practice.

Materials and Methods: This retrospective single-site study used previously extracted information accompanying CT, MRI, US, and radiography orders from August 2020 to May 2022 at an adult and pediatric emergency department of a 613-bed tertiary academic medical center. Two open-source LLMs (Llama 2-7B [Meta], Mistral-7B [Mistral AI]) and one closed-source LLM (GPT-4 Turbo [OpenAI]) were adapted using prompt engineering, in-context learning, and fine-tuning (open-source only) to extract the elements "past medical history," "what," "when," "where," and "clinical concern" from clinical histories. Model performance, interreader agreement, and semantic similarity between the models and the adjudicated manual annotations of two board-certified radiologists (16 and 3 years of postfellowship experience, respectively) were assessed using accuracy, Cohen κ (none to slight, 0.01-0.20; fair, 0.21-0.40; moderate, 0.41-0.60; substantial, 0.61-0.80; almost perfect, 0.81-1.00), and BERTScore, an LLM-based metric that quantifies how well two pieces of text convey the same meaning; 95% CIs were also calculated. The best-performing open-source model was then used to assess completeness on a large dataset of unannotated clinical histories.

Results: A total of 50 186 clinical histories were included (794 training, 150 validation, 300 initial testing, 48 942 real-world application). Of the two open-source models, Mistral-7B outperformed Llama 2-7B in assessing completeness and was further fine-tuned. Both Mistral-7B and GPT-4 Turbo showed substantial overall agreement with radiologists (mean κ, 0.73 [95% CI: 0.67, 0.78] to 0.77 [95% CI: 0.71, 0.82]) and with the adjudicated annotations (mean BERTScore, 0.96 [95% CI: 0.96, 0.97] for both models; P = .38). Mistral-7B also rivaled GPT-4 Turbo in performance (weighted overall mean accuracy, 91% [95% CI: 89, 93] vs 92% [95% CI: 90, 94]; P = .31) despite being a smaller model. Using Mistral-7B, 26.2% (12 803 of 48 942) of unannotated clinical histories were found to contain all five elements.

Conclusion: An easily deployable fine-tuned open-source LLM (Mistral-7B), rivaling GPT-4 Turbo in performance, effectively extracted clinical history elements with substantial agreement with radiologists and produced a benchmark for the completeness of a large sample of clinical histories. The model and code will be fully open-sourced.

© RSNA, 2025. Supplemental material is available for this article.
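To make the extraction step concrete, the following is a minimal, hypothetical sketch of prompting an instruction-tuned Mistral-7B checkpoint (via Hugging Face transformers) to extract the five history elements. The checkpoint name, prompt wording, and JSON output schema are illustrative assumptions; they are not the study's released prompts or fine-tuned weights.

```python
# Hypothetical sketch: element extraction with an instruction-tuned
# Mistral-7B via Hugging Face transformers. Prompt and schema are
# illustrative, not the paper's actual adaptation.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

ELEMENTS = ["past medical history", "what", "when", "where", "clinical concern"]

def extract_elements(history: str) -> dict:
    """Ask the model which of the five elements a free-text history contains."""
    prompt = (
        "Extract the following elements from the clinical history below. "
        f"Return a JSON object with keys {ELEMENTS}; use null for any "
        "element that is absent.\n\n"
        f"Clinical history: {history}"
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
    return json.loads(text)  # assumes the model emits valid JSON
```

In-context learning, as named in the Methods, would amount to prepending a few worked history-to-JSON examples to the prompt; fine-tuning would replace MODEL_ID with the adapted weights.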
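The agreement and similarity metrics named above (Cohen κ, BERTScore) are standard and available off the shelf. A hedged sketch using scikit-learn's cohen_kappa_score and the bert-score package, with toy labels rather than the study's data:

```python
# Hypothetical evaluation sketch: per-element Cohen kappa against
# radiologist labels, and BERTScore against adjudicated reference
# annotations. All values below are toy data, not study results.
from sklearn.metrics import cohen_kappa_score
from bert_score import score as bert_score  # pip install bert-score

# Binary presence labels (1 = element present), e.g. for "clinical concern".
model_labels       = [1, 0, 1, 1, 0, 1]
radiologist_labels = [1, 0, 1, 0, 0, 1]
kappa = cohen_kappa_score(model_labels, radiologist_labels)
print(f"Cohen kappa: {kappa:.2f}")  # 0.61-0.80 reads as 'substantial'

# Semantic similarity of an extracted span vs the adjudicated annotation.
candidates = ["fall from ladder two days ago"]
references = ["fell off a ladder 2 days prior"]
precision, recall, f1 = bert_score(candidates, references, lang="en")
print(f"BERTScore F1: {f1.mean().item():.2f}")
```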
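Finally, the real-world completeness benchmark reduces to checking whether all five elements were extracted. A sketch reusing the hypothetical extract_elements and ELEMENTS from the first snippet:

```python
# Hypothetical completeness check: a history counts as complete when all
# five elements are extracted (non-null, non-empty).
def is_complete(extracted: dict) -> bool:
    return all(extracted.get(k) for k in ELEMENTS)

def completeness_rate(histories: list[str]) -> float:
    complete = sum(is_complete(extract_elements(h)) for h in histories)
    return complete / len(histories)

# In the study, this kind of tally over 48 942 unannotated histories
# found 26.2% (12 803) to contain all five elements.
```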