JCO Precis Oncol. 2024 Oct:8:e2400478.
doi: 10.1200/PO-24-00478. Epub 2024 Oct 30.

Expert-Guided Large Language Models for Clinical Decision Support in Precision Oncology

Jacqueline Lammert et al. JCO Precis Oncol. 2024 Oct.

Abstract

Purpose: Rapidly expanding medical literature challenges oncologists seeking targeted cancer therapies. General-purpose large language models (LLMs) lack domain-specific knowledge, limiting their clinical utility. This study introduces the LLM system Medical Evidence Retrieval and Data Integration for Tailored Healthcare (MEREDITH), designed to support treatment recommendations in precision oncology. Built on Google's Gemini Pro LLM, MEREDITH uses retrieval-augmented generation (RAG) and chain-of-thought prompting.
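For readers unfamiliar with the pattern, the following is a minimal sketch of retrieval-augmented generation combined with a chain-of-thought instruction. It is not MEREDITH's implementation: the abstract does not describe its retriever, data sources, or prompts. The sketch assumes a toy keyword-overlap retriever over an in-memory list of placeholder passages, and the function names `tokenize`, `retrieve`, and `build_prompt` are hypothetical. The final LLM call is omitted.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (hypothetical): rank passages by how many query terms they share.

    Real RAG systems typically use embedding or BM25 search instead.
    Python's sort is stable, so ties keep their original corpus order.
    """
    query_terms = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(query_terms & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(case: str, passages: List[str]) -> str:
    """Augment the prompt with retrieved evidence plus a step-by-step (chain-of-thought) instruction."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "You are assisting a molecular tumor board.\n"
        f"Evidence:\n{context}\n\n"
        f"Case: {case}\n"
        "Think step by step: identify each molecular alteration, assess its "
        "targetability using the evidence, then list treatment options with citations."
    )


# Placeholder passages standing in for literature, trial, approval, and guideline sources.
corpus = [
    "Hypothetical trial summary: KRAS G12C inhibitor in lung cancer",
    "Hypothetical guideline excerpt: BRAF V600E in colorectal cancer",
    "Hypothetical approval record: drug X for HER2-positive breast cancer",
]
case = "BRAF V600E colorectal cancer"
prompt = build_prompt(case, retrieve(case, corpus, k=2))
print(prompt)  # this string would then be sent to the LLM
```

The point of the design is that the model answers from supplied, citable evidence rather than from its parametric memory alone, and the step-by-step instruction asks it to expose intermediate reasoning that experts can check.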

Methods: We evaluated MEREDITH on 10 publicly available fictional oncology cases with iterative feedback from a molecular tumor board (MTB) at a major German cancer center. Initially limited to PubMed-indexed literature (draft system), MEREDITH was enhanced to incorporate clinical studies on drug response within the specific tumor type, trial databases, drug approval status, and oncologic guidelines. The MTB provided a benchmark with manually curated treatment recommendations and assessed the clinical relevance of LLM-generated options (qualitative assessment). We measured semantic cosine similarity between LLM suggestions and clinician responses (quantitative assessment).
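The quantitative assessment rests on cosine similarity, the cosine of the angle between two embedding vectors: cos θ = (a · b) / (‖a‖ ‖b‖), where 1 means the vectors point in the same direction. The sketch below assumes the LLM suggestions and clinician responses have already been converted to embeddings; the abstract does not name the embedding model, so the vectors used here are hypothetical placeholders.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; in practice a sentence-embedding model would produce them.
llm_suggestion = [0.2, 0.7, 0.1]
clinician_response = [0.3, 0.6, 0.0]
print(round(cosine_similarity(llm_suggestion, clinician_response), 3))  # → 0.974
```

Because the measure depends only on direction, not vector length, a long LLM answer and a terse clinician note can still score as semantically close.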

Results: MEREDITH identified a broader range of treatment options (median 4) compared with MTB experts (median 2). These options included therapies on the basis of preclinical data and combination treatments, expanding the treatment possibilities for consideration by the MTB. This broader approach was achieved by incorporating a curated medical data set that contextualized molecular targetability. Mirroring the approach MTB experts use to evaluate MTB cases improved the LLM's ability to generate relevant suggestions. This is supported by high concordance between LLM suggestions and expert recommendations (94.7% for the enhanced system) and a significant increase in semantic similarity from the draft to the enhanced system (from 0.71 to 0.76, P = .01).

Conclusion: Expert feedback and domain-specific data augment LLM performance. Future research should investigate responsible LLM integration into real-world clinical workflows.