NPJ Precis Oncol. 2024 Oct 23;8(1):240.
doi: 10.1038/s41698-024-00733-4.

Large language model use in clinical oncology

Nicolas Carl et al. NPJ Precis Oncol. 2024.

Abstract

Large language models (LLMs) are undergoing intensive research for various healthcare domains. This systematic review and meta-analysis assesses current applications, methodologies, and the performance of LLMs in clinical oncology. A mixed-methods approach was used to extract, summarize, and compare methodological approaches and outcomes. This review includes 34 studies. LLMs are primarily evaluated on their ability to answer oncologic questions across various domains. The meta-analysis highlights a significant performance variance, influenced by diverse methodologies and evaluation criteria. Furthermore, differences in inherent model capabilities, prompting strategies, and oncological subdomains contribute to heterogeneity. The lack of use of standardized and LLM-specific reporting protocols leads to methodological disparities, which must be addressed to ensure comparability in LLM research and ultimately leverage the reliable integration of LLM technologies into clinical practice.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: T.J.B. would like to disclose that he is the owner of Smart Health Heidelberg GmbH (Handschuhsheimer Landstr. 9/1, 69120 Heidelberg, Germany; https://smarthealth.de), outside the submitted work. F.W. would like to disclose that he advises Janssen, AstraZeneca, and Adon Health outside the submitted work. J.N.K. would like to disclose consulting services for Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Scailyte, Switzerland; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, has received a research grant from GSK, and has received honoraria from AstraZeneca, Bayer, Eisai, Janssen, MSD, BMS, Roche, Pfizer and Fresenius. J.N.K. is Deputy Editor at npj Precision Oncology, but did not have a role in the editorial assessment of this article. The other authors have no competing interests to declare.

Figures

Fig. 1. Reporting results of the eligible publications focusing on the application domain ‘Medical Knowledge’; two studies were excluded.
Items are described in the evaluation framework (see Table 1). Blue = reported, red = not reported, yellow = not applicable.
Fig. 2. Categorization of the included grading methods based on the methods used to evaluate LLM performance.
The included metrics are grouped into two categories: those assessing correctness and those assessing readability. The correctness metrics can be further divided into binary, one-dimensional, and multidimensional methods.
Fig. 3. Forest plot showing the reported percentages of correct LLM outputs in studies assessing either GPT-3.5 or GPT-4.
Studies above the dotted line used GPT-3.5; studies below it used GPT-4. N = number of questions evaluated with the LLM.
Fig. 4. Forest plot showing the reported percentages of correct LLM outputs in publications that compared multiple language models in a benchmark.
N = number of questions evaluated with the LLM.
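
As a rough illustration of what a single row in such a forest plot encodes, the sketch below computes a percentage of correct outputs with a 95% confidence interval from a count of correct answers and the number of questions N. It uses the Wilson score interval, which is an assumption made here for illustration; the page does not state which interval method the authors used. The function name `wilson_interval` and the counts in the usage lines are hypothetical and are not taken from any included study.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Return (proportion, lower, upper) for correct/total.

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95%.
    Assumption: this is one common choice, not necessarily the method
    used in the review.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")

    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom

    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    lower = max(0.0, center - half_width)
    upper = min(1.0, center + half_width)
    return p, lower, upper


# Hypothetical counts, for illustration only.
p, lower, upper = wilson_interval(correct=80, total=100)
print(f"{p:.1%} correct (95% CI {lower:.1%} to {upper:.1%}), N = 100")
```

A smaller N widens the interval, which is one reason studies with few questions appear with long whiskers in a forest plot.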
