Large language model use in clinical oncology

Affiliations

¹ Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany.
² Department of Urology and Urological Surgery, University Medical Center Mannheim, Ruprecht-Karls University Heidelberg, Mannheim, Germany.
³ Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany.
⁴ Medical Faculty, Ruprecht-Karls University Heidelberg, Heidelberg, Germany.
⁵ Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany. titus.brinker@dkfz.de.

^# Contributed equally.

PMID: 39443582
PMCID: PMC11499929
DOI: 10.1038/s41698-024-00733-4

Nicolas Carl et al. NPJ Precis Oncol. 2024 Oct 23;8(1):240. doi: 10.1038/s41698-024-00733-4.

Abstract

Large language models (LLMs) are being intensively researched across various healthcare domains. This systematic review and meta-analysis assesses current applications, methodologies, and the performance of LLMs in clinical oncology. A mixed-methods approach was used to extract, summarize, and compare methodological approaches and outcomes. This review includes 34 studies. LLMs are primarily evaluated on their ability to answer oncologic questions across various domains. The meta-analysis highlights a significant performance variance, influenced by diverse methodologies and evaluation criteria. Furthermore, differences in inherent model capabilities, prompting strategies, and oncological subdomains contribute to heterogeneity. The lack of standardized and LLM-specific reporting protocols leads to methodological disparities, which must be addressed to ensure comparability in LLM research and, ultimately, to enable the reliable integration of LLM technologies into clinical practice.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: T.J.B. would like to disclose that he is the owner of Smart Health Heidelberg GmbH (Handschuhsheimer Landstr. 9/1, 69120 Heidelberg, Germany; https://smarthealth.de), outside the submitted work. F.W. would like to disclose that he advises Janssen, AstraZeneca, and Adon Health outside the submitted work. J.N.K. would like to disclose consulting services for Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Scailyte, Switzerland; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, has received a research grant from GSK, and has received honoraria from AstraZeneca, Bayer, Eisai, Janssen, MSD, BMS, Roche, Pfizer and Fresenius. J.N.K. is Deputy Editor at npj Precision Oncology, but did not have a role in the editorial assessment of this article. The other authors have no competing interests to declare.

Figures

**Fig. 1. Reporting results of the eligible publications focusing on the application domain ‘Medical Knowledge’; two studies were excluded.**
Item descriptions are provided in the evaluation framework (see Table 1). Blue = reported, red = not reported, yellow = not applicable.

**Fig. 2. Categorization of the included grading methods based on the methods used for evaluation of the performance of the LLMs.**
The included metrics are grouped into two categories: those assessing correctness, and those assessing readability. The metrics dealing with correctness can be further divided into binary methods, one-dimensional methods, and multidimensional methods.

**Fig. 3. Forest plot showing the reported percentages of correct LLM outputs of studies assessing either GPT-3.5 or GPT-4.**
Studies above the dotted line used GPT-3.5; studies below the dotted line used GPT-4. N = number of questions evaluated with LLM.

**Fig. 4. Forest plot showing the reported percentages of correct LLM outputs in publications that compared multiple language models in a benchmark.**
N = number of questions evaluated with LLM.
