Large language model processing capabilities of ChatGPT 4.0 to generate molecular tumor board recommendations: a critical evaluation on real-world data
- PMID: 40973166
- PMCID: PMC12557318
- DOI: 10.1093/oncolo/oyaf293
Abstract
Background: Large language models (LLMs) like ChatGPT 4.0 hold promise for enhancing clinical decision-making in precision oncology, particularly within molecular tumor boards (MTBs). This study assesses ChatGPT 4.0's performance in generating therapy recommendations for complex real-world cancer cases compared to expert human MTB (hMTB) teams.
Methods: We retrospectively analyzed 20 anonymized MTB cases from the Comprehensive Cancer Center Augsburg (CCCA), covering breast cancer (n = 3), glioblastoma (n = 3), colorectal cancer (n = 2), and rare tumors. ChatGPT 4.0 recommendations were evaluated against hMTB outputs using metrics including recommendation type (therapeutic/diagnostic), information density (IDM), consistency, quality (level of evidence [LoE]), and efficiency. Each case was prompted thrice to evaluate variability (Fleiss' Kappa).
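The replicate-run consistency measure named above, Fleiss' Kappa, can be illustrated with a minimal sketch. This is not the authors' code: the function name `fleiss_kappa`, the category labels, and the example counts are hypothetical and chosen only for illustration. It assumes each of the three replicate runs is treated as one "rater" that assigns each case to exactly one category.

```python
import math


def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters (here: replicate runs) that assigned subject i to category j.
    Every row must sum to the same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Overall share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if math.isclose(p_e, 1.0):
        return float("nan")  # undefined when every rating is in one category
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 4 cases, 3 replicate runs, 3 illustrative categories
# (e.g. "targeted therapy", "clinical trial", "no recommendation").
example_counts = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [1, 1, 1],
]
kappa = fleiss_kappa(example_counts)
```

Values near 1 indicate that the replicate runs almost always agree, while values near 0 indicate agreement no better than chance.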
Results: ChatGPT 4.0 generated more therapeutic recommendations per case than hMTB (median 3 vs 1, P = .005), with comparable diagnostic suggestions (median 1 vs 2, P = .501). Therapeutic scope from ChatGPT 4.0 included off-label and clinical trial options. IDM scores indicated similar content depth between ChatGPT 4.0 (median 0.67) and hMTB (median 0.75; P = .084). Moderate consistency was observed across replicate runs (median Fleiss' Kappa = 0.51). ChatGPT 4.0 relied on lower-level or preclinical evidence more frequently than hMTB (P = .0019). Efficiency favored ChatGPT 4.0 significantly (median 15.2 vs 34.7 minutes; P < .001).
Conclusion: Incorporating ChatGPT 4.0 into MTB workflows enhances efficiency and provides relevant recommendations, especially in guideline-supported cases. However, variability in evidence prioritization highlights the need for ongoing human oversight. A hybrid approach, integrating human expertise with LLM support, may optimize precision oncology decision-making.
Keywords: ChatGPT 4.0; artificial intelligence; large language models; molecular tumor board; precision oncology; variant annotation.
© The Author(s) 2025. Published by Oxford University Press.
Conflict of interest statement
The authors have no conflicts of interest to declare.
