Proof-of-concept study of a small language model chatbot for breast cancer decision support - a transparent, source-controlled, explainable and data-secure approach
- PMID: 39382778
- PMCID: PMC11464535
- DOI: 10.1007/s00432-024-05964-3
Abstract
Purpose: Large language models (LLMs) show potential for decision support in breast cancer care. Their use in clinical care is currently precluded by a lack of control over the sources used for decision-making, limited explainability of the decision-making process, and health data security issues. Small language models (SLMs), a recent development, have been proposed as a way to address these challenges. This proof-of-concept study tailors an open-source SLM to the German breast cancer guideline (BC-SLM) and evaluates its initial clinical accuracy and technical functionality in a preclinical simulation.
Methods: A multidisciplinary tumor board (MTB) serves as the gold standard. Initial clinical accuracy is assessed as the concordance of the BC-SLM with the MTB and is compared with that of two publicly available LLMs, ChatGPT3.5 and ChatGPT4. The study includes 20 fictional patient profiles and recommendations for 5 treatment modalities, resulting in 100 binary treatment recommendations (recommended or not recommended). Statistical evaluation comprises percentage concordance with the MTB and Cohen's kappa statistic (κ). Technical functionality is assessed qualitatively in terms of local hosting, adherence to the guideline and information retrieval.
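The two agreement measures named in the Methods can be illustrated with a minimal Python sketch. The function name `concordance_and_kappa` and the ten toy labels are assumptions made for illustration only; they are not the study's data or code, and the p-value reported alongside κ is not computed here.

```python
from collections import Counter


def concordance_and_kappa(reference, model):
    """Return (observed agreement, Cohen's kappa) for two equally long label lists."""
    if len(reference) != len(model) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement: share of cases where both raters give the same label.
    observed = sum(r == m for r, m in zip(reference, model)) / n

    # Agreement expected by chance, from each rater's marginal label frequencies.
    ref_counts = Counter(reference)
    model_counts = Counter(model)
    labels = set(reference) | set(model)
    expected = sum(ref_counts[label] * model_counts[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical toy labels (1 = recommended, 0 = not recommended), NOT study data.
mtb_toy = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
model_toy = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
agreement, kappa = concordance_and_kappa(mtb_toy, model_toy)
print(f"concordance = {agreement:.0%}, kappa = {kappa:.3f}")
```

In this toy case the two raters agree on 8 of 10 decisions, while their marginal frequencies imply a chance agreement of 0.52, so κ is noticeably lower than the raw concordance. This is why the study reports both figures.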
Results: Overall concordance with the MTB is 86% for BC-SLM (κ = 0.721, p < 0.001), 90% for ChatGPT4 (κ = 0.820, p < 0.001) and 83% for ChatGPT3.5 (κ = 0.661, p < 0.001). Concordance for the individual treatment modalities is 65-100% for BC-SLM, 85-100% for ChatGPT4, and 55-95% for ChatGPT3.5. The BC-SLM runs locally, adheres to the standards of the German breast cancer guideline and provides the referenced guideline sections for its decision-making.
Conclusion: The tailored BC-SLM shows initial clinical accuracy and technical functionality, with concordance with the MTB comparable to that of publicly available LLMs such as ChatGPT4 and ChatGPT3.5. This serves as a proof of concept for adapting an SLM to an oncological disease and its guideline in order to address prevailing issues with LLMs by ensuring decision transparency, explainability, source control, and data security. Such adaptation represents a necessary step towards clinical validation and safe use of language models in clinical oncology.
Keywords: Artificial intelligence; Breast cancer; Clinical oncology; Large language model; Small language model.
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
