Strahlenther Onkol. 2025 Mar;201(3):333-342.
doi: 10.1007/s00066-024-02342-3. Epub 2025 Jan 10.

Patient- and clinician-based evaluation of large language models for patient education in prostate cancer radiotherapy

Christian Trapp et al. Strahlenther Onkol. 2025 Mar.

Abstract

Background: This study aims to evaluate the capabilities and limitations of large language models (LLMs) for providing patient education for men undergoing radiotherapy for localized prostate cancer, incorporating assessments from both clinicians and patients.

Methods: Six questions about definitive radiotherapy for prostate cancer were designed based on common patient inquiries. These questions were presented to different LLMs [ChatGPT‑4, ChatGPT-4o (both OpenAI Inc., San Francisco, CA, USA), Gemini (Google LLC, Mountain View, CA, USA), Copilot (Microsoft Corp., Redmond, WA, USA), and Claude (Anthropic PBC, San Francisco, CA, USA)] via the respective web interfaces. Responses were evaluated for readability using the Flesch Reading Ease Index. Five radiation oncologists assessed the responses for relevance, correctness, and completeness using a five-point Likert scale. Additionally, 35 prostate cancer patients evaluated the responses from ChatGPT‑4 for comprehensibility, accuracy, relevance, trustworthiness, and overall informativeness.
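For context, the Flesch Reading Ease Index is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with higher scores indicating easier text (scores below roughly 50 are generally classed as difficult). The sketch below is only an illustrative Python implementation using a naive vowel-group syllable heuristic; the abstract does not state which tool the authors used to calculate the index.

    import re

    def count_syllables(word):
        # Naive heuristic: count groups of consecutive vowels (illustrative only).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_reading_ease(text):
        # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z]+", text)
        if not sentences or not words:
            raise ValueError("Text must contain at least one sentence and one word.")
        syllables = sum(count_syllables(w) for w in words)
        return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

    # Example: a long, clause-heavy sentence yields a low (i.e., difficult) score.
    print(round(flesch_reading_ease(
        "Definitive radiotherapy for localized prostate cancer is typically delivered "
        "with image-guided intensity-modulated techniques over several weeks."), 1))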

Results: The Flesch Reading Ease Index indicated that the responses from all LLMs were relatively difficult to understand. All LLMs provided answers that clinicians found to be generally relevant and correct. The answers from ChatGPT‑4, ChatGPT-4o, and Claude were also found to be complete. However, there were significant differences among the LLMs with regard to relevance and completeness. Some answers lacked detail or contained inaccuracies. Patients perceived the information as easy to understand and relevant, with most expressing confidence in the information and a willingness to use ChatGPT‑4 for future medical questions. ChatGPT-4's responses helped patients feel better informed, despite the standardized information they had initially received.
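As a purely hypothetical illustration of how such a between-model comparison could be carried out (the abstract does not report which statistical test was used), pooled Likert ratings per model can be compared with a nonparametric test such as Kruskal-Wallis. The model names and ratings below are made-up placeholder values, not study data.

    from scipy.stats import kruskal

    # Hypothetical clinician Likert ratings (1 = worst, 5 = best) pooled per model.
    # These numbers are placeholders for illustration only, not results from the study.
    ratings = {
        "ChatGPT-4": [5, 4, 5, 4, 5],
        "Gemini": [4, 3, 4, 3, 4],
        "Copilot": [3, 4, 3, 3, 4],
    }

    # Kruskal-Wallis H test across the rating distributions of the models
    stat, p = kruskal(*ratings.values())
    print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3f}")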

Conclusion: Overall, LLMs show promise as a tool for patient education in prostate cancer radiotherapy. While improvements are needed in terms of accuracy and readability, positive feedback from clinicians and patients suggests that LLMs can enhance patient understanding and engagement. Further research is essential to fully realize the potential of artificial intelligence in patient education.

Keywords: AI; Artificial intelligence; ChatGPT; Patient information; Radiation oncology.

Conflict of interest statement

Conflict of interest: C. Trapp, N. Schmidt-Hegemann, M. Keilholz, S.F. Brose, S.N. Marschner, S. Schönecker, S.H. Maier, D.-C. Dehelean, M. Rottler, D. Konnerth, C. Belka, S. Corradini, and P. Rogowski declare that they have no competing interests.

Ethical standards: All procedures performed in studies involving human participants or on human tissue were in accordance with the ethical standards of the institutional and/or national research committee and with the 1975 Helsinki Declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.

Figures

Fig. 1: Depiction of the study workflow
Fig. 2: Clinicians’ overall rating of the answers regarding their relevance. A five-point Likert scale was used, with 1 representing “very irrelevant” and 5 representing “very relevant”
Fig. 3: Clinicians’ overall rating of the answers regarding their correctness. A five-point Likert scale was used, with 1 representing “very correct” and 5 representing “very incorrect”
Fig. 4: Clinicians’ overall rating of the answers regarding their completeness. A five-point Likert scale was used, with 1 representing “very complete” and 5 representing “very incomplete”
Fig. 5: Patients’ overall evaluation of the statements. A five-point Likert scale was used, with 1 representing “not true” and 5 representing “true”
