JAMA Netw Open. 2024 Apr 1;7(4):e244630. doi:10.1001/jamanetworkopen.2024.4630.

Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions

Amulya Yalamanchili et al. JAMA Netw Open. 2024.

Abstract

Importance: Artificial intelligence (AI) large language models (LLMs) demonstrate potential in simulating human-like dialogue. Their efficacy in supporting accurate patient-clinician communication within radiation oncology has yet to be explored.

Objective: To assess the quality of an LLM's responses to radiation oncology patient care questions using both domain-specific expert evaluation and domain-agnostic metrics.

Design, setting, and participants: This cross-sectional study retrieved questions and answers from websites (accessed February 1 to March 20, 2023) affiliated with the National Cancer Institute and the Radiological Society of North America. These questions were posed as queries to an AI LLM, ChatGPT version 3.5 (accessed February 20 to April 20, 2023), to generate responses. Three radiation oncologists and 3 radiation physicists ranked the LLM-generated responses for relative factual correctness, relative completeness, and relative conciseness compared with online expert answers. Statistical analysis was performed from July to October 2023.

Main outcomes and measures: Experts ranked the LLM's responses against the online expert answers on a 5-point Likert scale using domain-specific metrics: relative correctness, relative completeness, relative conciseness, and potential harm. Domain-agnostic metrics encompassing cosine similarity scores, readability scores, word count, lexicon count, and syllable count were computed as independent quality checks for LLM-generated responses.
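
As an illustration of these domain-agnostic checks, the sketch below computes each metric family for one expert-LLM answer pair. The abstract does not specify the authors' implementation, so the readability consensus formula (textstat's text_standard), the syllable and lexicon counters, and the TF-IDF embedding behind the cosine similarity are assumptions made for illustration; the function name domain_agnostic_metrics is hypothetical.

```python
# Minimal sketch of the domain-agnostic metrics, assuming the open-source
# textstat and scikit-learn libraries; the exact tools the authors used are
# not stated in the abstract.
import textstat
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def domain_agnostic_metrics(expert_answer: str, llm_answer: str) -> dict:
    """Compute readability, length, and similarity metrics for one Q&A pair."""
    metrics = {
        # Readability consensus: an estimated U.S. school grade level.
        "readability_expert": textstat.text_standard(expert_answer, float_output=True),
        "readability_llm": textstat.text_standard(llm_answer, float_output=True),
        "syllables_expert": textstat.syllable_count(expert_answer),
        "syllables_llm": textstat.syllable_count(llm_answer),
        "words_expert": len(expert_answer.split()),
        "words_llm": len(llm_answer.split()),
        # Lexicon count: number of words with punctuation removed.
        "lexicon_expert": textstat.lexicon_count(expert_answer, removepunct=True),
        "lexicon_llm": textstat.lexicon_count(llm_answer, removepunct=True),
    }
    # Cosine similarity between TF-IDF vectors of the two answers
    # (1.0 = identical term profiles, 0.0 = no shared vocabulary).
    tfidf = TfidfVectorizer().fit_transform([expert_answer, llm_answer])
    metrics["cosine_similarity"] = float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])
    return metrics
```

Note that TF-IDF cosine similarity rewards vocabulary overlap with the expert answer; an embedding-based similarity would additionally capture paraphrases.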

Results: Of the 115 radiation oncology questions retrieved from 4 professional society websites, the LLM performed the same or better in 108 responses (94%) for relative correctness, 89 responses (77%) for completeness, and 105 responses (91%) for conciseness compared with expert answers. Only 2 LLM responses were ranked as having potential harm. The mean (SD) readability consensus score for expert answers was 10.63 (3.17) vs 13.64 (2.22) for LLM answers (P < .001), indicating 10th grade and college reading levels, respectively. The mean (SD) number of syllables was 327.35 (277.15) for expert vs 376.21 (107.89) for LLM answers (P = .07), the mean (SD) word count was 226.33 (191.92) for expert vs 246.26 (69.36) for LLM answers (P = .27), and the mean (SD) lexicon score was 200.15 (171.28) for expert vs 219.10 (61.59) for LLM answers (P = .24).
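
The paired comparisons reported above (expert vs LLM means with P values over the same 115 questions) can be sketched as follows. The abstract does not name the statistical test, so the paired t test via scipy.stats.ttest_rel is an assumption, and compare_paired is a hypothetical helper; given the per-question readability scores, it would produce summaries of the form "10.63 (3.17) vs 13.64 (2.22), P < .001".

```python
# Hedged sketch of the paired expert-vs-LLM comparison; the paired t test is
# an assumption, as the abstract does not name the test the authors used.
import numpy as np
from scipy import stats

def compare_paired(expert_scores, llm_scores):
    """Summarize one per-question metric for matched expert and LLM answers."""
    expert = np.asarray(expert_scores, dtype=float)
    llm = np.asarray(llm_scores, dtype=float)
    t_stat, p_value = stats.ttest_rel(expert, llm)  # paired: same 115 questions
    return {
        "mean_expert": expert.mean(), "sd_expert": expert.std(ddof=1),
        "mean_llm": llm.mean(), "sd_llm": llm.std(ddof=1),
        "t": t_stat, "p": p_value,
    }
```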

Conclusions and relevance: In this cross-sectional study, the LLM generated accurate, comprehensive, and concise responses with minimal risk of harm, using language similar to human experts but at a higher reading level. These findings suggest the LLM's potential, with some retraining, as a valuable resource for patient queries in radiation oncology and other medical fields.


Conflict of interest statement

Conflict of Interest Disclosures: Dr Yalamanchili reported a pending patent for 63/531 036 (provisional patent) with Northwestern University IP disclosure Disc-ID-23-05-25-001. Dr Sengupta reported a pending patent for 63/531 036 (provisional patent) with Northwestern University IP disclosure Disc-ID-23-05-25-001. Dr Thomas reported a pending patent for 63/531 036 (provisional patent) with Northwestern University IP disclosure Disc-ID-23-05-25-001. Dr Mittal reported a pending patent for 63/531 036 (provisional patent) with Northwestern University IP disclosure Disc-ID-23-05-25-001. Dr Abazeed reported a pending patent for 63/531 036 (provisional patent) with Northwestern University IP disclosure Disc-ID-23-05-25-001; and receiving funding from the National Institutes of Health (NIH R37CA222294), nonfinancial software support from Siemens Healthineers, and grants for clinical trial support from Varian Medical Systems Inc outside the submitted work. Dr Teo reported receiving fellowship funding from the Canadian Institute of Health Research CIHR-472392; and a pending patent for 63/531 036 (provisional patent) with Northwestern University IP disclosure Disc-ID-23-05-25-001. No other disclosures were reported.

Figures

Figure 1. Large Language Model (LLM) Potential Harm Ratings and Comparison of LLM vs Radiation Oncology Expert Responses for 115 Questions
Likert scale plot including potential harm ratings for all 115 LLM-generated responses (A); followed by comparisons of the LLM’s responses to expert answers from online resources, evaluating relative factual correctness, completeness, and conciseness, across all questions (B), general radiation oncology topics (C), treatment modality–specific issues (D), and treatment site-specific queries (E).
Figure 2. Relative Factual Correctness, Relative Completeness, and Relative Conciseness of Large Language Model (LLM)–Generated Responses in Each Subcategory Within Treatment Modality–Specific Answers
Likert scale plot for relative factual correctness, relative completeness, and relative conciseness of LLM-generated responses compared with online resource expert answers in each subcategory within treatment modality–specific answers.
Figure 3. Relative Factual Correctness, Completeness, and Conciseness of Large Language Model (LLM)–Generated Responses Within Each Treatment Subsite–Specific Category
Likert scale for relative factual correctness, completeness, and conciseness within each treatment subsite–specific category, covering colorectal, lung, breast, brain, head and neck, and prostate. Results for the remaining subsites are in eFigure 1 in Supplement 1.
Figure 4. Computationally Generated Metrics for Large Language Model (LLM)–Generated Responses
Computationally generated metrics for LLM-generated responses in the categories of general radiation oncology issues, treatment modality–specific questions, and treatment site–specific questions.
