Physician and Artificial Intelligence Chatbot Responses to Cancer Questions From Social Media

David Chen et al. JAMA Oncol. 2024 Jul 1;10(7):956-960. doi: 10.1001/jamaoncol.2024.0836.

Abstract

Importance: Artificial intelligence (AI) chatbots offer the opportunity to draft template responses to patient questions. However, the ability of chatbots to generate responses based on domain-specific knowledge of cancer remains to be tested.

Objective: To evaluate the competency of AI chatbots (GPT-3.5 [chatbot 1], GPT-4 [chatbot 2], and Claude AI [chatbot 3]) to generate high-quality, empathetic, and readable responses to patient questions about cancer.

Design, setting, and participants: This equivalence study compared AI chatbot responses with responses by 6 verified oncologists to 200 patient questions about cancer from a public online forum. Data were collected on May 31, 2023.

Exposures: A random sample of 200 patient questions related to cancer from a public online forum (Reddit r/AskDocs), spanning January 1, 2018, to May 31, 2023, was posed to 3 AI chatbots.
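The study does not state whether questions were submitted through the chatbots' web interfaces or an API. Purely as an illustrative sketch of how one such question could be posed programmatically, assuming the OpenAI Python SDK (v1+) with an API key in the environment; the model name and example question are placeholders, not the study's configuration:

    # Illustrative only, not the authors' pipeline.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    question = "I found a lump on my neck. Should I be worried it is cancer?"  # hypothetical example
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    print(response.choices[0].message.content)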

Main outcomes and measures: The primary outcomes were pilot ratings of quality, empathy, and readability on a Likert scale from 1 (very poor) to 5 (very good). Two teams of attending oncology specialists evaluated each response in triplicate on the pilot measures of quality, empathy, and readability. The secondary outcome was readability assessed using the Flesch-Kincaid Grade Level.
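The abstract does not restate the readability metric's definition; for reference, the Flesch-Kincaid Grade Level is a fixed linear function of word, sentence, and syllable counts. A minimal sketch (syllable counting in practice requires a text-analysis tool, which is omitted here):

    def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
        # Standard Flesch-Kincaid Grade Level formula.
        return (0.39 * (total_words / total_sentences)
                + 11.8 * (total_syllables / total_words)
                - 15.59)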

Results: Responses to 200 questions generated by chatbot 3, the best-performing AI chatbot, were rated consistently higher in overall measures of quality (mean, 3.56 [95% CI, 3.48-3.63] vs 3.00 [95% CI, 2.91-3.09]; P < .001), empathy (mean, 3.62 [95% CI, 3.53-3.70] vs 2.43 [95% CI, 2.32-2.53]; P < .001), and readability (mean, 3.79 [95% CI, 3.72-3.87] vs 3.07 [95% CI, 3.00-3.15]; P < .001) compared with physician responses. The mean Flesch-Kincaid Grade Level of physician responses (mean, 10.11 [95% CI, 9.21-11.03]) was not significantly different from chatbot 3 responses (mean, 10.31 [95% CI, 9.89-10.72]; P > .99) but was lower than those from chatbot 1 (mean, 12.33 [95% CI, 11.84-12.83]; P < .001) and chatbot 2 (mean, 11.32 [95% CI, 11.05-11.79]; P = .01).
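The abstract does not name the statistical procedures behind these estimates and P values. As a generic sketch of how mean Likert ratings with 95% CIs and a nonparametric two-group comparison could be computed (the ratings below are invented for illustration, not study data):

    import numpy as np
    from scipy import stats

    def mean_ci(x, level=0.95):
        # Mean with a t-based confidence interval.
        x = np.asarray(x, dtype=float)
        m, se = x.mean(), stats.sem(x)
        h = se * stats.t.ppf((1 + level) / 2, len(x) - 1)
        return m, m - h, m + h

    physician_ratings = np.array([3, 2, 4, 3, 3, 2, 4, 3])  # hypothetical 1-5 Likert scores
    chatbot_ratings = np.array([4, 3, 4, 4, 3, 4, 5, 3])    # hypothetical 1-5 Likert scores

    print(mean_ci(physician_ratings))
    print(mean_ci(chatbot_ratings))
    print(stats.mannwhitneyu(chatbot_ratings, physician_ratings, alternative="two-sided"))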

Conclusions and relevance: The findings of this study suggest that AI chatbots can generate high-quality, empathetic, and readable responses to patient questions that are comparable to physician responses sourced from an online forum. Further research is required to assess the scope, process integration, and patient and physician outcomes of chatbot-facilitated interactions.


Conflict of interest statement

Conflict of Interest Disclosures: Mr Parsa reported receiving grants from the Canadian Association of Radiation Oncology CARO-CROF Studentship Award during the conduct of the study. Dr Hope reported receiving personal fees from AstraZeneca Canada outside the submitted work. Dr Raman reported receiving studentship awards from CARO-CROF, Robert L. Tundermann and Christine E. Couturier philanthropic funds, and T-CAIREM during the conduct of the study. No other disclosures were reported.

Figures

Figure 1. Overall Median and Distribution of Physician Rater Evaluations of Physician and Chatbot Responses to Patient Questions
Quality (A), empathy (B), and readability (C) of physician and chatbot responses to patient questions. Chatbot 1 indicates GPT-3.5; chatbot 2, GPT-4; and chatbot 3, Claude AI. The midline indicates the median (50th percentile); the box, the 25th and 75th percentiles; the whiskers, the 5th and 95th percentiles; and the density distribution plot represents the probability density of the response score distribution.
Figure 2. Measures of Cognitive Load of Patient Questions, Physician Responses, and Chatbot Responses
Means and 95% CIs are shown for word count (A) and Flesch-Kincaid Grade Level readability (B). Chatbot 1 indicates GPT-3.5; chatbot 2, GPT-4; and chatbot 3, Claude AI.
