Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians

Eric J Robinson¹, Chunyuan Qiu², Stuart Sands³, Mohammad Khan⁴, Shivang Vora⁵, Kenichiro Oshima⁶, Khang Nguyen⁷, L Andrew DiFronzo⁸, David Rhew⁹, Mark I Feng¹⁰

Affiliations

¹ Department of Urology, Los Angeles Medical Center, Kaiser Permanente, Los Angeles, CA, USA.
² Department of Anesthesiology, Baldwin Park Medical Center, Kaiser Permanente, Baldwin Park, CA, USA.
³ Kaiser Permanente, Pleasanton, CA, USA.
⁴ Microsoft Health & Life Sciences, Irvine, CA, USA.
⁵ Microsoft Health & Life Sciences, Dallas, TX, USA.
⁶ Kaiser Permanente, Oakland, CA, USA.
⁷ Department of Family Medicine, Kaiser Permanente, Pasadena, CA, USA.
⁸ Kaiser Permanente, Pasadena, CA, USA.
⁹ Microsoft Health & Life Sciences, Redmond, WA, USA.
¹⁰ Department of Urology, Baldwin Park Medical Center, Kaiser Permanente, 1011 Baldwin Park Blvd., Baldwin Park, CA, 91706, USA. mark.i.feng@kp.org.

PMID: 39729119
PMCID: PMC11680670
DOI: 10.1007/s00345-024-05399-y

Comparative Study

Eric J Robinson et al. World J Urol. 2024 Dec 27;43(1):48. doi: 10.1007/s00345-024-05399-y.

Abstract

Purpose: To evaluate the accuracy, completeness, and empathetic tone of AI and urologist responses to patient messages concerning common benign prostatic hyperplasia (BPH) questions across phases of care, and patient preference between them.

Methods: Cross-sectional study evaluating responses to 20 BPH-related questions generated by 2 AI chatbots and 4 urologists in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of responses were assessed by experts using Likert scales, and preferences and perceptions of authorship (chatbot vs. human) were rated by non-medical evaluators.

Results: Five non-medical volunteers independently evaluated, ranked, and inferred the source for 120 responses (n = 600 total). For volunteer evaluations, the mean (SD) empathy score for chatbots, 3.0 (1.4) (moderately empathetic), was significantly higher than for urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001); the mean (SD) preference ranking for chatbots, 2.6 (1.6), was significantly more favorable than the urologist ranking, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologists and 2 chatbots, n = 240 total). For SME evaluations, the mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct) and not significantly different from that of urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than that of urologists, 1.6 (0.6) (adequate) (p < 0.001).
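
The evaluation totals reported above follow directly from the study design. A minimal sketch of that arithmetic is shown here; the variable names are illustrative and are not taken from the paper, and only the figures stated in the abstract are used.

```python
# Design figures as stated in the abstract (variable names are hypothetical).
questions = 20    # BPH-related questions
urologists = 4    # human responders
chatbots = 2      # AI responders
volunteers = 5    # non-medical evaluators
smes = 2          # subject matter experts

# Each responder answers every question once.
responses = questions * (urologists + chatbots)

# Every evaluator in a group rates every response.
volunteer_evaluations = volunteers * responses
sme_evaluations = smes * responses

print(responses, volunteer_evaluations, sme_evaluations)
```

The three values correspond to the 120 responses, the n = 600 volunteer evaluations, and the n = 240 SME evaluations given in the Results.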

Conclusion: Answers to patient BPH messages generated by chatbots were evaluated by experts as being as accurate as, and more complete than, urologist answers. Non-medical volunteers preferred chatbot-generated messages and considered them more empathetic than answers generated by urologists.

Keywords: Artificial intelligence (AI); Benign prostatic hyperplasia (BPH); Care experience; ChatGPT; Chatbot; Large language models (LLMs); Patient communication; Patient messages; Physician experience; Sandbox.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Comment in

Kleebayoon A, Wiwanitkit V. Comment on "Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians". World J Urol. 2025 Jan 22;43(1):83. doi: 10.1007/s00345-025-05448-0. PMID: 39841262.
Ezer M. Chatbot's performance in answering medical questions: the effects of prompt design, customization settings, and session context. World J Urol. 2025 Jan 27;43(1):88. doi: 10.1007/s00345-025-05449-z. PMID: 39869150.
Fang Y, Chen S, Cheng B. Optimizing AI-assisted communication in urology: potential and challenges. World J Urol. 2025 Feb 14;43(1):122. doi: 10.1007/s00345-025-05508-5. PMID: 39951154.
Romagnoli M, Lombardo R, De Nunzio C. Letter to the Editor on "Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians". World J Urol. 2025 May 6;43(1):272. doi: 10.1007/s00345-025-05587-4. PMID: 40327130.
