Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians
- PMID: 39729119
- PMCID: PMC11680670
- DOI: 10.1007/s00345-024-05399-y
Abstract
Purpose: To evaluate the accuracy, comprehensiveness, empathetic tone, and patient preference of AI-generated and urologist responses to patient messages concerning common benign prostatic hyperplasia (BPH) questions across phases of care.
Methods: Cross-sectional study evaluating responses to 20 BPH-related questions generated by 2 AI chatbots and 4 urologists in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of the responses were assessed by subject matter experts using Likert scales, while preference and perceived authorship (chatbot vs. human) were rated by non-medical evaluators.
Results: Five non-medical volunteers independently evaluated, ranked, and inferred the source of 120 responses (n = 600 evaluations total). In the volunteer evaluations, the mean (SD) empathy score for chatbots, 3.0 (1.4) (moderately empathetic), was significantly higher than that for urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001), and the mean (SD) preference ranking for chatbots, 2.6 (1.6), was significantly better than the urologist ranking, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologists and 2 chatbots; n = 240 evaluations total). In the SME evaluations, the mean (SD) accuracy score for chatbots, 4.5 (1.1) (nearly all correct), was not significantly different from that for urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots, 2.4 (0.8) (comprehensive), was significantly higher than that for urologists, 1.6 (0.6) (adequate) (p < 0.001).
Conclusion: Answers to patient BPH messages generated by chatbots were judged by experts to be as accurate as, and more complete than, urologist answers. Non-medical volunteers preferred chatbot-generated messages and considered them more empathetic than those written by urologists.
Keywords: Artificial intelligence (AI); Benign prostatic hyperplasia (BPH); Care experience; ChatGPT; Chatbot; Large language models (LLMs); Patient communication; Patient messages; Physician experience; Sandbox.
© 2024. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Comment in
- Comment on "Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians". World J Urol. 2025 Jan 22;43(1):83. doi: 10.1007/s00345-025-05448-0. PMID: 39841262
- Chatbot's performance in answering medical questions: the effects of prompt design, customization settings, and session context. World J Urol. 2025 Jan 27;43(1):88. doi: 10.1007/s00345-025-05449-z. PMID: 39869150
- Optimizing AI-assisted communication in urology: potential and challenges. World J Urol. 2025 Feb 14;43(1):122. doi: 10.1007/s00345-025-05508-5. PMID: 39951154
- Letter to the Editor on "Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians". World J Urol. 2025 May 6;43(1):272. doi: 10.1007/s00345-025-05587-4. PMID: 40327130
Similar articles
- Quality of Chatbot Information Related to Benign Prostatic Hyperplasia. Prostate. 2025 Feb;85(2):175-180. doi: 10.1002/pros.24814. Epub 2024 Nov 8. PMID: 39513562
- Application of AI Chatbot in Responding to Asynchronous Text-Based Messages From Patients With Cancer: Comparative Study. J Med Internet Res. 2025 May 21;27:e67462. doi: 10.2196/67462. PMID: 40397947
- Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838. PMID: 37115527
- Potential of AI-Driven Chatbots in Urology: Revolutionizing Patient Care Through Artificial Intelligence. Curr Urol Rep. 2024 Jan;25(1):9-18. doi: 10.1007/s11934-023-01184-3. Epub 2023 Sep 19. PMID: 37723300
- Artificial Intelligence in the Business of Urology. Urology. 2025 May 7:S0090-4295(25)00424-8. doi: 10.1016/j.urology.2025.04.059. Online ahead of print. PMID: 40345451
Cited by
- Development and external validation of a nomogram for predicting sepsis following flexible ureteroscopy. Eur J Med Res. 2025 Jun 13;30(1):479. doi: 10.1186/s40001-025-02754-6. PMID: 40514708
- Comment on "Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians". World J Urol. 2025 Jan 22;43(1):83. doi: 10.1007/s00345-025-05448-0. PMID: 39841262
- AI-enabled clinical decision support systems: challenges and opportunities. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2025 Aug;68(8):872-879. doi: 10.1007/s00103-025-04092-8. Epub 2025 Jun 25. PMID: 40560226 (in German)
- Optimizing AI-assisted communication in urology: potential and challenges. World J Urol. 2025 Feb 14;43(1):122. doi: 10.1007/s00345-025-05508-5. PMID: 39951154