Assessing Artificial Intelligence-Generated Responses to Urology Patient In-Basket Messages

Affiliations

¹ Department of Urology, Stanford University School of Medicine, Palo Alto, California.
² Idaho Urologic Institute, Meridian, Idaho.
³ Unit of Urology/Division of Oncology, IRCCS Ospedale San Raffaele, URI, Milan, Italy.
⁴ Department of Maternal-Infant and Urological Sciences, "Sapienza" Rome University, Policlinico Umberto I Hospital, Rome, Italy.
⁵ Emory School of Medicine, Emory University, Atlanta, Georgia.

PMID: 39162591
DOI: 10.1097/UPJ.0000000000000637

Michael Scott et al. Urol Pract. 2024 Sep.

Urol Pract. 2024 Sep;11(5):793-798.

doi: 10.1097/UPJ.0000000000000637. Epub 2024 Jun 24.

Abstract

Introduction: Electronic patient messaging utilization has increased in recent years and has been associated with physician burnout. ChatGPT is a language model that has shown the ability to generate near-human level text responses. This study evaluated the quality of ChatGPT responses to real-world urology patient messages.

Methods: One hundred electronic patient messages were collected from a practicing urologist's inbox and categorized by question content. A response to each message was generated by entering the message into ChatGPT. The questions and responses were independently evaluated by 5 urologists and graded on a 5-point Likert scale. Questions were graded on difficulty, and responses were graded on accuracy, completeness, harmfulness, helpfulness, and intelligibility. Whether the response could be sent to a patient was also assessed.

Results: Overall, 47% of responses were deemed acceptable to send to patients. ChatGPT performed better on easy questions: 56% of responses to easy questions were acceptable to send, compared with 34% of responses to difficult questions (P = .03). Responses to easy questions were more accurate, complete, helpful, and intelligible than responses to difficult questions. There was no difference in response quality based on question content.

Conclusions: ChatGPT generated acceptable responses to nearly 50% of patient messages, with better performance on easy questions than on difficult questions. Using ChatGPT to help respond to patient messages can reduce the time burden on the care team and improve wellness. Artificial intelligence performance will likely continue to improve with advances in generative artificial intelligence technology.

Keywords: artificial intelligence; electronic medical record; quality improvement; urology.
