Assessment of the validity of ChatGPT-3.5 responses to patient-generated queries following BPH surgery

Jad Najdi et al. Sci Rep. 2025 Sep 30;15(1):34021. doi: 10.1038/s41598-025-13077-1.

Abstract

The rapid advancement of artificial intelligence, particularly large language models like ChatGPT-3.5, presents promising applications in healthcare. This study evaluates ChatGPT-3.5's validity in responding to post-operative patient inquiries following surgery for benign prostatic hyperplasia (BPH). Common patient-generated questions were sourced from discharge instructions, online forums, and social media, covering various BPH surgical modalities. ChatGPT-3.5 responses were assessed by two senior urology residents using pre-defined criteria, with discrepancies resolved by a third reviewer. A total of 496 responses were reviewed, with 280 excluded. Among the 216 graded responses, 78.2% were comprehensive and correct, 9.3% were incomplete or partially correct, 10.2% contained a mix of accurate and inaccurate information, and 2.3% were entirely incorrect. Newer procedures (Aquablation, Rezum, iTIND) had a higher percentage of correct answers compared to traditional techniques (TURP, simple prostatectomy). The most common errors involved missing context or incorrect details (36.6%). These findings suggest that ChatGPT-3.5 has potential in providing accurate post-operative guidance for BPH patients. However, concerns regarding incomplete and misleading responses highlight the need for further refinement to improve AI-generated medical advice and ensure patient safety. Future research should focus on enhancing AI reliability in clinical applications.
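The reported category percentages can be checked against the 216 graded responses. The counts below are back-calculated from the published percentages for illustration; they are an assumption, not data taken from the study itself.

```python
# Illustrative arithmetic only: category counts are back-calculated from the
# reported percentages (78.2%, 9.3%, 10.2%, 2.3%) of the 216 graded responses.
counts = {
    "comprehensive and correct": 169,
    "incomplete or partially correct": 20,
    "mix of accurate and inaccurate": 22,
    "entirely incorrect": 5,
}

total = sum(counts.values())  # should equal the 216 graded responses

for label, n in counts.items():
    print(f"{label}: {100 * n / total:.1f}%")
```

Summing the four counts recovers the 216 graded responses, and each count divided by the total reproduces the percentage reported in the abstract to one decimal place.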

Keywords: Artificial intelligence; Minimally invasive surgical procedures; Natural language processing; Postoperative care; Prostatic hyperplasia.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Visual map of the steps taken to obtain and evaluate ChatGPT's responses to the chosen questions on post-operative instructions after BPH surgery.

Fig. 2. Percentage of answers in the four grading categories across all procedure types.

Fig. 3. Percentage of answers in the four grading categories divided by procedure type.

