Assessment of the validity of ChatGPT-3.5 responses to patient-generated queries following BPH surgery

Jad Najdi et al. Sci Rep. 2025 Sep 30;15(1):34021. doi: 10.1038/s41598-025-13077-1.

Abstract

The rapid advancement of artificial intelligence, particularly large language models like ChatGPT-3.5, presents promising applications in healthcare. This study evaluates ChatGPT-3.5's validity in responding to post-operative patient inquiries following surgery for benign prostatic hyperplasia (BPH). Common patient-generated questions were sourced from discharge instructions, online forums, and social media, covering various BPH surgical modalities. ChatGPT-3.5 responses were assessed by two senior urology residents using pre-defined criteria, with discrepancies resolved by a third reviewer. A total of 496 responses were reviewed, with 280 excluded. Among the 216 graded responses, 78.2% were comprehensive and correct, 9.3% were incomplete or partially correct, 10.2% contained a mix of accurate and inaccurate information, and 2.3% were entirely incorrect. Newer procedures (Aquablation, Rezum, iTIND) had a higher percentage of correct answers compared to traditional techniques (TURP, simple prostatectomy). The most common errors involved missing context or incorrect details (36.6%). These findings suggest that ChatGPT-3.5 has potential in providing accurate post-operative guidance for BPH patients. However, concerns regarding incomplete and misleading responses highlight the need for further refinement to improve AI-generated medical advice and ensure patient safety. Future research should focus on enhancing AI reliability in clinical applications.
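The reported category percentages can be checked against the 216 graded responses. The counts below are back-calculated from the published percentages for illustration; they are an assumption, not data taken from the study itself.

```python
# Illustrative arithmetic only: category counts are back-calculated from the
# reported percentages (78.2%, 9.3%, 10.2%, 2.3%) of the 216 graded responses.
counts = {
    "comprehensive and correct": 169,
    "incomplete or partially correct": 20,
    "mix of accurate and inaccurate": 22,
    "entirely incorrect": 5,
}

total = sum(counts.values())  # should equal the 216 graded responses

for label, n in counts.items():
    print(f"{label}: {100 * n / total:.1f}%")
```

Summing the four counts recovers the 216 graded responses, and each count divided by the total reproduces the percentage reported in the abstract to one decimal place.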

Keywords: Artificial intelligence; Minimally invasive surgical procedures; Natural language processing; Postoperative care; Prostatic hyperplasia.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Visual map of the steps taken to obtain and evaluate ChatGPT's responses to the chosen questions on post-operative instructions after BPH surgery.

Fig. 2. Percentage of answers in the four grading categories across all procedure types.

Fig. 3. Percentage of answers in the four grading categories divided by procedure type.

