Battle of the artificial intelligence: a comprehensive comparative analysis of DeepSeek and ChatGPT for urinary incontinence-related questions
- PMID: 40771241
- PMCID: PMC12325333
- DOI: 10.3389/fpubh.2025.1605908
Abstract
Background: With the rapid advancement and widespread adoption of artificial intelligence (AI), patients increasingly turn to AI for initial medical guidance. Therefore, a comprehensive evaluation of AI-generated responses is warranted. This study aimed to compare the performance of DeepSeek and ChatGPT in answering urinary incontinence-related questions and to delineate their respective strengths and limitations.
Methods: Based on the American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction (AUA/SUFU) and European Association of Urology (EAU) guidelines, we designed 25 urinary incontinence-related questions. Responses from DeepSeek and ChatGPT-4.0 were evaluated for reliability, quality, and readability. Fleiss' kappa was employed to calculate inter-rater reliability. For clinical case scenarios, we additionally assessed the appropriateness of responses. A comprehensive comparative analysis was performed.
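The inter-rater statistic named in the Methods can be illustrated with a minimal sketch of Fleiss' kappa. The abstract does not report the number of raters, the rating categories, or the software used, so the function name `fleiss_kappa` and the toy ratings below are hypothetical and purely illustrative.

```python
import math
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Every item must be rated by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if math.isclose(p_expected, 1.0):
        return float("nan")  # undefined when only one category is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: three raters scoring four responses on a 1-5 scale.
toy_ratings = [[5, 5, 5], [4, 4, 5], [5, 4, 4], [3, 3, 3]]
kappa = fleiss_kappa(toy_ratings)
```

Values near 1 indicate agreement well above chance, values near 0 indicate agreement no better than chance, and negative values indicate agreement below chance.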
Results: The modified DISCERN (mDISCERN) scores for DeepSeek and ChatGPT-4.0 were 28.24 ± 0.88 and 28.76 ± 1.56, respectively, showing no practically meaningful difference [P = 0.188, Cohen's d = 0.41 (95% CI: -0.15, 0.97)]. Both AI chatbots rarely provided source references. In terms of quality, DeepSeek achieved a higher mean Global Quality Scale (GQS) score than ChatGPT-4.0 (4.76 ± 0.52 vs. 4.32 ± 0.69, P = 0.001). DeepSeek also demonstrated superior readability, as indicated by a higher Flesch Reading Ease (FRE) score (76.43 ± 10.90 vs. 70.95 ± 11.16, P = 0.039) and a lower Simple Measure of Gobbledygook (SMOG) index (12.26 ± 1.39 vs. 14.21 ± 1.88, P < 0.001), suggesting easier comprehension. Regarding guideline adherence, DeepSeek had 11 (73.33%) fully compliant responses, while ChatGPT-4.0 had 13 (86.67%), with no significant difference [P = 0.651, Cohen's w = 0.083 (95% CI: 0.021, 0.232)].
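As a worked check of the mDISCERN effect size reported in the Results, the sketch below recomputes Cohen's d from the summary statistics. The abstract does not state which formula the authors used; this sketch assumes an independent-samples d with a pooled standard deviation, n = 25 responses per chatbot (one per question), and a large-sample normal approximation for the 95% confidence interval. The function name `cohens_d_from_summary` is hypothetical.

```python
import math


def cohens_d_from_summary(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Cohen's d (group 2 minus group 1) with an approximate 95% CI.

    Assumes independent groups and a pooled standard deviation; the
    standard error uses the common large-sample approximation.
    """
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (mean2 - mean1) / pooled_sd
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


# mDISCERN: DeepSeek 28.24 +/- 0.88, ChatGPT-4.0 28.76 +/- 1.56.
# n = 25 per group is an assumption based on the 25 questions.
d, (ci_low, ci_high) = cohens_d_from_summary(28.24, 0.88, 25, 28.76, 1.56, 25)
print(f"d = {d:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

Under these assumptions the result rounds to the reported d = 0.41 with a 95% CI of (-0.15, 0.97). An interval that spans zero is consistent with the non-significant P value.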
Conclusion: DeepSeek and ChatGPT-4.0 might exhibit comparable reliability in answering urinary incontinence-related questions, though both lacked sufficient references. However, DeepSeek outperformed ChatGPT-4.0 in response quality and readability. While both AI chatbots largely adhered to clinical guidelines, occasional deviations were observed. Further refinements are necessary before the widespread clinical implementation of AI chatbots in urology.
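The two readability indices behind this comparison follow standard published formulas, sketched below from raw text counts. The abstract does not say which tool the authors used to count sentences, words, and syllables, so the functions take those counts as inputs; the function names and the example counts are hypothetical.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_index(polysyllables, sentences):
    """SMOG index: estimated years of education needed to follow the text.

    `polysyllables` is the number of words with three or more syllables;
    the count is normalised to a 30-sentence sample.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for one chatbot response.
fre = flesch_reading_ease(words=100, sentences=5, syllables=150)
smog = smog_index(polysyllables=30, sentences=30)
```

Shorter sentences and fewer syllables per word raise the FRE score, while fewer polysyllabic words lower the SMOG index. This is why a higher FRE and a lower SMOG both point toward easier reading.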
Keywords: ChatGPT; DeepSeek; artificial intelligence; comparative analysis; urinary incontinence.
Copyright © 2025 Cao, Hao, Zhang, Zheng, Gao, Wu, Gan, Liu, Zeng and Wang.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.