Comparative Study
Front Public Health. 2025 Jul 23;13:1605908.
doi: 10.3389/fpubh.2025.1605908. eCollection 2025.

Battle of the artificial intelligence: a comprehensive comparative analysis of DeepSeek and ChatGPT for urinary incontinence-related questions

Huawei Cao et al. Front Public Health. 2025.

Abstract

Background: With the rapid advancement and widespread adoption of artificial intelligence (AI), patients increasingly turn to AI for initial medical guidance. Therefore, a comprehensive evaluation of AI-generated responses is warranted. This study aimed to compare the performance of DeepSeek and ChatGPT in answering urinary incontinence-related questions and to delineate their respective strengths and limitations.

Methods: Based on the American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction (AUA/SUFU) and European Association of Urology (EAU) guidelines, we designed 25 urinary incontinence-related questions. Responses from DeepSeek and ChatGPT-4.0 were evaluated for reliability, quality, and readability. Fleiss' kappa was employed to calculate inter-rater reliability. For clinical case scenarios, we additionally assessed the appropriateness of responses. A comprehensive comparative analysis was performed.
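As a rough illustration of the inter-rater statistic named in the Methods, the following Python sketch computes Fleiss' kappa from a table of rating counts. The function name `fleiss_kappa` and the toy ratings are hypothetical; the study's actual rating data are not given in this abstract, so this shows only how the statistic is defined, not how the authors computed it.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters (>= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total_ratings = n_subjects * n_raters

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]

    # Observed agreement: per-subject proportion of agreeing rater pairs,
    # averaged over subjects.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Agreement expected by chance from the category shares.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_obs - p_exp) / (1.0 - p_exp)


# Toy data (made up): 4 responses, 3 raters, 2 categories.
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))
```

Values near 1 indicate agreement well beyond chance, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would produce.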

Results: The modified DISCERN (mDISCERN) scores for DeepSeek and ChatGPT-4.0 were 28.24 ± 0.88 and 28.76 ± 1.56, respectively, showing no practically meaningful difference [P = 0.188, Cohen's d = 0.41 (95% CI: -0.15, 0.97)]. Both AI chatbots rarely provided source references. In terms of quality, DeepSeek achieved a higher mean Global Quality Scale (GQS) score than ChatGPT-4.0 (4.76 ± 0.52 vs. 4.32 ± 0.69, P = 0.001). DeepSeek also demonstrated superior readability, as indicated by a higher Flesch Reading Ease (FRE) score (76.43 ± 10.90 vs. 70.95 ± 11.16, P = 0.039) and a lower Simple Measure of Gobbledygook (SMOG) index (12.26 ± 1.39 vs. 14.21 ± 1.88, P < 0.001), suggesting easier comprehension. Regarding guideline adherence, DeepSeek had 11 (73.33%) fully compliant responses, while ChatGPT-4.0 had 13 (86.67%), with no significant difference [P = 0.651, Cohen's w = 0.083 (95% CI: 0.021, 0.232)].
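The two effect sizes quoted above can be approximately reproduced from the summary statistics alone. The Python sketch below does so under assumptions that the abstract does not confirm: equal group sizes with a simple pooled standard deviation for Cohen's d, 15 clinical-case responses per chatbot (inferred from 11 being 73.33%), and a Yates continuity correction for the chi-square statistic behind Cohen's w. The function names are hypothetical, and the authors' actual software and formulas are not stated here.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two equal-sized groups, using the pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_b - mean_a) / pooled_sd


def cohens_w_2x2(a, b, c, d, yates=True):
    """Cohen's w = sqrt(chi2 / N) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction (an assumption, see lead-in).
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return math.sqrt(chi2 / n)


# mDISCERN: DeepSeek 28.24 +/- 0.88 vs ChatGPT-4.0 28.76 +/- 1.56.
d_value = cohens_d(28.24, 0.88, 28.76, 1.56)

# Guideline adherence: fully compliant vs not, assuming 15 cases each.
w_value = cohens_w_2x2(11, 4, 13, 2)

print(f"d = {d_value:.2f}, w = {w_value:.3f}")  # d = 0.41, w = 0.083
```

Under these assumptions the sketch matches the reported d = 0.41 and w = 0.083. Calling `cohens_w_2x2(11, 4, 13, 2, yates=False)` gives roughly 0.167 instead, which is why the continuity correction is assumed.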

Conclusion: DeepSeek and ChatGPT-4.0 might exhibit comparable reliability in answering urinary incontinence-related questions, though both lacked sufficient references. However, DeepSeek outperformed ChatGPT-4.0 in response quality and readability. While both AI chatbots largely adhered to clinical guidelines, occasional deviations were observed. Further refinements are necessary before the widespread clinical implementation of AI chatbots in urology.
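For readers unfamiliar with the two readability measures behind this comparison, the Python sketch below states the standard Flesch Reading Ease and SMOG formulas in terms of raw text counts. The function names and example counts are hypothetical, and the abstract does not say which tool the authors used to count sentences, syllables, or polysyllabic words.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_index(polysyllables, sentences):
    """SMOG index: approximate years of education needed (lower is easier).

    polysyllables is the number of words with three or more syllables.
    """
    return 1.0430 * math.sqrt(polysyllables * (30.0 / sentences)) + 3.1291


# Hypothetical counts for a single chatbot response.
print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 2))
print(round(smog_index(polysyllables=30, sentences=30), 2))
```

Shorter sentences and fewer syllables per word raise the Flesch score, while fewer polysyllabic words per sentence lower the SMOG index, so the two measures move in opposite directions as text becomes easier to read.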

Keywords: ChatGPT; DeepSeek; artificial intelligence; comparative analysis; urinary incontinence.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Flow chart of question inclusion and response evaluation. Thirty-five questions were initially drafted from the AUA/SUFU and EAU guidelines; ten were excluded for similarity, subjectivity, or grammatical inadequacy, leaving 25 for the study. Three professionals evaluated the responses from ChatGPT-4.0 and DeepSeek for reliability, quality, and readability, and for appropriateness in the clinical cases. AUA/SUFU, American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction; EAU, European Association of Urology.

Figure 2
Evaluation of the appropriateness of responses generated by two artificial intelligence chatbots in clinical scenarios. The bar chart compares DeepSeek-R1 and ChatGPT-4.0 across four categories: "Cannot provide advice," "Expression of encouragement or comfort," "Consult with healthcare professionals," and "Full compliance with guidelines." DeepSeek scores higher for "Expression of encouragement or comfort" and slightly lower for "Full compliance with guidelines" and "Consult with healthcare professionals." Both chatbots score zero for "Cannot provide advice." P-values are shown beside each category; only "Expression of encouragement or comfort" reaches significance (P = 0.021).
