How Well Do Different AI Language Models Inform Patients About Radiofrequency Ablation for Varicose Veins?
- PMID: 40698235
- PMCID: PMC12282550
- DOI: 10.7759/cureus.86537
Abstract
Introduction: The rapid integration of artificial intelligence (AI) into healthcare has led to increased public use of large language models (LLMs) to obtain medical information. However, the accuracy and clarity of AI-generated responses to patient queries remain uncertain. This study evaluates and compares the quality of responses provided by five leading AI language models regarding radiofrequency ablation (RFA) for varicose veins.

Objective: To assess and compare the reliability, clarity, and usefulness of AI-generated answers to frequently asked patient questions about RFA for varicose veins, as evaluated by expert vascular surgeons.

Methods: A blinded, comparative observational study was conducted using a standardized list of eight frequently asked questions about RFA, derived from reputable vascular surgery centers across multiple countries. Five top-performing, open-access LLMs were tested: ChatGPT-4 (OpenAI, San Francisco, CA, USA), DeepSeek-R1 (DeepSeek, Hangzhou, Zhejiang, China), Gemini 2.0 (Google DeepMind, Mountain View, CA, USA), Grok-3 (xAI, San Francisco, CA, USA), and LLaMA 3.1 (Meta Platforms, Inc., Menlo Park, CA, USA). Responses from each model were independently evaluated by 32 experienced vascular surgeons using four criteria: accuracy, clarity, relevance, and depth. Statistical analyses, including the Friedman test and pairwise Wilcoxon signed-rank tests, were used to compare model performance.

Results: Grok-3 was rated as providing the highest-quality responses in 51.6% of instances, significantly outperforming all other models (p < 0.0001). ChatGPT-4 ranked second with 23.1%. Gemini, DeepSeek, and LLaMA showed comparable but lower performance. Question-specific analysis revealed that Grok-3 dominated responses related to procedural risks and post-procedure care, while ChatGPT-4 performed best in introductory questions. A subgroup analysis showed that the raters' level of experience with AI tools had no significant impact on model preferences. While 42.4% of respondents were willing to recommend AI tools to patients, 45.5% remained uncertain, reflecting ongoing hesitation.

Conclusion: Grok-3 and ChatGPT-4 currently provide the most reliable AI-generated patient education about RFA for varicose veins. While AI holds promise for improving patient understanding and reducing physician workload, ongoing evaluation and cautious clinical integration are essential. This study establishes a baseline for future comparisons as AI technologies continue to evolve.
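The analysis pipeline described in the Methods (an omnibus Friedman test across the five related rating samples, followed by pairwise Wilcoxon signed-rank tests) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the rating data below are synthetic, and the 1–5 scoring scale is an assumption.

```python
# Illustrative sketch (not the authors' analysis): Friedman omnibus test
# across five related samples of rater scores, followed by pairwise
# Wilcoxon signed-rank tests. All ratings here are synthetic.
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
models = ["ChatGPT-4", "DeepSeek-R1", "Gemini 2.0", "Grok-3", "LLaMA 3.1"]

# Hypothetical 1-5 quality scores from 32 raters for each model
ratings = {m: rng.integers(1, 6, size=32) for m in models}

# Omnibus test: do the five related samples of scores differ at all?
stat, p = stats.friedmanchisquare(*ratings.values())
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")

# Pairwise follow-up: Wilcoxon signed-rank test on each model pair
for a, b in combinations(models, 2):
    diff = ratings[a].astype(int) - ratings[b].astype(int)
    if np.any(diff):  # wilcoxon needs at least one nonzero difference
        w, pw = stats.wilcoxon(ratings[a], ratings[b])
        print(f"{a} vs {b}: W = {w:.1f}, p = {pw:.4f}")
```

In practice the pairwise p-values would also need a multiple-comparison correction (e.g. Bonferroni across the ten model pairs); the abstract does not state which, if any, was applied.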
Keywords: ai in healthcare; artificial intelligence; large language models; model evaluation; patient education; radiofrequency ablation; varicose veins.
Copyright © 2025, Zyada et al.
Conflict of interest statement
Human subjects: All authors have confirmed that this study did not involve human participants or tissue.
Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.
Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following:
Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work.
Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work.
Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.
Similar articles
- A multi-dimensional performance evaluation of large language models in dental implantology: comparison of ChatGPT, DeepSeek, Grok, Gemini and Qwen across diverse clinical scenarios. BMC Oral Health. 2025 Jul 28;25(1):1272. doi: 10.1186/s12903-025-06619-6. PMID: 40721763.
- User Intent to Use DeepSeek for Health Care Purposes and Their Trust in the Large Language Model: Multinational Survey Study. JMIR Hum Factors. 2025 May 26;12:e72867. doi: 10.2196/72867. PMID: 40418796.
- Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini. Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174. PMID: 40600083.
- Endovenous ablation therapy (laser or radiofrequency) or foam sclerotherapy versus conventional surgical repair for short saphenous varicose veins. Cochrane Database Syst Rev. 2016 Nov 29;11(11):CD010878. doi: 10.1002/14651858.CD010878.pub2. PMID: 27898181.
- Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences. Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3. PMID: 39593159.