Cureus. 2025 Jun 22;17(6):e86537. doi: 10.7759/cureus.86537. eCollection 2025 Jun.

How Well Do Different AI Language Models Inform Patients About Radiofrequency Ablation for Varicose Veins?

Ayman Zyada et al. Cureus.

Abstract

Introduction: The rapid integration of artificial intelligence (AI) into healthcare has led to increased public use of large language models (LLMs) to obtain medical information. However, the accuracy and clarity of AI-generated responses to patient queries remain uncertain. This study aims to evaluate and compare the quality of responses provided by five leading AI language models regarding radiofrequency ablation (RFA) for varicose veins.

Objective: To assess and compare the reliability, clarity, and usefulness of AI-generated answers to frequently asked patient questions about RFA for varicose veins, as evaluated by expert vascular surgeons.

Methods: A blinded, comparative observational study was conducted using a standardized list of eight frequently asked questions about RFA, derived from reputable vascular surgery centers across multiple countries. Five top-performing, open-access LLMs were tested: ChatGPT-4 (OpenAI, San Francisco, CA, USA), DeepSeek-R1 (DeepSeek, Hangzhou, Zhejiang, China), Gemini 2.0 (Google DeepMind, Mountain View, CA, USA), Grok-3 (xAI, San Francisco, CA, USA), and LLaMA 3.1 (Meta Platforms, Inc., Menlo Park, CA, USA). Responses from each model were independently evaluated by 32 experienced vascular surgeons using four criteria: accuracy, clarity, relevance, and depth. Statistical analyses, including the Friedman and Wilcoxon signed-rank tests, were used to compare model performance.

Results: Grok-3 was rated as providing the highest-quality response in 51.6% of instances, significantly outperforming all other models (p < 0.0001). ChatGPT-4 ranked second with 23.1%. Gemini, DeepSeek, and LLaMA showed comparable but lower performance. Question-specific analysis revealed that Grok-3 dominated responses related to procedural risks and post-procedure care, while ChatGPT-4 performed best in introductory questions. A subgroup analysis showed that user experience level had no significant impact on model preferences. While 42.4% of respondents were willing to recommend AI tools to patients, 45.5% remained uncertain, reflecting ongoing hesitation.

Conclusion: Grok-3 and ChatGPT-4 currently provide the most reliable AI-generated patient education about RFA for varicose veins. While AI holds promise in improving patient understanding and reducing physician workload, ongoing evaluation and cautious clinical integration are essential. This study establishes a baseline for future comparisons as AI technologies continue to evolve.
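The Friedman test named in Methods compares repeated-measures rankings: each surgeon ranks the same five models, and the test asks whether the rank sums differ more than chance would allow. A minimal stdlib-only sketch of the statistic, using hypothetical ratings rather than the study's actual data (and omitting the usual tie correction):

```python
# Sketch of the Friedman statistic for comparing related ratings across models.
# All rating values below are hypothetical, not the study's data.

def rank_row(row):
    """Rank one rater's scores across the models, averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(row):
        j = i
        # Extend j over a run of equal values (a tie group).
        while j + 1 < len(row) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def friedman_statistic(ratings):
    """ratings: one row per rater; columns are the k models being compared."""
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for col, r in enumerate(rank_row(row)):
            rank_sums[col] += r
    # Chi-squared approximation (no tie correction), df = k - 1.
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)

# Hypothetical 1-5 ratings from four raters for five models.
ratings = [
    [5, 4, 3, 3, 2],
    [5, 3, 4, 2, 3],
    [4, 5, 3, 2, 2],
    [5, 4, 2, 3, 3],
]
print(round(friedman_statistic(ratings), 2))  # → 10.4
```

A significant Friedman result only says the models differ somewhere; the pairwise Wilcoxon signed-rank tests mentioned in Methods would then identify which model pairs differ.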

Keywords: ai in healthcare; artificial intelligence; large language models; model evaluation; patient education; radiofrequency ablation; varicose veins.

Conflict of interest statement

Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.

