Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones

Mehmet Fatih Şahin¹, Erdem Can Topkaç¹, Çağrı Doğan¹, Serkan Şeramet¹, Rıdvan Özcan², Murat Akgül³, Cenk Murat Yazıcı¹

Affiliations

¹ Faculty of Medicine Department of Urology, Tekirdağ Namık Kemal University, Tekirdag, Turkey.
² Department of Urology, Bursa State Hospital, Nilufer, Turkey.
³ Department of Urology, Ümraniye Research and Training Hospital, Istanbul, Turkey.

PMID: 39212674
DOI: 10.1089/end.2024.0474

Comparative Study

Mehmet Fatih Şahin et al. J Endourol. 2024 Nov;38(11):1172-1177. doi: 10.1089/end.2024.0474. Epub 2024 Sep 6.

Abstract

Objective: To evaluate and compare the quality and comprehensibility of answers produced by five artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS).

Materials and Methods: Google Trends was used to identify search terms related to KS. Each AI chatbot was given a unique sequence of 25 commonly searched phrases as input. The responses were assessed using DISCERN, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria.

Results: The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most KS-related searches. None of the AI chatbots attained the requisite level of comprehensibility. Grok had the highest FKRE (55.6 ± 7.1) and the lowest FKGL (10.0 ± 1.1) scores (p = 0.001), whereas Claude outperformed the other chatbots in DISCERN score (47.6 ± 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 ± 2.0), and actionability was highest for Claude (61.8 ± 3.5) (p = 0.001).

Conclusion: GPT-4 had the most complex language structure of the five chatbots, making it the most difficult to read and comprehend, whereas Grok was the simplest. Claude had the best KS text quality. Chatbot technology can improve healthcare material and make it easier to understand.

Keywords: Claude; GPT-4; Google PaLM; Grok; Mistral; artificial intelligence; kidney stone.
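
For readers unfamiliar with the two readability measures named in the abstract, the following is a minimal Python sketch of how they are conventionally computed. The coefficients are those of the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. The abstract does not say which tool the authors used, so the function names, the sentence splitter, and especially the vowel-group syllable counter are assumptions made for illustration; scores from a dedicated readability tool will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (reading ease, grade level) for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Standard formulas: higher reading ease means easier text,
    # higher grade level means harder text.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    ease, grade = readability("Drink more water. It can help small stones pass.")
    print(f"reading ease = {ease:.1f}, grade level = {grade:.1f}")
```

Because the two formulas share the same inputs with opposite signs, a chatbot that writes shorter sentences with shorter words scores higher on reading ease and lower on grade level, which matches the pattern the abstract reports for Grok.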
