Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal

Ryunosuke Noda¹, Yuto Izaki², Fumiya Kitano², Jun Komatsu², Daisuke Ichikawa², Yugo Shibagaki²

Affiliations

¹ Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan. nodaryu00@gmail.com.
² Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.

PMID: 38353783
DOI: 10.1007/s10157-023-02451-w

Ryunosuke Noda et al. Clin Exp Nephrol. 2024 May;28(5):465-469. doi: 10.1007/s10157-023-02451-w. Epub 2024 Feb 14.

Abstract

Background: Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate the potential of ChatGPT and Bard for nephrology applications.

Methods: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates for all five years combined, for each year, and for each question category, and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of nephrology residents.

Results: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 passed in three of the five years, barely meeting the minimum threshold in two of them. GPT-4 performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents.
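The overall rates above are simple proportions of the 99 questions, so they can be reproduced directly from the reported counts. The sketch below does that. The comparison function is an illustrative assumption, not the authors' stated method: it applies an unpaired two-proportion z-test to the overall counts. Because all three models answered the same 99 questions, a paired test such as McNemar's would normally be preferred, but that requires per-question results that the abstract does not report.

```python
import math

# Correct-answer counts out of 99 questions, as reported in the abstract.
TOTAL = 99
CORRECT = {"GPT-3.5": 31, "GPT-4": 54, "Bard": 32}


def rate_percent(correct: int, total: int = TOTAL) -> float:
    """Correct answer rate as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def two_proportion_z(x1: int, x2: int, n: int = TOTAL) -> tuple[float, float]:
    """Unpaired two-proportion z-test with a pooled proportion.

    Illustrative only: it ignores that every model saw the same questions.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n, x2 / n
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


for model, correct in CORRECT.items():
    print(f"{model}: {rate_percent(correct)}% ({correct}/{TOTAL})")

for other in ("GPT-3.5", "Bard"):
    z, p = two_proportion_z(CORRECT["GPT-4"], CORRECT[other])
    print(f"GPT-4 vs {other}: z = {z:.2f}, p = {p:.4f}")
```

Under this simplified, unpaired test, both GPT-4 comparisons also fall below p = 0.01, which is consistent with the direction of the reported result, though it does not reproduce the authors' analysis.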

Conclusions: GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in some years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance to inform future applications.

Keywords: Artificial intelligence; ChatGPT; GPT-4; Large language models; Nephrology.
