Artificial Intelligence in Gastroenterology Education: DeepSeek Passes the Gastroenterology Board Examination and Outperforms Legacy ChatGPT Models
- PMID: 40392256
- DOI: 10.14309/ajg.0000000000003552
Abstract
Introduction: DeepSeek's performance on the gastroenterology board examination was evaluated against legacy offline ChatGPT models, which had previously failed the examination.
Methods: The performance of the base DeepSeek R1 model and the search-augmented R1 model was assessed using American College of Gastroenterology self-assessment tests (455 questions).
Results: DeepSeek exceeded the passing threshold. Search-augmented DeepSeek scored 81.5% across all questions, and the base R1 model scored 77.1%. Both the search-augmented and offline DeepSeek models surpassed offline ChatGPT-3 (65.1%) and ChatGPT-4 (62.4%) (P < 0.001).
Discussion: DeepSeek exhibited passing performance on the gastroenterology board examination, but gaps in niche topics and the exclusion of image-based questions limit its utility. It may supplement education if validated by specialists.
Keywords: artificial intelligence; large language models; medical education.
Copyright © 2025 by The American College of Gastroenterology.