Am J Gastroenterol. 2025 May 20.
doi: 10.14309/ajg.0000000000003552. Online ahead of print.

Artificial Intelligence in Gastroenterology Education: DeepSeek Passes the Gastroenterology Board Examination and Outperforms Legacy ChatGPT Models

Andrew F Ibrahim et al. Am J Gastroenterol.

Abstract

Introduction: DeepSeek's performance on the gastroenterology board examination was evaluated against legacy offline ChatGPT models, which previously showed failing performance.

Methods: The performances of the DeepSeek base R1 model and search-augmented R1 model were assessed using American College of Gastroenterology self-assessments (455 questions).

Results: DeepSeek exceeded the passing threshold. Search-augmented DeepSeek scored 81.5% across all questions, and the R1 base model scored 77.1%. Both search-augmented and offline DeepSeek models surpassed offline ChatGPT-3 (65.1%) and ChatGPT-4 (62.4%) (P < 0.001).
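
As a rough plausibility check on the reported comparison, the percentages above can be converted to approximate counts and compared with a two-proportion z-test. This is only a sketch under stated assumptions: the abstract does not name the statistical test used, the correct-answer counts are back-calculated from the rounded percentages rather than taken from the study, and it assumes every model was scored on the same 455 questions. The helper names `approx_correct` and `two_proportion_z_test` are hypothetical.

```python
import math

TOTAL_QUESTIONS = 455  # question count reported in Methods

# Accuracies (percent) as reported in Results.
reported = {
    "DeepSeek R1 (search-augmented)": 81.5,
    "DeepSeek R1 (base)": 77.1,
    "ChatGPT-3 (offline)": 65.1,
    "ChatGPT-4 (offline)": 62.4,
}


def approx_correct(pct, n=TOTAL_QUESTIONS):
    """Back-calculate an approximate correct count (assumption, not source data)."""
    return round(pct / 100 * n)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    counts = {name: approx_correct(pct) for name, pct in reported.items()}
    deepseek = [m for m in counts if m.startswith("DeepSeek")]
    chatgpt = [m for m in counts if m.startswith("ChatGPT")]
    for d in deepseek:
        for c in chatgpt:
            z, p = two_proportion_z_test(
                counts[d], TOTAL_QUESTIONS, counts[c], TOTAL_QUESTIONS
            )
            print(f"{d} vs {c}: z = {z:.2f}, p = {p:.2g}")
```

Under these assumptions, even the narrowest gap (base R1 against ChatGPT-3) falls below the 0.001 threshold, which is consistent with the reported result but does not reproduce the authors' analysis.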

Discussion: DeepSeek exhibited passing performance on the gastroenterology board examination, but gaps in niche topics and the exclusion of images limit its utility. It may supplement education if validated by specialists.

Keywords: artificial intelligence; large language models; medical education.
