Int J Med Inform. 2025 Aug:200:105906.
doi: 10.1016/j.ijmedinf.2025.105906. Epub 2025 Apr 4.

The performance of ChatGPT and ERNIE Bot in surgical resident examinations

Siyin Guo et al. Int J Med Inform. 2025 Aug.

Abstract

Study purpose: To assess the application of two large language models (LLMs), ChatGPT and ERNIE Bot, to surgical resident examinations and to compare their performance with that of human residents.

Study design: In this study, 596 questions with a total of 183,556 responses were first drawn from Medical Vision World, an authoritative medical education platform across China. Each question was input into ChatGPT-4.0 and ERNIE Bot-4.0 in Chinese, both with and without prompts, to compare their performance on a Chinese question database. Additionally, we screened another 210 surgical questions with detailed response results from 43 residents to compare the performance of the residents and these two LLMs.

Results: For each LLM, there was no significant difference in the correctness of the responses to the 596 questions with or without prompts (ChatGPT-4.0: 68.96% [without prompts], 71.14% [with prompts], p = 0.411; ERNIE Bot-4.0: 78.36% [without prompts], 78.86% [with prompts], p = 0.832), but ERNIE Bot-4.0 displayed higher correctness than ChatGPT-4.0 did (with prompts: p = 0.002; without prompts: p < 0.001). For another 210 questions with prompts, the two LLMs, especially ERNIE Bot-4.0 (ranking in the top 95% of the 43 residents' scores), significantly outperformed the residents.
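The abstract does not say which statistical test produced these p-values. As an illustrative sketch only, the comparisons can be approximated with a pooled two-proportion z-test (equivalent to a Pearson chi-square test on a 2x2 table without continuity correction). The correct-answer counts below are not reported in the abstract; they are assumptions back-calculated from the rounded percentages and the 596-question total, and the function name `two_proportion_p` is hypothetical.

```python
import math


def two_proportion_p(correct_a: int, correct_b: int, n: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test with n items per arm.

    Equivalent to a Pearson chi-square test on the 2x2 table without
    continuity correction. Treats the two arms as independent samples.
    """
    pooled = (correct_a + correct_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (correct_a - correct_b) / n / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


N = 596  # number of questions, from the abstract

# ASSUMED counts, reconstructed from the rounded percentages in the abstract:
# 411/596 = 68.96%, 424/596 = 71.14%, 467/596 = 78.36%, 470/596 = 78.86%
CHATGPT_NO_PROMPT, CHATGPT_PROMPT = 411, 424
ERNIE_NO_PROMPT, ERNIE_PROMPT = 467, 470

comparisons = {
    "ChatGPT-4.0, without vs with prompts": (CHATGPT_NO_PROMPT, CHATGPT_PROMPT),
    "ERNIE Bot-4.0, without vs with prompts": (ERNIE_NO_PROMPT, ERNIE_PROMPT),
    "ChatGPT-4.0 vs ERNIE Bot-4.0, with prompts": (CHATGPT_PROMPT, ERNIE_PROMPT),
    "ChatGPT-4.0 vs ERNIE Bot-4.0, without prompts": (CHATGPT_NO_PROMPT, ERNIE_NO_PROMPT),
}

for label, (a, b) in comparisons.items():
    print(f"{label}: p = {two_proportion_p(a, b, N):.4f}")
```

Under these assumptions the computed values can be compared with the reported p-values. Because both LLMs answered the same 596 questions, a paired test such as McNemar's test may be more appropriate, but it would require per-question results that the abstract does not provide.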

Conclusions: The performance of ERNIE Bot-4.0 was superior to that of ChatGPT-4.0 and that of residents on surgical resident examinations in a Chinese question database.

Keywords: Artificial intelligence; ChatGPT; ERNIE Bot; Medical examination.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
