The performance of ChatGPT and ERNIE Bot in surgical resident examinations
- PMID: 40220627
- DOI: 10.1016/j.ijmedinf.2025.105906
Abstract
Study purpose: To assess the application of two large language models (LLMs), ChatGPT-4.0 and ERNIE Bot-4.0, to surgical resident examinations and to compare their performance with that of human residents.
Study design: A total of 596 questions, with 183,556 recorded responses, were first drawn from the Medical Vision World, an authoritative medical education platform in China. Chinese-language questions, both with and without prompts, were input into ChatGPT-4.0 and ERNIE Bot-4.0 to compare their performance on a Chinese question database. We then screened a further 210 surgical questions with detailed response data from 43 residents to compare the performance of the residents with that of the two LLMs.
Results: Correctness on the 596 questions did not differ significantly with or without prompts for either LLM (ChatGPT-4.0: 68.96 % without prompts vs. 71.14 % with prompts, p = 0.411; ERNIE Bot-4.0: 78.36 % without prompts vs. 78.86 % with prompts, p = 0.832), but ERNIE Bot-4.0 was more accurate than ChatGPT-4.0 (with prompts: p = 0.002; without prompts: p < 0.001). On the further 210 prompted questions, both LLMs significantly outperformed the residents; ERNIE Bot-4.0 in particular scored higher than 95 % of the 43 residents.
Conclusions: The performance of ERNIE Bot-4.0 was superior to that of ChatGPT-4.0 and that of residents on surgical resident examinations in a Chinese question database.
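The abstract's headline comparison (ERNIE Bot-4.0 vs. ChatGPT-4.0 without prompts, p < 0.001) can be reproduced with a standard two-proportion chi-square test. This is a sketch only: the correct-answer counts below are back-calculated from the reported percentages (68.96 % and 78.36 % of 596 questions), and the abstract does not state which exact test the authors used.

```python
from math import erfc, sqrt

def two_proportion_chi2(correct_a, n_a, correct_b, n_b):
    """Yates-corrected Pearson chi-square test (df = 1) for two proportions."""
    table = [[correct_a, n_a - correct_a],
             [correct_b, n_b - correct_b]]
    total = n_a + n_b
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    row = [n_a, n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            # Yates continuity correction for a 2x2 table
            chi2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    # For df = 1, the upper-tail p-value is erfc(sqrt(chi2 / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Counts back-calculated from the reported percentages, 596 questions,
# no-prompt condition: ChatGPT-4.0 68.96% -> 411 correct;
# ERNIE Bot-4.0 78.36% -> 467 correct.
chi2, p = two_proportion_chi2(411, 596, 467, 596)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # p < 0.001, consistent with the abstract
```

The reconstructed p-value falls below 0.001, matching the significance level reported in the Results.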
Keywords: Artificial intelligence; ChatGPT; ERNIE Bot; Medical examination.
Copyright © 2025 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Comparing the performance of ChatGPT and ERNIE Bot in answering questions regarding liver cancer interventional radiology in Chinese and English contexts: A comparative study. Digit Health. 2025 Jan 23;11:20552076251315511. doi: 10.1177/20552076251315511. PMID: 39850627. Free PMC article.
- Comparative performance analysis of global and Chinese-domain large language models for myopia. Eye (Lond). 2025 Jul;39(10):2015-2022. doi: 10.1038/s41433-025-03775-5. PMID: 40223113.
- Application value of generative artificial intelligence in the field of stomatology. Hua Xi Kou Qiang Yi Xue Za Zhi. 2024 Dec 1;42(6):810-815. doi: 10.7518/hxkq.2024.2024144. PMID: 39610079. Free PMC article. Chinese, English.
- Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt. 2024 May;44(3):641-671. doi: 10.1111/opo.13284. PMID: 38404172. Review.
- ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review. Postgrad Med J. 2024 Oct 18;100(1189):858-865. doi: 10.1093/postmj/qgae065. PMID: 38840505. Review.