. 2025 Jul 30:11:e69313.

doi: 10.2196/69313.

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

Hiroki Maruyama¹, Yoshitaka Toyama², Kentaro Takanami², Kei Takase³, Takashi Kamei¹

Affiliations

¹ Department of Surgery, Tohoku University Graduate School of Medicine, Sendai, Japan.
² Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, Japan, Sendai, 980-8575, Japan, 81 227177312.
³ Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, Sendai, Japan.

PMID: 40737609
PMCID: PMC12310146
DOI: 10.2196/69313

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

Hiroki Maruyama et al. JMIR Med Educ. 2025.

. 2025 Jul 30:11:e69313.

doi: 10.2196/69313.

Authors

Hiroki Maruyama¹, Yoshitaka Toyama², Kentaro Takanami², Kei Takase³, Takashi Kamei¹

Affiliations

¹ Department of Surgery, Tohoku University Graduate School of Medicine, Sendai, Japan.
² Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, Japan, Sendai, 980-8575, Japan, 81 227177312.
³ Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, Sendai, Japan.

PMID: 40737609
PMCID: PMC12310146
DOI: 10.2196/69313

Abstract

Background: Artificial intelligence and large language models (LLMs)-particularly GPT-4 and GPT-4o-have demonstrated high correct-answer rates in medical examinations. GPT-4o has enhanced diagnostic capabilities, advanced image processing, and updated knowledge. Japanese surgeons face critical challenges, including a declining workforce, regional health care disparities, and work-hour-related challenges. Nonetheless, although LLMs could be beneficial in surgical education, no studies have yet assessed GPT-4o's surgical knowledge or its performance in the field of surgery.

Objective: This study aims to evaluate the potential of GPT-4 and GPT-4o in surgical education by using them to take the Japan Surgical Board Examination (JSBE), which includes both textual questions and medical images-such as surgical and computed tomography scans-to comprehensively assess their surgical knowledge.

Methods: We used 297 multiple-choice questions from the 2021-2023 JSBEs. The questions were in Japanese, and 104 of them included images. First, the GPT-4 and GPT-4o responses to only the textual questions were collected via OpenAI's application programming interface to evaluate their correct-answer rate. Subsequently, the correct-answer rate of their responses to questions that included images was assessed by inputting both text and images.

Results: The overall correct-answer rates of GPT-4o and GPT-4 for the text-only questions were 78% (231/297) and 55% (163/297), respectively, with GPT-4o outperforming GPT-4 by 23% (P=<.01). By contrast, there was no significant improvement in the correct-answer rate for questions that included images compared with the results for the text-only questions.

Conclusions: GPT-4o outperformed GPT-4 on the JSBE. However, the results of the LLMs were lower than those of the examinees. Despite the capabilities of LLMs, image recognition remains a challenge for them, and their clinical application requires caution owing to the potential inaccuracy of their results.

Keywords: ChatGPT; Japan Surgical Board Examination; LLM; Medical Licensing Examination; artificial intelligence; diagnostic imaging; large language models; surgical education.

PubMed Disclaimer

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1.. Collection of data from JSBE and input into GPT models. The questions were entered into an electronic booklet in Japanese. The images were saved as screenshots and input into ChatGPT-4o. JSBE: Japan Surgical Board Examination.

See this image and copyright information in PMC

References

1. Debas HT, Bass BL, Brennan MF, et al. American Surgical Association Blue Ribbon Committee Report on Surgical Education: 2004. Ann Surg. 2005 Jan;241(1):1–8. doi: 10.1097/01.sla.0000150066.83563.52. doi. Medline. - DOI - PMC - PubMed
1. Overview of statistics on doctors, dentists [Article in Japanese] Ministry of Health Labour and Welfare. 2024. [16-07-2025]. https://www.mhlw.go.jp/toukei/saikin/hw/ishi/22/index.html URL. Accessed.
1. Work style reform for doctors [Article in Japanese] Ministry of Health, Labour and Welfare. 2024. [16-07-2025]. https://www.mhlw.go.jp/content/10800000/001129457.pdf URL. Accessed.
1. Varas J, Coronel BV, Villagrán I, et al. Innovations in surgical training: exploring the role of artificial intelligence and large language models (LLM) Rev Col Bras Cir. 2023;50:e20233605. doi: 10.1590/0100-6991e-20233605-en. doi. Medline. - DOI - PMC - PubMed
1. ChatGPT. Open AI. 2024. [16-07-2025]. https://openai.com/chatgpt/ URL. Accessed.

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
- JMIR Publications
- PubMed Central

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

Affiliations

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

Similar articles

References

MeSH terms

LinkOut - more resources

Full Text Sources

Abstract

Conflict of interest statement

Figures

Similar articles

References

MeSH terms

Related information

LinkOut - more resources

Full Text Sources