Pak J Med Sci. 2025 Apr;41(4):968-972.
doi: 10.12669/pjms.41.4.11178.

Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot


Iskender Aksoy et al. Pak J Med Sci. 2025 Apr.

Abstract

Objective: The use of artificial intelligence tools built on different software architectures for both clinical and educational purposes in medicine has attracted considerable interest recently. In this study, we compared the answers given by three artificial intelligence chatbots to an emergency medicine question pool drawn from the Turkish National Medical Specialization Exam. By classifying the questions by content and form and examining the question sentences, we investigated how these factors affected the answers given.

Methods: Emergency medicine questions from the Medical Specialization Exams administered between 2015 and 2020 were recorded. The questions were posed to three artificial intelligence models: ChatGPT-4, Gemini, and Copilot. The length of each question, the question type, and the topics of the incorrectly answered questions were recorded.

Results: The most successful chatbot by total score was Microsoft Copilot (7.8% error rate), while the least successful was Google Gemini (22.9% error rate) (p<0.001). Notably, all chatbots had their highest error rates on questions about trauma and surgical approaches, and all made mistakes on burns and pediatrics. Error rates also increased on questions containing the word "probability," showing that question style affected the answers given.

Conclusions: Although chatbots show promising success in identifying the correct answer, we think students should not treat them as a primary source for the exam, but rather as a useful auxiliary tool to support their learning.

Keywords: Artificial Intelligence; ChatGPT; Copilot; Emergency medicine; Gemini; Medical education.

Conflict of interest statement

Conflict of interest: None.

