Pak J Med Sci. 2025 Apr;41(4):968-972.
doi: 10.12669/pjms.41.4.11178.

Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot


Iskender Aksoy et al. Pak J Med Sci. 2025 Apr.

Abstract

Objective: The use of artificial intelligence tools built on different software architectures for both clinical and educational purposes in medicine has attracted considerable interest recently. In this study, we compared the answers given by three artificial intelligence chatbots to an emergency medicine question pool drawn from the Turkish National Medical Specialization Exam. By classifying the questions by content and form and examining the question sentences, we investigated how these factors affected the answers given.

Methods: Emergency medicine questions from the Medical Specialization Exams administered between 2015 and 2020 were recorded. The questions were posed to three artificial intelligence models: ChatGPT-4, Gemini, and Copilot. The length of each question, the question type, and the topics of the incorrectly answered questions were recorded.

Results: The most successful chatbot by total score was Microsoft Copilot (7.8% error rate), while the least successful was Google Gemini (22.9% error rate) (p<0.001). Notably, all chatbots had their highest error rates on questions about trauma and surgical approaches, and all made mistakes on burns and pediatrics. Error rates also increased on questions containing the word "probability," showing that question style affected the answers given.

Conclusions: Although chatbots show promising success in identifying the correct answer, we think students should not treat them as a primary source for the exam, but rather as a useful auxiliary tool to support their learning.

Keywords: Artificial Intelligence; ChatGPT; Copilot; Emergency medicine; Gemini; Medical education.

Conflict of interest statement

Conflict of interest: None.

