The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses

Malik Sallam¹,², Kholoud Al-Mahzoum³, Rawan Ahmad Almutawaa³, Jasmen Ahmad Alhashash³, Retaj Abdullah Dashti³, Danah Raed AlSafy³, Reem Abdullah Almutairi³, Muna Barakat⁴

Affiliations

¹ Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan. malik.sallam@ju.edu.jo.
² Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, 11942, Jordan. malik.sallam@ju.edu.jo.
³ School of Medicine, The University of Jordan, Amman, 11942, Jordan.
⁴ Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan.

PMID: 39228001
PMCID: PMC11373487
DOI: 10.1186/s13104-024-06920-7

Comparative Study

Malik Sallam et al. BMC Res Notes. 2024 Sep 3;17(1):247. doi: 10.1186/s13104-024-06920-7.

Abstract

Objective: The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in answering complex questions in different languages is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering Virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 Virology MCQs were assessed for correctness and for quality based on the CLEAR tool, which is designed for the evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design considered the METRICS checklist for the design and reporting of generative AI-based studies in healthcare.

Results: ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini with 80% vs. 62.5% correctness in English, compared to 65% vs. 55% in Arabic. For both AI models, superior performance in lower cognitive domains was reported. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
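The correctness percentages above can be read as counts out of the 40 MCQs. The following is a minimal sketch of that arithmetic, assuming each question was scored simply as correct or incorrect; the counts are derived here for illustration and are not reported in the abstract, and the names `REPORTED_PERCENT` and `implied_correct_count` are hypothetical.

```python
# Correctness percentages as reported in the abstract, keyed by (model, language).
REPORTED_PERCENT = {
    ("ChatGPT-4", "English"): 80.0,
    ("Gemini", "English"): 62.5,
    ("ChatGPT-4", "Arabic"): 65.0,
    ("Gemini", "Arabic"): 55.0,
}

# Number of Virology MCQs stated in the abstract.
N_QUESTIONS = 40


def implied_correct_count(percent, n_questions=N_QUESTIONS):
    """Convert a reported percentage into the implied number of correct answers.

    Assumption: every question is scored as fully correct or incorrect.
    """
    return round(percent * n_questions / 100)


# Derived counts (an illustration, not figures taken from the abstract).
counts = {key: implied_correct_count(p) for key, p in REPORTED_PERCENT.items()}

for (model, language), count in counts.items():
    print(f"{model} ({language}): {count}/{N_QUESTIONS} correct")
```

Under this assumption, the gap between the two models is 7 questions in English and 4 questions in Arabic.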

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
The average CLEAR scores for generative AI models’ output regarding Virology multiple choice questions (MCQs). Gemini performance in Arabic vs. English (A); ChatGPT-4 performance in Arabic vs. English (B)
