The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses
- PMID: 39228001
- PMCID: PMC11373487
- DOI: 10.1186/s13104-024-06920-7
Abstract
Objective: The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in answering complex questions in different languages is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering Virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 Virology MCQs were assessed for correctness and for quality, the latter based on the CLEAR tool designed for the evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design considered the METRICS checklist for the design and reporting of generative AI-based studies in healthcare.
Results: ChatGPT-4 and Gemini both performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini with 80% vs. 62.5% correctness in English, compared to 65% vs. 55% in Arabic. Both AI models performed better on MCQs in the lower cognitive domains. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
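The reported percentages can be read back as question counts. The following is a minimal sketch, not part of the study: it assumes that each model answered all 40 MCQs in each language (the abstract gives the total of 40 but not the per-condition counts), and the names `reported_pct`, `implied_correct`, and `counts` are illustrative only.

```python
# Correctness percentages as stated in the abstract.
# ASSUMPTION: every model/language pair was scored on the same 40 MCQs.
TOTAL_MCQS = 40

reported_pct = {
    ("ChatGPT-4", "English"): 80.0,
    ("Gemini", "English"): 62.5,
    ("ChatGPT-4", "Arabic"): 65.0,
    ("Gemini", "Arabic"): 55.0,
}


def implied_correct(pct, total=TOTAL_MCQS):
    """Convert a percentage into the implied number of correct answers."""
    return round(pct * total / 100)


# Implied number of correct answers per model and language.
counts = {key: implied_correct(pct) for key, pct in reported_pct.items()}

# Drop in correct answers when moving from English to Arabic, per model.
language_gap = {
    model: counts[(model, "English")] - counts[(model, "Arabic")]
    for model in ("ChatGPT-4", "Gemini")
}

for (model, language), n in counts.items():
    print(f"{model} ({language}): {n}/{TOTAL_MCQS} correct")
for model, gap in language_gap.items():
    print(f"{model}: {gap} fewer correct answers in Arabic than in English")
```

Under that assumption, the percentages correspond to whole numbers of questions, which is a quick consistency check on the reported figures.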
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic. BMC Infect Dis. 2024 Aug 8;24(1):799. doi: 10.1186/s12879-024-09725-y. PMID: 39118057. Free PMC article.
- Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge. Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15. PMID: 39277830.
- Artificial intelligence in healthcare education: evaluating the accuracy of ChatGPT, Copilot, and Google Gemini in cardiovascular pharmacology. Front Med (Lausanne). 2025 Feb 19;12:1495378. doi: 10.3389/fmed.2025.1495378. eCollection 2025. PMID: 40046930. Free PMC article.
- Redefining Healthcare With Artificial Intelligence (AI): The Contributions of ChatGPT, Gemini, and Co-pilot. Cureus. 2024 Apr 7;16(4):e57795. doi: 10.7759/cureus.57795. eCollection 2024 Apr. PMID: 38721180. Free PMC article. Review.
- ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review. Postgrad Med J. 2024 Oct 18;100(1189):858-865. doi: 10.1093/postmj/qgae065. PMID: 38840505. Review.
Cited by
- Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486. PMID: 40305085. Free PMC article.
- Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases. medRxiv [Preprint]. 2025 Feb 28:2025.02.26.25322769. doi: 10.1101/2025.02.26.25322769. PMID: 40061308. Free PMC article. Preprint.
- Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English. Narra J. 2025 Apr;5(1):e2371. doi: 10.52225/narra.v5i1.2371. Epub 2025 Apr 8. PMID: 40352182. Free PMC article.