Adv Med Educ Pract. 2024 May 10;15:393-400.
doi: 10.2147/AMEP.S457408. eCollection 2024.

Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom's Taxonomy


Ambadasu Bharatha et al. Adv Med Educ Pract. 2024.

Abstract

Introduction: This research compared the capabilities of ChatGPT-4 with those of medical students in answering multiple-choice questions (MCQs), using the revised Bloom's Taxonomy as a benchmark.

Methods: A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various medical courses using computer-based testing.

Results: The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than the students (66.7%). Course type significantly affected ChatGPT-4's performance, but revised Bloom's Taxonomy level did not. A test of association between program level and Bloom's Taxonomy level for the questions ChatGPT-4 answered correctly was highly significant (p<0.001), reflecting a concentration of "remember"-level questions in preclinical courses and "evaluate"-level questions in clinical courses.
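
For readers who want to reproduce this kind of analysis: an association between two categorical variables, such as program level and Bloom's Taxonomy level, is conventionally tested with a chi-square test of independence on a contingency table. The sketch below is illustrative only; the counts are hypothetical placeholders, not the study's data, and this is not the authors' code.

    # Chi-square test of independence, a minimal sketch of the kind of
    # association analysis described above, using SciPy.
    # NOTE: the counts below are HYPOTHETICAL placeholders, not study data.
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: program level (preclinical, clinical).
    # Columns: revised Bloom's level of MCQs ChatGPT-4 answered correctly
    # (remember, understand, apply, analyze, evaluate).
    observed = np.array([
        [40, 25, 15, 8, 2],    # preclinical: mostly "remember"-level
        [10, 18, 20, 22, 30],  # clinical: mostly higher-order levels
    ])

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
    # For these placeholder counts the test yields p < 0.001, i.e. a
    # significant association between program level and Bloom's level.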

Discussion: The study highlights ChatGPT-4's proficiency in standardized tests but indicates limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies based on course content.

Conclusion: While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address limitations. Further research is needed to explore AI's impact on medical education and student performance across educational levels and courses.

Keywords: ChatGPT-4; artificial intelligence; interpretation abilities; knowledge; medical students; multiple choice questions.


Conflict of interest statement

Dr. Md Anwarul Azim Majumder is the Editor-in-Chief of Advances in Medical Education and Practice. The other authors report no conflicts of interest in this work.
