Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom's Taxonomy
- PMID: 38751805
- PMCID: PMC11094742
- DOI: 10.2147/AMEP.S457408
Abstract
Introduction: This research investigated the capabilities of ChatGPT-4 compared to medical students in answering MCQs using the revised Bloom's Taxonomy as a benchmark.
Methods: A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various medical courses using computer-based testing.
Results: The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than students (66.7%). Course type significantly affected ChatGPT-4's performance, but revised Bloom's Taxonomy levels did not. An analysis of the association between program level and Bloom's taxonomy level for ChatGPT-4's correct answers showed a highly significant association (p<0.001), reflecting a concentration of "remember"-level questions in preclinical courses and "evaluate"-level questions in clinical courses.
Discussion: The study highlights ChatGPT-4's proficiency in standardized tests but indicates limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies based on course content.
Conclusion: While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address limitations. Further research is needed to explore AI's impact on medical education and student performance across educational levels and courses.
Keywords: ChatGPT-4; artificial intelligence; interpretation abilities; knowledge; medical students; multiple choice questions.
© 2024 Bharatha et al.
Conflict of interest statement
Dr. Md Anwarul Azim Majumder is the Editor-in-Chief of Advances in Medical Education and Practice. The other authors report no conflicts of interest in this work.
Similar articles
- Comparing the performance of artificial intelligence learning models to medical students in solving histology and embryology multiple choice questions. Ann Anat. 2024 Jun;254:152261. doi: 10.1016/j.aanat.2024.152261. Epub 2024 Mar 21. PMID: 38521363
- Climbing Bloom's taxonomy pyramid: Lessons from a graduate histology course. Anat Sci Educ. 2017 Sep;10(5):456-464. doi: 10.1002/ase.1685. Epub 2017 Feb 23. PMID: 28231408
- Anatomy exam model for the circulatory and respiratory systems using GPT-4: a medical school study. Surg Radiol Anat. 2025 Jun 10;47(1):158. doi: 10.1007/s00276-025-03667-z. PMID: 40495075
- ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review. Postgrad Med J. 2024 Oct 18;100(1189):858-865. doi: 10.1093/postmj/qgae065. PMID: 38840505. Review.
- Enhancing Clinical Reasoning with Virtual Patients: A Hybrid Systematic Review Combining Human Reviewers and ChatGPT. Healthcare (Basel). 2024 Nov 11;12(22):2241. doi: 10.3390/healthcare12222241. PMID: 39595439. Free PMC article. Review.
Cited by
- Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486. PMID: 40305085. Free PMC article.
- The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses. BMC Res Notes. 2024 Sep 3;17(1):247. doi: 10.1186/s13104-024-06920-7. PMID: 39228001. Free PMC article.
- Large Language Models in Biochemistry Education: Comparative Evaluation of Performance. JMIR Med Educ. 2025 Apr 10;11:e67244. doi: 10.2196/67244. PMID: 40209205. Free PMC article.
- Evaluating AI-generated examination papers in periodontology: a comparative study with human-designed counterparts. BMC Med Educ. 2025 Jul 23;25(1):1099. doi: 10.1186/s12909-025-07706-6. PMID: 40702472. Free PMC article. Clinical Trial.
- Attitudes and perceptions of Thai medical students regarding artificial intelligence in radiology and medicine. BMC Med Educ. 2024 Oct 22;24(1):1188. doi: 10.1186/s12909-024-06150-2. PMID: 39438874. Free PMC article.