GPT-4 generated answer rationales to multiple choice assessment questions in undergraduate medical education
- PMID: 40038669
- PMCID: PMC11877964
- DOI: 10.1186/s12909-025-06862-z
Abstract
Background: Pre-clerkship medical students benefit from practice questions that provide rationales for answer choices. Creating these rationales is a time-intensive endeavor. Therefore, not all practice multiple choice questions (MCQ) have corresponding explanations to aid learning. The authors examined artificial intelligence's (AI) potential to create high-quality answer rationales for clinical vignette-style MCQs.
Methods: The authors conducted a single-center, pre-post intervention survey study in August 2023 assessing the attitudes of 8 pre-clerkship course directors (CDs) toward GPT-4 generated answer rationales for clinical vignette-style MCQs. Ten MCQs from each course's question bank were selected and input into GPT-4 with instructions to select the best answer and generate rationales for each answer choice. CDs were provided their unmodified GPT-4 interactions and assessed the accuracy, clarity, and appropriateness of the rationales, as well as the likelihood of implementing them. CDs were also asked about time spent reviewing and making necessary modifications, satisfaction, and receptiveness to using GPT-4 for this purpose.
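The paper does not specify the exact prompt wording or whether GPT-4 was accessed through the ChatGPT interface or the API. As a purely illustrative sketch of the workflow described above, one could reproduce it programmatically along the following lines; the prompt text, model identifier, and helper function here are assumptions for illustration, not the authors' protocol.

```python
# Illustrative sketch only: the study describes giving GPT-4 an MCQ with
# instructions to pick the best answer and explain every answer choice.
# The prompt text, model name, and function below are assumptions, not the
# authors' actual workflow.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

INSTRUCTIONS = (
    "You are assisting with undergraduate medical education. "
    "For the clinical vignette-style multiple choice question below, "
    "select the single best answer and then provide a rationale for why "
    "each answer choice is correct or incorrect."
)

def generate_rationales(question_text: str) -> str:
    """Send one MCQ to GPT-4 and return its answer plus per-choice rationales."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": question_text},
        ],
    )
    return response.choices[0].message.content
```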
Results: GPT-4 correctly answered 75/80 (93.8%) questions on the first attempt. CDs were receptive to using GPT-4 for rationale generation and all were satisfied with the generated rationales. CDs determined that the majority of rationales were very accurate (77.5%), very clear (83.8%) and very appropriate (93.8%). Most rationales could be implemented with little or no modification (88.3%). All CDs would implement AI-generated answer rationales with CD editorial insights. Most CDs (75%) took ≤ 4 min to review a set of generated rationales for a question.
Conclusion: GPT-4 is an acceptable and feasible tool for generating accurate, clear, and appropriate answer rationales for MCQs in medical education. Future studies should examine students' feedback on generated rationales and further explore generating rationales for questions that include media. The authors plan to explore the implementation of this technological application at their medical school, including the logistics and training needed to create a streamlined process that benefits both learners and educators.
Clinical trial: Not applicable; not a clinical trial.
Keywords: Answer rationales; Artificial intelligence; ChatGPT; Clinical vignettes; GPT-4; LLM; Pre-clerkship assessments.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethical approval: Einstein IRB determined this study to be exempt, #2023–15126. Informed consent: Informed consent was obtained from all participants, available upon request. Consent for publication: Not applicable. Disclaimers: None. Competing interests: The authors declare no competing interests.