BMC Med Educ. 2025 Mar 4;25(1):333.
doi: 10.1186/s12909-025-06862-z.

GPT-4 generated answer rationales to multiple choice assessment questions in undergraduate medical education

Peter Y Ch'en et al. BMC Med Educ. 2025.

Abstract

Background: Pre-clerkship medical students benefit from practice questions that provide rationales for answer choices. Creating these rationales is a time-intensive endeavor, so not all practice multiple choice questions (MCQs) have corresponding explanations to aid learning. The authors examined the potential of artificial intelligence (AI) to create high-quality answer rationales for clinical vignette-style MCQs.

Methods: The authors conducted a single-center pre-post intervention survey study in August 2023 assessing the attitudes of 8 pre-clerkship course directors (CDs) toward GPT-4 generated answer rationales for clinical vignette-style MCQs. Ten MCQs from each course's question bank were selected and input into GPT-4 with instructions to select the best answer and generate rationales for each answer choice. CDs were provided with their unmodified GPT-4 interactions to assess the accuracy, clarity, appropriateness, and likelihood of implementation of the rationales. CDs were also asked about the time spent reviewing and modifying the rationales, their satisfaction, and their receptiveness to using GPT-4 for this purpose.

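The methods describe prompting GPT-4 with each MCQ and asking it to select the best answer and produce a rationale for every answer choice. A minimal sketch of such a workflow is shown below, assuming the OpenAI chat completions API; the prompt wording, model identifier, and helper function are illustrative assumptions, since the authors' exact instructions are not reproduced in the abstract.

```python
# Minimal sketch of MCQ rationale generation with GPT-4.
# Assumptions: OpenAI Python client (>=1.0), OPENAI_API_KEY set in the
# environment; the prompt text and function below are illustrative, not
# the authors' exact protocol.
from openai import OpenAI

client = OpenAI()

def generate_rationales(question: str, choices: dict[str, str]) -> str:
    """Ask GPT-4 to pick the best answer and explain every choice."""
    options = "\n".join(f"{letter}. {text}" for letter, text in choices.items())
    prompt = (
        "You are writing feedback for a pre-clerkship medical school exam.\n"
        f"Question:\n{question}\n\nAnswer choices:\n{options}\n\n"
        "Select the single best answer, then give a rationale explaining why "
        "each answer choice is correct or incorrect."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Hypothetical vignette-style MCQ for illustration only.
output = generate_rationales(
    "A 24-year-old presents with fatigue and mild jaundice after a period of "
    "fasting. Which enzyme deficiency is most likely?",
    {"A": "Glucose-6-phosphatase", "B": "UDP-glucuronosyltransferase",
     "C": "Pyruvate kinase", "D": "Hexokinase"},
)
print(output)
```

In the study, each unmodified GPT-4 interaction of this kind was then passed to a CD for review of accuracy, clarity, appropriateness, and likelihood of implementation.
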
Results: GPT-4 correctly answered 75/80 (93.8%) questions on the first attempt. CDs were receptive to using GPT-4 for rationale generation, and all were satisfied with the generated rationales. CDs determined that the majority of rationales were very accurate (77.5%), very clear (83.8%), and very appropriate (93.8%). Most rationales could be implemented with little or no modification (88.3%). All CDs would implement AI-generated answer rationales with their own editorial input. Most CDs (75%) took ≤ 4 min to review a set of generated rationales for a question.

Conclusion: GPT-4 is an acceptable and feasible tool for generating accurate, clear, and appropriate answer rationales for MCQs in medical education. Future studies should examine student feedback on generated rationales and further explore generating rationales for questions that include media. The authors plan to explore implementation of this technological application at their medical school, including the logistics and training needed to create a streamlined process that benefits both learners and educators.

Clinical trial number: Not applicable (not a clinical trial).

Keywords: Answer rationales; Artificial intelligence; ChatGPT; Clinical vignettes; GPT-4; LLM; Pre-clerkship assessments.

Conflict of interest statement

Declarations. Ethical approval: Einstein IRB determined this study to be exempt, #2023–15126. Informed consent: Informed consent was obtained from all participants, available upon request. Consent for publication: Not applicable. Disclaimers: None. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Post-intervention survey AI rationale evaluation heat map

