Comparative Study

An experimental comparison of multiple-choice and short-answer questions on a high-stakes test for medical students

Janet Mee et al. Adv Health Sci Educ Theory Pract. 2024 Jul;29(3):783-801. doi: 10.1007/s10459-023-10266-3. Epub 2023 Sep 4.

Abstract

Recent advances in automated scoring technology have made it practical to replace multiple-choice questions (MCQs) with short-answer questions (SAQs) in large-scale, high-stakes assessments. However, most previous research comparing these formats has used small examinee samples testing under low-stakes conditions. Additionally, previous studies have not reported on the time required to respond to the two item types. This study compares the difficulty, discrimination, and time requirements for the two formats when examinees responded as part of a large-scale, high-stakes assessment. Seventy-one MCQs were converted to SAQs. These matched items were randomly assigned to examinees completing a high-stakes assessment of internal medicine. No examinee saw the same item in both formats. Items administered in the SAQ format were generally more difficult than items in the MCQ format. The discrimination index for SAQs was modestly higher than that for MCQs, and response times were substantially higher for SAQs. These results support the interchangeability of MCQs and SAQs. When it is important that the examinee generate the response rather than select it, SAQs may be preferred. The results relating to difficulty and discrimination reported in this paper are consistent with those of previous studies. The results on the relative time requirements for the two formats suggest that with a fixed testing time fewer SAQs can be administered; this limitation more than offsets the higher discrimination that has been reported for SAQs. We additionally examine the extent to which increased difficulty may directly impact the discrimination of SAQs.
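The difficulty and discrimination statistics compared in the abstract are the standard classical-test-theory item indices: the p-value (proportion of examinees answering correctly) and the point-biserial correlation between the item score and the total test score. As background, a minimal sketch of how these are computed; the function name and data layout are illustrative, and operational programs typically correlate the item against an item-excluded total rather than the raw total used here.

```python
import statistics

def item_stats(item_scores, total_scores):
    """Classical item statistics for one dichotomous (0/1) item.

    item_scores: 0/1 response to the item, one entry per examinee
    total_scores: each examinee's total test score, in the same order
    Returns (p_value, point_biserial_discrimination).
    """
    n = len(item_scores)
    p = sum(item_scores) / n                     # difficulty: proportion correct
    sd_total = statistics.pstdev(total_scores)   # population SD of total scores
    mean_correct = statistics.mean(
        t for s, t in zip(item_scores, total_scores) if s == 1)
    mean_incorrect = statistics.mean(
        t for s, t in zip(item_scores, total_scores) if s == 0)
    # Point-biserial correlation between the item and the total score
    r_pb = (mean_correct - mean_incorrect) / sd_total * (p * (1 - p)) ** 0.5
    return p, r_pb
```

A harder item has a lower p-value; a more discriminating item has a higher point-biserial, meaning stronger examinees are more likely to get it right.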

Keywords: Constructed response; Item performance; Multiple choice; Short answer.


Conflict of interest statement

The authors have no conflicts of interest regarding this research.

Figures

Fig. 1 Number of raters agreeing when scoring short-answer question responses

Fig. 2 Percent correct (p-values) for 71 items in multiple-choice and short-answer formats

Fig. 3 Scatter plot of the proportion of examinees selecting the correct answer and the most frequently selected distractor for items for which the SAQ format was more difficult and items for which the MCQ format was more difficult

Fig. 4 Discrimination values for 71 items in multiple-choice and short-answer formats

Fig. 5 Average response times for 71 items in multiple-choice and short-answer formats

Fig. 6 Expected Reliability and P-Value. Note: Expected reliability (KR-20; scale shown on the left axis) and expected p-value (scale shown on the right axis) as a function of the difference between mean item difficulty and mean simulee proficiency. Reliability is greatest when mean item difficulty and mean examinee proficiency are equal (p-value = 0.50)
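Fig. 6 relates expected test reliability, expressed as KR-20, to the gap between mean item difficulty and mean examinee proficiency. As background, the KR-20 coefficient for a set of dichotomous items can be sketched as follows; the function name and the toy score matrix are illustrative, not taken from the study data.

```python
def kr20(score_matrix):
    """Kuder-Richardson Formula 20 reliability for dichotomous items.

    score_matrix: one row per examinee, each row a list of 0/1 item scores.
    """
    n = len(score_matrix)       # number of examinees
    k = len(score_matrix[0])    # number of items
    # Sum of item variances p*(1-p) across items
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n
        pq += p * (1 - p)
    # Population variance of the total scores
    totals = [sum(row) for row in score_matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq / var_t)
```

Because the item-variance term p(1-p) peaks at p = 0.50, reliability is maximized when item difficulty is matched to examinee proficiency, which is the relationship the figure depicts.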

