Comparative Study

An experimental comparison of multiple-choice and short-answer questions on a high-stakes test for medical students

Janet Mee et al. Adv Health Sci Educ Theory Pract. 2024 Jul;29(3):783-801. doi: 10.1007/s10459-023-10266-3. Epub 2023 Sep 4.

Abstract

Recent advances in automated scoring technology have made it practical to replace multiple-choice questions (MCQs) with short-answer questions (SAQs) in large-scale, high-stakes assessments. However, most previous research comparing these formats has used small examinee samples testing under low-stakes conditions. Additionally, previous studies have not reported on the time required to respond to the two item types. This study compares the difficulty, discrimination, and time requirements for the two formats when examinees responded as part of a large-scale, high-stakes assessment. Seventy-one MCQs were converted to SAQs. These matched items were randomly assigned to examinees completing a high-stakes assessment of internal medicine. No examinee saw the same item in both formats. Items administered in the SAQ format were generally more difficult than items in the MCQ format. The discrimination index for SAQs was modestly higher than that for MCQs, and response times were substantially higher for SAQs. These results support the interchangeability of MCQs and SAQs. When it is important that the examinee generate the response rather than select it, SAQs may be preferred. The results relating to difficulty and discrimination reported in this paper are consistent with those of previous studies. The results on the relative time requirements for the two formats suggest that with a fixed testing time fewer SAQs can be administered; this limitation more than offsets the higher discrimination that has been reported for SAQs. We additionally examine the extent to which increased difficulty may directly impact the discrimination of SAQs.
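The difficulty and discrimination statistics compared in the abstract are the standard classical-test-theory item indices: the p-value (proportion of examinees answering correctly) and the point-biserial correlation between the item score and the total test score. As background, a minimal sketch of how these are computed; the function name and data layout are illustrative, and operational programs typically correlate the item against an item-excluded total rather than the raw total used here.

```python
import statistics

def item_stats(item_scores, total_scores):
    """Classical item statistics for one dichotomous (0/1) item.

    item_scores: 0/1 response to the item, one entry per examinee
    total_scores: each examinee's total test score, in the same order
    Returns (p_value, point_biserial_discrimination).
    """
    n = len(item_scores)
    p = sum(item_scores) / n                     # difficulty: proportion correct
    sd_total = statistics.pstdev(total_scores)   # population SD of total scores
    mean_correct = statistics.mean(
        t for s, t in zip(item_scores, total_scores) if s == 1)
    mean_incorrect = statistics.mean(
        t for s, t in zip(item_scores, total_scores) if s == 0)
    # Point-biserial correlation between the item and the total score
    r_pb = (mean_correct - mean_incorrect) / sd_total * (p * (1 - p)) ** 0.5
    return p, r_pb
```

A harder item has a lower p-value; a more discriminating item has a higher point-biserial, meaning stronger examinees are more likely to get it right.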

Keywords: Constructed response; Item performance; Multiple choice; Short answer.


Conflict of interest statement

The authors have no conflicts of interest regarding this research.

Figures

Fig. 1 Number of raters agreeing when scoring short-answer question responses

Fig. 2 Percent correct (p-values) for 71 items in multiple-choice and short-answer formats

Fig. 3 Scatter plot of the proportion of examinees selecting the correct answer and the most frequently selected distractor for items for which the SAQ format was more difficult and items for which the MCQ format was more difficult

Fig. 4 Discrimination values for 71 items in multiple-choice and short-answer formats

Fig. 5 Average response times for 71 items in multiple-choice and short-answer formats

Fig. 6 Expected Reliability and P-Value. Note: Expected reliability (KR-20; scale shown on the left axis) and expected p-value (scale shown on the right axis) as a function of the difference between mean item difficulty and mean simulee proficiency. Reliability is greatest when mean item difficulty and mean examinee proficiency are equal (p-value = 0.50)
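Fig. 6 relates expected test reliability, expressed as KR-20, to the gap between mean item difficulty and mean examinee proficiency. As background, the KR-20 coefficient for a set of dichotomous items can be sketched as follows; the function name and the toy score matrix are illustrative, not taken from the study data.

```python
def kr20(score_matrix):
    """Kuder-Richardson Formula 20 reliability for dichotomous items.

    score_matrix: one row per examinee, each row a list of 0/1 item scores.
    """
    n = len(score_matrix)       # number of examinees
    k = len(score_matrix[0])    # number of items
    # Sum of item variances p*(1-p) across items
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n
        pq += p * (1 - p)
    # Population variance of the total scores
    totals = [sum(row) for row in score_matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq / var_t)
```

Because the item-variance term p(1-p) peaks at p = 0.50, reliability is maximized when item difficulty is matched to examinee proficiency, which is the relationship the figure depicts.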

