An Experimental Design to Investigate Item Parameter Drift

Peter Baldwin¹, Irina Grabovsky¹, Kimberly A Swygert¹, Thomas Fogle¹, Pilar Reid¹, Brian E Clauser¹

PMID: 39867873
PMCID: PMC11760077
DOI: 10.1177/01466216251316282

Peter Baldwin et al. Appl Psychol Meas. 2025 Jan 24:01466216251316282. Online ahead of print.

Affiliation

¹ NBME, Philadelphia, PA, USA.

Abstract

Methods for detecting item parameter drift may be inadequate when every exposed item is at risk for drift. To address this scenario, a strategy for detecting item parameter drift is proposed that uses only unexposed items deployed in a stratified random method within an experimental design. The proposed method is illustrated by investigating unexpected score increases on a high-stakes licensure exam. Results for this example were suggestive of item parameter drift but not significant at the .05 level.

Keywords: differential item functioning; invariance; item parameter drift; test security.
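As a rough illustration of the stratified random deployment described in the abstract, the sketch below splits a pool of unexposed items into two study sets by shuffling within each stratum and dealing half to each year. The item IDs, the stratum labels (for example, content area), and the function name `assign_study_sets` are hypothetical and are not taken from the paper; the actual stratification variables used by the authors are not stated here.

```python
import random
from collections import defaultdict


def assign_study_sets(items, seed=0):
    """Split new items into two randomly equivalent study sets.

    `items` is a list of (item_id, stratum) pairs. The stratum labels are
    hypothetical placeholders (e.g., content area), not the paper's.
    """
    rng = random.Random(seed)

    # Group item IDs by stratum so randomization happens within strata.
    by_stratum = defaultdict(list)
    for item_id, stratum in items:
        by_stratum[stratum].append(item_id)

    year1, year2 = [], []
    for stratum in sorted(by_stratum):
        pool = by_stratum[stratum]
        rng.shuffle(pool)
        half = len(pool) // 2
        # If a stratum has an odd number of items, the extra item goes to
        # Year 2; this is an arbitrary choice made for the sketch.
        year1.extend(pool[:half])
        year2.extend(pool[half:])
    return year1, year2


# Hypothetical pool: 40 unexposed items in two strata, "A" and "B".
items = [(f"item{i:03d}", "A" if i % 2 == 0 else "B") for i in range(40)]
year1, year2 = assign_study_sets(items, seed=42)
print(len(year1), len(year2))
```

Both sets are drawn in a single call, which mirrors the requirement that the two study sets be assembled before Year 1 so that neither set is influenced by what happens after the first administration.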

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Form assembly design. Anchor sets are common across years and the study sets are unique but randomly equivalent sets of new (previously unseen) items. To ensure that the study sets are randomly equivalent, *both* sets must be assembled prior to Year 1.

**Figure 2.**
Frequency histogram of item difficulty estimates for Year 1 and Year 2 study items. In the figure legend, the difficulty median and standard deviation for each year are also reported along with the difference in medians.

**Figure 3.**
Frequency histogram of differences in median difficulties across years if the null hypothesis of no IPD is true. This is a randomization distribution based on 1,000,000 permutations of item assignment to year. A value equal to or more extreme than the observed value $d^*$ is expected to arise from sampling error 9.8% of the time under the null hypothesis.
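The randomization distribution in Figure 3 can be approximated with a Monte Carlo permutation test on the difference in medians. The following is a minimal sketch under several assumptions that are not confirmed by the text above: the difficulty estimates are simulated, the test is two-sided, the difference is taken as Year 2 minus Year 1, and the shuffling ignores strata (a version faithful to a stratified design would permute year labels within each stratum). The function name `permutation_test_median_diff` is hypothetical.

```python
import random
from statistics import median


def permutation_test_median_diff(year1, year2, n_perm=10_000, seed=0):
    """Monte Carlo permutation test for a difference in medians.

    Returns the observed difference (Year 2 minus Year 1, an assumed sign
    convention) and the proportion of permutations whose absolute
    difference is at least as large as the observed one (two-sided, also
    an assumption).
    """
    rng = random.Random(seed)
    observed = median(year2) - median(year1)

    pooled = list(year1) + list(year2)  # copy, so the inputs are not mutated
    n1 = len(year1)

    extreme = 0
    for _ in range(n_perm):
        # Reassign items to years at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = median(pooled[n1:]) - median(pooled[:n1])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm


# Simulated (hypothetical) difficulty estimates for two sets of 50 items.
data_rng = random.Random(1)
year1_b = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
year2_b = [data_rng.gauss(-0.2, 1.0) for _ in range(50)]

d_star, p_value = permutation_test_median_diff(year1_b, year2_b, seed=2)
print(f"d* = {d_star:.3f}, p = {p_value:.3f}")
```

The default of 10,000 permutations keeps the sketch fast; the figure caption reports 1,000,000 permutations, which gives a smoother estimate of the tail proportion at the cost of run time.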
