Appl Psychol Meas. 2025 Jan 24:01466216251316282.
doi: 10.1177/01466216251316282. Online ahead of print.

An Experimental Design to Investigate Item Parameter Drift

Peter Baldwin et al. Appl Psychol Meas.

Abstract

Methods for detecting item parameter drift may be inadequate when every exposed item is at risk for drift. To address this scenario, a strategy for detecting item parameter drift is proposed that uses only unexposed items deployed in a stratified random method within an experimental design. The proposed method is illustrated by investigating unexpected score increases on a high-stakes licensure exam. Results for this example were suggestive of item parameter drift but not significant at the .05 level.

Keywords: differential item functioning; invariance; item parameter drift; test security.
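The abstract describes deploying unexposed items "in a stratified random method." As a rough sketch of what such an assignment could look like, the Python below splits a pool of new items into two study sets by shuffling within each stratum. The function name `split_into_study_sets`, the `stratum_of` callback, and the choice of stratification variable are assumptions made for illustration; the abstract does not say which strata the authors used or how they implemented the assignment.

```python
import random
from collections import defaultdict


def split_into_study_sets(items, stratum_of, seed=0):
    """Randomly split items into two study sets, balanced within each stratum.

    `stratum_of` is a hypothetical callback that returns the stratum label
    (for example, a content area) of an item.
    """
    rng = random.Random(seed)

    # Group the items by stratum.
    by_stratum = defaultdict(list)
    for item in items:
        by_stratum[stratum_of(item)].append(item)

    set_year1, set_year2 = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        half = len(members) // 2
        # With an odd count, give the extra item to a randomly chosen set.
        if len(members) % 2 and rng.random() < 0.5:
            half += 1
        set_year1.extend(members[:half])
        set_year2.extend(members[half:])
    return set_year1, set_year2
```

Splitting the whole pool in one pass, before any item is administered, is consistent with the Figure 1 note that both study sets must be assembled prior to Year 1.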

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Form assembly design. Anchor sets are common across years and the study sets are unique but randomly equivalent sets of new (previously unseen) items. To ensure that the study sets are randomly equivalent, both sets must be assembled prior to Year 1.
Figure 2.
Frequency histogram of item difficulty estimates for Year 1 and Year 2 study items. In the figure legend, the difficulty median and standard deviation for each year are also reported along with the difference in medians.
Figure 3.
Frequency histogram of differences in median difficulties across years if the null hypothesis of no IPD is true. This is a randomization distribution based on 1,000,000 permutations of item assignment to year. A value equal to or more extreme than the observed value d* is expected to arise from sampling error 9.8% of the time under the null hypothesis.
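
The randomization distribution described for Figure 3 can be approximated with a short permutation test. The Python sketch below is illustrative only and is not the authors' code: the function name, the two-sided definition of "more extreme," and the add-one Monte Carlo correction to the p-value are all assumptions, and the default number of permutations is far smaller than the 1,000,000 reported in the caption.

```python
import random
from statistics import median


def median_diff_permutation_test(year1, year2, n_perm=10_000, seed=0):
    """Permutation test for a difference in median item difficulty.

    Returns the observed difference (Year 2 median minus Year 1 median) and
    an approximate two-sided p-value under the null hypothesis that
    assignment of items to year is arbitrary (no drift).
    """
    rng = random.Random(seed)
    pooled = list(year1) + list(year2)
    n1 = len(year1)
    observed = median(year2) - median(year1)

    extreme = 0
    for _ in range(n_perm):
        # Reassign items to years at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        d = median(pooled[n1:]) - median(pooled[:n1])
        # Assumption: "more extreme" is judged on absolute value (two-sided).
        if abs(d) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Calling the function with two lists of item difficulty estimates, one per year, returns the observed difference in medians and the share of random reassignments that are at least as extreme.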
