Testing Differential Item Functioning in Small Samples

William C M Belzak¹

PMID: 31583903
DOI: 10.1080/00273171.2019.1671162

Publication type: Comparative Study

William C M Belzak. Multivariate Behav Res. 2020 Sep-Oct;55(5):722-747. doi: 10.1080/00273171.2019.1671162. Epub 2019 Oct 4.

Affiliation

¹ Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill.

Abstract

Differential item functioning (DIF) is a pernicious statistical issue that can mask true group differences on a target latent construct. A considerable amount of research has focused on evaluating methods for testing DIF, such as likelihood ratio tests in item response theory (IRT). Most of this research has focused on the asymptotic properties of DIF testing, in part because many latent variable methods require large samples to obtain stable parameter estimates. Much less research has evaluated these methods in small samples, even though social and behavioral scientists frequently encounter small samples in practice. In this article, we examine the extent to which model complexity (the number of model parameters estimated simultaneously) affects the recovery of DIF in small samples. We compare three models that vary in complexity: logistic regression with sum scores, the 1-parameter logistic (1PL) IRT model, and the 2-parameter logistic (2PL) IRT model. We expected logistic regression with sum scores and the 1PL IRT model to estimate DIF more accurately because these models yield more stable estimates, despite being misspecified. Indeed, a simulation study and an empirical example of adolescent substance use show that, even when data are generated from (or assumed to follow) a 2PL IRT model, using parsimonious models in small samples leads to more powerful tests of DIF while adequately controlling Type I error. We also provide evidence for the minimum sample sizes needed to detect DIF, and we evaluate whether applying corrections for multiple testing is advisable. Finally, we provide recommendations for applied researchers who conduct DIF analyses in small samples.

Keywords: differential item functioning; item response theory; logistic regression; measurement invariance; model complexity; small samples.
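The least complex model named in the abstract, logistic regression with sum scores, can be sketched as a nested-model likelihood ratio (LR) test. The Python below is a minimal illustration under stated assumptions, not the article's own code: it follows the common logistic-regression DIF setup in which a baseline model regresses the studied item on the sum score, and an augmented model adds a group main effect (uniform DIF) and a group-by-score interaction (nonuniform DIF), so the LR statistic is referred to a chi-square distribution with 2 degrees of freedom. Matching on the total sum score (including the studied item), using the Benjamini-Hochberg false discovery rate adjustment as the example multiple-testing correction, the simulated 2PL item parameters, and all helper names (`fit_logistic`, `lr_dif_test`, `benjamini_hochberg`, `simulate_2pl`) are assumptions made for illustration. The 1PL and 2PL IRT comparison models are not implemented here.

```python
import math
import random

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def linpred(beta, xi):
    # Linear predictor eta = x_i' beta.
    return sum(b * x for b, x in zip(beta, xi))

def solve(A, b):
    # Gaussian elimination with partial pivoting for the small Newton system A x = b.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    # Newton-Raphson maximum likelihood for logistic regression; returns the maximized log-likelihood.
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        g = [0.0] * p                      # score vector X'(y - mu)
        H = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = math.exp(-softplus(-linpred(beta, xi)))  # inverse logit
            w = mu * (1.0 - mu)
            for j in range(p):
                g[j] += (yi - mu) * xi[j]
                for k in range(p):
                    H[j][k] += w * xi[j] * xi[k]
        step = solve(H, g)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return sum(yi * linpred(beta, xi) - softplus(linpred(beta, xi)) for xi, yi in zip(X, y))

def lr_dif_test(responses, group, item):
    # Hypothetical helper: 2-df LR test of uniform + nonuniform DIF for one item,
    # matching on the centered total sum score (which includes the studied item).
    scores = [sum(r) for r in responses]
    mean_s = sum(scores) / len(scores)
    y = [r[item] for r in responses]
    X0 = [[1.0, s - mean_s] for s in scores]  # baseline: no DIF
    X2 = [[1.0, s - mean_s, g, g * (s - mean_s)] for s, g in zip(scores, group)]
    lr = max(0.0, 2.0 * (fit_logistic(X2, y) - fit_logistic(X0, y)))
    return lr, math.exp(-lr / 2.0)  # chi-square(2) survival function is exp(-x/2)

def benjamini_hochberg(pvals):
    # Step-up Benjamini-Hochberg adjusted p-values (false discovery rate control).
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adj[i] = running
    return adj

def simulate_2pl(n, a, b, dif_item, dif_shift, seed=0):
    # Hypothetical 2PL data: group 1 finds `dif_item` harder by `dif_shift` logits (uniform DIF).
    rng = random.Random(seed)
    group = [i % 2 for i in range(n)]
    responses = []
    for g in group:
        theta = rng.gauss(0.0, 1.0)
        row = []
        for j, (aj, bj) in enumerate(zip(a, b)):
            shift = dif_shift if (g == 1 and j == dif_item) else 0.0
            prob = 1.0 / (1.0 + math.exp(-aj * (theta - bj - shift)))
            row.append(1 if rng.random() < prob else 0)
        responses.append(row)
    return responses, group

if __name__ == "__main__":
    a = [1.2, 0.8, 1.5, 1.0, 0.9, 1.3]    # hypothetical discriminations
    b = [0.0, -1.0, 0.5, 1.0, -0.5, 0.2]  # hypothetical difficulties
    responses, group = simulate_2pl(1000, a, b, dif_item=0, dif_shift=1.5)
    results = [lr_dif_test(responses, group, j) for j in range(len(a))]
    adjusted = benjamini_hochberg([p for _, p in results])
    for j, ((lr, p), q) in enumerate(zip(results, adjusted)):
        print(f"item {j}: LR={lr:.2f}  p={p:.4g}  BH-adjusted={q:.4g}")
```

Each item is tested separately, so a scale with many items produces many tests, and the adjusted column shows how one possible correction changes the decisions. The chi-square survival function with 2 degrees of freedom has the closed form exp(-x/2), which keeps the sketch free of non-standard-library dependencies; a 1-df test of uniform DIF alone would instead use `math.erfc(math.sqrt(x / 2))`.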
