Implementing a Standardized Effect Size in the POLYSIBTEST Procedure

James D Weese¹, Ronna C Turner¹, Xinya Liang¹, Allison Ames¹, Brandon Crawford²

PMID: 36866067
PMCID: PMC9972129
DOI: 10.1177/00131644221081011

James D Weese et al. Educ Psychol Meas. 2023 Apr;83(2):401-427. doi: 10.1177/00131644221081011. Epub 2022 Feb 28.

Affiliations

¹ University of Arkansas Fayetteville, USA.
² Indiana University Bloomington, USA.

Abstract

A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese's standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the Zwick et al. and Golia recommendations, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl's suggested criterion. The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.

Keywords: DIF; POLYSIBTEST; differential item functioning; polytomous data; standardized effect size.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Relationship Between $\hat{\beta}_{uni}$ and $\delta_{beta}$ for Items With Varying Response Categories. $\hat{\beta}_{uni} = 0.100$ is the value Gierl (2005) recommended to classify a polytomous item with four response categories as having large DIF; $\delta_{beta}$ values of .164 and .241 are the standardized effect size heuristics that Weese (2020) recommended for classifying moderate and large DIF.

**Figure 2.**
Overall True-Positive Rates by Number of Response Categories.

**Figure 3.**
True-Positive Rate by Between-Group Differences in Threshold Parameters for Three to Five Categories.

**Figure 4.**
True-Positive Rate by Between-Group Differences in Threshold Parameters for Six and Seven Categories.
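
The Figure 1 caption gives .164 and .241 as the standardized effect size heuristics for moderate and large DIF. A minimal Python sketch of how such cutoffs might be applied to an already computed standardized effect size is shown here. The function name `classify_dif`, the use of the absolute value, and the treatment of each cutoff as inclusive are assumptions made for illustration; the sketch does not compute the effect size itself and does not model any accompanying significance test.

```python
# Cutoffs quoted in the Figure 1 caption (Weese, 2020).
MODERATE_CUTOFF = 0.164
LARGE_CUTOFF = 0.241


def classify_dif(delta_beta: float) -> str:
    """Label an item's DIF from a precomputed standardized effect size.

    Assumptions (not confirmed by the source): the sign only indicates
    which group is favored, so the magnitude is compared, and a value
    exactly equal to a cutoff falls in the higher category.
    """
    magnitude = abs(delta_beta)
    if magnitude >= LARGE_CUTOFF:
        return "large"
    if magnitude >= MODERATE_CUTOFF:
        return "moderate"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen only to exercise each branch.
    for value in (0.05, -0.20, 0.30):
        print(f"{value:+.3f} -> {classify_dif(value)}")
```

Because the effect size is expressed in standard deviation units, the same pair of cutoffs can in principle be used for items with any number of response options, which is the convenience the abstract highlights.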
