J Stat Softw. 2011 Mar 1;39(8):1-30.
doi: 10.18637/jss.v039.i08.

lordif: An R Package for Detecting Differential Item Functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations

Seung W Choi et al. J Stat Softw.

Abstract

Logistic regression provides a flexible framework for detecting various types of differential item functioning (DIF). Previous efforts extended the framework by using item response theory (IRT) based trait scores, and by employing an iterative process using group-specific item parameters to account for DIF in the trait scores, analogous to purification approaches used in other DIF detection frameworks. The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program. Furthermore, a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures. For purposes of illustration, the procedure was applied to data from a questionnaire of anxiety symptoms for detecting DIF associated with age from the Patient-Reported Outcomes Measurement Information System.
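The logistic regression framework for DIF detection is commonly written as a sequence of nested cumulative-logit models that are compared with likelihood-ratio tests. The block below is a sketch of that standard formulation, not a quotation from the article. The notation is assumed here: $u_i$ is the response to item $i$, $\alpha_k$ are category intercepts, "ability" is the IRT trait score, and "group" is the demographic indicator (for example, age group).

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1\,\text{ability} \\
\text{Model 2:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1\,\text{ability} + \beta_2\,\text{group} \\
\text{Model 3:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1\,\text{ability} + \beta_2\,\text{group} + \beta_3\,\text{ability}\times\text{group}
\end{align*}
% Uniform DIF:     compare Model 1 with Model 2 (tests \beta_2)
% Non-uniform DIF: compare Model 2 with Model 3 (tests \beta_3)
% Total DIF:       compare Model 1 with Model 3 (tests \beta_2 and \beta_3 jointly)
```

Under this formulation, a significant group term indicates uniform DIF and a significant interaction term indicates non-uniform DIF. In the iterative procedure the abstract describes, the trait scores would be re-estimated with group-specific parameters for flagged items and the comparisons repeated.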

Figures

Figure 1
Trait distributions – younger (< 65) vs. older (65 and up). Note: This graph shows smoothed histograms of the anxiety levels of older (dashed line) and younger (solid line) study participants as measured by the PROMIS Anxiety scale (theta). There is broad overlap in the distributions, though older individuals in general demonstrated lower levels of anxiety than younger individuals.
Figure 2
Graphical display of the item “I felt fearful” which shows non–uniform DIF with respect to age. Note: This item retained three response categories (0, 1, and 2) from the original five–point rating scale after collapsing the top three response categories due to sparseness. The program by default uses a minimum of five cases per cell (the user can specify a different minimum) in order to retain each response category. The upper–left graph shows the item characteristic curves (ICCs) for the item for older (dashed curve) vs. younger (solid curve). The upper–right graph shows the absolute difference between the ICCs for the two groups, indicating that the difference is mainly at high levels of anxiety (theta). The lower–left graph shows the item response functions for the two groups based on the demographic–specific item parameter estimates (slope and category threshold values by group), which are also printed on the graph. The lower–right graph shows the absolute difference between the ICCs (the upper–right graph) weighted by the score distribution for the focal group, i.e., older individuals (dashed curve in Figure 1), indicating minimal impact. See text for more details.
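The four panels described in this caption can be reproduced in outline with a few lines of code. The following is a minimal sketch, assuming a graded response model and taking the ICC to be the expected item score at each theta. The slope and threshold values, the theta grid, and the normal focal-group density are all hypothetical choices made for illustration; none of them come from the article.

```python
import math

def category_probs(theta, slope, thresholds):
    """Graded response model category probabilities for one item."""
    # Cumulative probabilities P(X >= k) for k = 1..K, bracketed by 1 and 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def expected_score(theta, slope, thresholds):
    """Expected item score at theta, with categories scored 0, 1, 2, ..."""
    probs = category_probs(theta, slope, thresholds)
    return sum(k * p for k, p in enumerate(probs))

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used as a stand-in trait density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical group-specific parameters for a three-category item.
reference = {"slope": 2.0, "thresholds": [0.5, 1.5]}  # e.g. younger group
focal = {"slope": 2.6, "thresholds": [0.5, 1.5]}      # e.g. older group

grid = [-4.0 + 0.1 * i for i in range(81)]

# Absolute difference between the two expected-score curves at each theta.
abs_diff = [
    abs(expected_score(t, **reference) - expected_score(t, **focal))
    for t in grid
]

# Same difference weighted by an assumed focal-group trait density
# (normal with mean -0.3 and sd 1.0, chosen only for illustration).
weighted_diff = [d * normal_pdf(t, -0.3, 1.0) for d, t in zip(abs_diff, grid)]
```

Because the weights shrink where few focal-group members are located, a difference that sits at high theta can look large in the unweighted curve and small in the weighted one, which is the pattern the caption describes as minimal impact.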
Figure 3
Graphical display of the item “I was anxious if my normal routine was disturbed” which shows uniform DIF with respect to age. Note: See detailed comments accompanying Figure 2. Here the differences between younger and older individuals appear to be at lower anxiety levels.
Figure 4
Graphical display of the item “I was easily startled” which shows uniform DIF with respect to age. Note: See detailed comments accompanying Figure 2. Here the differences between younger and older individuals are across almost the entire spectrum of anxiety measured by the test.
Figure 5
Graphical display of the item “I worried about other people’s reactions to me” which shows uniform DIF with respect to age. Note: See detailed comments accompanying Figure 2.
Figure 6
Graphical display of the item “Many situations made me worry” displaying uniform DIF with respect to age. Note: See detailed comments accompanying Figure 2.
Figure 7
Impact of DIF items on test characteristic curves. Note: These graphs show test characteristic curves (TCCs) for younger and older individuals using demographic–specific item parameter estimates. TCCs show the expected total scores for groups of items at each anxiety level (theta). The graph on the left shows these curves for all of the items (both items with and without DIF), while the graph on the right shows these curves for the subset of these items found to have DIF. These curves suggest that at the overall test level there is minimal difference in the total expected score at any anxiety level for older or younger individuals.
Figure 8
Individual–level DIF impact. Note: These graphs show the difference between scores that ignore DIF and scores that account for DIF. The graph on the left shows a box plot of these differences. The interquartile range, representing the middle 50% of the differences (bounded by the bottom and top of the shaded box), ranges roughly from +0.03 to +0.12 with a median of approximately +0.10. In the graph on the right the same difference scores are plotted against the initial scores ignoring DIF (“initial theta”), separately for younger and older individuals. Guidelines are placed at 0.0 (solid line), i.e., no difference, and at the mean of the differences (dotted line). The positive values to the left of this graph indicate that in almost all cases, accounting for DIF led to slightly lower scores for those with lower levels of anxiety (i.e., naive score ignoring DIF minus score accounting for DIF > 0, so the score accounting for DIF is less than the naive score), but this appears to be consistent across older and younger individuals. The negative values to the right of this graph indicate that for those with higher levels of anxiety, accounting for DIF led to slightly higher scores, but this again was consistent across older and younger individuals.
Figure 9
Monte Carlo thresholds for χ² probabilities (1,000 replications). Note: The graphs show the probability values for each of the items (shown along the x–axis) associated with the 99th quantile (cutting the largest 1% over 1,000 iterations) of the χ² statistics generated from Monte Carlo simulations under the no DIF condition (data shown in Table 1). The lines connecting the data points are placed to show the fluctuation across items and not to imply a series. The horizontal reference line is placed at the nominal alpha level (0.01).
Figure 10
Monte Carlo thresholds for pseudo R² (1,000 replications). Note: The graphs show the pseudo R² measures for each of the items (shown along the x–axis) corresponding to the 99th quantile (cutting the largest 1% over 1,000 iterations) generated from Monte Carlo simulations under the no DIF condition. The lines connecting the data points are placed to show the fluctuation across items and not to imply a series.
Figure 11
Monte Carlo thresholds for proportional beta change (1,000 replications). Note: The graphs show the proportionate β1 change measures for each of the items (shown along the x–axis) corresponding to the 99th quantile (cutting the largest 1% over 1,000 iterations) generated from Monte Carlo simulations under the no DIF condition. The lines connecting the data points are placed to show the fluctuation across items and not to imply a series.
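The thresholds in Figures 9 to 11 share one recipe: generate a statistic many times under the no DIF condition, then take the 99th quantile of the simulated values as the empirical cutoff. The sketch below illustrates only that quantile step. It is not the package's procedure: `simulate_null_statistics` is a hypothetical stand-in that draws a squared standard normal (a χ² variate with 1 degree of freedom) where a real replication would simulate DIF-free responses and refit the models.

```python
import math
import random

def empirical_quantile(values, q):
    """Empirical quantile of a sample using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def simulate_null_statistics(n_replications, seed=0):
    """Hypothetical stand-in for the simulation step.

    Each draw is the square of a standard normal, i.e. a chi-square
    variate with 1 degree of freedom, mimicking a likelihood-ratio
    statistic when no DIF is present.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_replications)]

# 1,000 replications under the no DIF condition for a single item.
null_stats = simulate_null_statistics(1000)

# Empirical cutoff: the value exceeded by roughly the largest 1% of replications.
threshold = empirical_quantile(null_stats, 0.99)

# Simulated statistics that would be flagged at this cutoff.
flagged = [s for s in null_stats if s > threshold]
```

Repeating this per item, and per statistic (χ² probability, pseudo R², proportionate β1 change), would yield one threshold per item, which is what the figures plot along the x–axis.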
