Qual Life Res. 2007;16 Suppl 1:5-18.
doi: 10.1007/s11136-007-9198-0. Epub 2007 Mar 21.

Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement

Maria Orlando Edelen et al. Qual Life Res. 2007.

Abstract

Background: Health outcomes researchers are increasingly applying Item Response Theory (IRT) methods to questionnaire development, evaluation, and refinement efforts.

Objective: To provide a brief overview of IRT, to review some of the critical issues associated with IRT applications, and to demonstrate the basic features of IRT with an example.

Methods: Example data come from 6,504 adolescent respondents in the National Longitudinal Study of Adolescent Health public use data set who completed the 19-item Feelings Scale for depression. The sample was split into a development sample and a validation sample. Scale items were calibrated in the development sample with the Graded Response Model, and the results were used to construct a 10-item short form. The short form was evaluated in the validation sample by examining the correspondence between IRT scores from the short form and the original form, and by comparing the proportion of respondents identified as depressed according to the observed cut scores of the original and short forms.
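
As an illustration of the Graded Response Model named in the Methods, the following minimal Python sketch computes response-category probabilities for a single item. The function name `grm_category_probs` and the parameter values (slope 1.5, thresholds -0.5, 1.0, 2.5, four response categories) are hypothetical choices for illustration only; they are not estimates reported by the authors.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta      -- latent trait level (e.g., depression severity)
    a          -- item slope (discrimination) parameter
    thresholds -- increasing location parameters b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per ordered response category.
    """
    # Cumulative curves: probability of responding in category k or higher.
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with 1 (lowest category or higher) and 0 (above the highest category).
    bounds = [1.0] + cum + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical item with four response categories (illustrative values only).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-0.5, 1.0, 2.5])
print([round(p, 3) for p in probs])
```

A larger slope makes the cumulative curves steeper, so the item separates respondents more sharply near its thresholds; thresholds located at high trait levels make the item informative mainly for more severely depressed respondents.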

Results: The 19 items varied in their discrimination (slope parameter range: .86 to 2.66), and item location parameters reflected a considerable range of depression (-.72 to 3.39). However, the item set is most discriminating at higher levels of depression. In the validation sample, IRT scores generated from the short and long forms were correlated at .96, and the average difference in these scores was -.01. In addition, nearly 90% of the sample was classified identically as at risk or not at risk for depression using observed score cut points from the short and long forms.
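
The three comparisons reported in the Results (score correlation, average score difference, and agreement in at-risk classification) can be sketched in Python as below. This is an assumed, simplified reading of those comparisons, not the authors' analysis code; the function names are hypothetical, and the scores and cut points in the usage lines are made-up toy values, not data or cut scores from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(x, y):
    """Average of (x - y) across respondents."""
    return sum(a - b for a, b in zip(x, y)) / len(x)


def classification_agreement(long_scores, short_scores, long_cut, short_cut):
    """Proportion of respondents flagged the same way by both forms.

    A respondent is treated as at risk when the observed score is at or
    above the form's cut point (an assumption made for this sketch).
    """
    same = sum(
        (l >= long_cut) == (s >= short_cut)
        for l, s in zip(long_scores, short_scores)
    )
    return same / len(long_scores)


# Toy values for illustration only.
short_irt = [-0.8, 0.1, 0.9, 1.7]
long_irt = [-0.7, 0.0, 1.0, 1.6]
print(pearson_r(short_irt, long_irt), mean_difference(short_irt, long_irt))
print(classification_agreement([10, 20, 30, 5], [5, 12, 14, 2], long_cut=16, short_cut=8))
```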

Conclusions: When used appropriately, IRT can be a powerful tool for questionnaire development, evaluation, and refinement, resulting in precise, valid, and relatively brief instruments that minimize response burden.
