2012 Jul 13:10:80.
doi: 10.1186/1477-7525-10-80.

The importance of rating scales in measuring patient-reported outcomes

Jyoti Khadka et al. Health Qual Life Outcomes.

Abstract

Background: A critical component that influences the measurement properties of a patient-reported outcome (PRO) instrument is the rating scale. Yet there is no general consensus on the optimal rating scale format, including question structure and the number and labelling of response categories. This study aims to explore the characteristics of rating scales that function well and those that do not, and thereby develop guidelines for formulating rating scales.

Methods: Seventeen existing PROs designed to measure vision-related quality of life dimensions were mailed for self-administration, in sets of 10, to patients who were on a waiting list for cataract extraction. These PROs included questions with ratings of difficulty, frequency, severity, and global ratings. Using Rasch analysis, the performance of the rating scales was assessed by examining hierarchical ordering (indicating that categories are distinct from each other and follow a logical transition from lower to higher value), evenness (indicating relative utilization of categories), and range (indicating coverage of the attribute by the rating scale).
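The three criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: it assumes the Rasch category thresholds (in logits) have already been estimated by separate software, and the function name `summarize_thresholds`, the use of the population standard deviation for evenness, and the definition of range as last minus first threshold are assumptions made for this sketch.

```python
from statistics import pstdev


def summarize_thresholds(thresholds):
    """Summarize a rating scale from its category thresholds (in logits).

    `thresholds` is assumed to be listed in category order, i.e. the
    threshold between categories 1 and 2 first, then 2 and 3, and so on.
    """
    pairs = list(zip(thresholds, thresholds[1:]))
    # Hierarchical ordering: each threshold lies above the previous one.
    ordered = all(lower < upper for lower, upper in pairs)
    # Range: distance covered from the first to the last threshold.
    scale_range = thresholds[-1] - thresholds[0]
    # Category widths: distance between adjacent thresholds.
    widths = [upper - lower for lower, upper in pairs]
    # Evenness: spread of the widths (0 means perfectly even categories).
    width_sd = pstdev(widths) if widths else 0.0
    return {
        "ordered": ordered,
        "range": scale_range,
        "widths": widths,
        "width_sd": width_sd,
    }


if __name__ == "__main__":
    print(summarize_thresholds([-3.0, 0.0, 3.0]))
```

With thresholds at −3, 0 and +3 logits, the summary reports an ordered scale with a range of 6 logits, two category widths of 3 logits and a width standard deviation of 0.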

Results: Rating scales with a complicated question format, a large number of response categories, or unlabelled categories tended to be dysfunctional. Rating scales with five or fewer response categories tended to be functional. Most of the rating scales measuring difficulty performed well. The rating scales measuring frequency and severity demonstrated hierarchical ordering, but their categories lacked even utilization.

Conclusion: Developers of PRO instruments should use a simple question format and fewer (four to five), labelled response categories.

Figures

Figure 1
Rasch model category probability curves of a question with four response categories (1, not at all; 2, a little; 3, quite a bit; and 4, a lot). The x-axis represents the attribute in logits. The y-axis represents the probability of a response category being selected. The curves represent the likelihood that a respondent with a particular amount of the latent trait will select a category: illustration of the concepts of scale range (−3 to +3, i.e. 6 logits in this example), 3 thresholds for 4 categories and evenness of categories (category width, 3 logits each; standard deviation of the width, 0).
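The curves described in this caption can be reproduced approximately with the sketch below. It assumes the Andrich rating scale form of the polytomous Rasch model with the item located at 0 logits (the caption does not state which parameterization was used), and it places the three thresholds at −3, 0 and +3 logits to match the stated range and category widths. The function name `category_probabilities` is illustrative only.

```python
import math


def category_probabilities(theta, thresholds):
    """Probability of each response category at person measure `theta`.

    Assumes the rating scale form of the Rasch model: the log-weight of
    category k is the sum of (theta - tau_j) over the first k thresholds,
    with the lowest category given a log-weight of 0.
    """
    log_weights = [0.0]
    for tau in thresholds:
        log_weights.append(log_weights[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_weights)
    weights = [math.exp(value - peak) for value in log_weights]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    thresholds = [-3.0, 0.0, 3.0]  # assumed positions, in logits
    for theta in (-5.0, -1.5, 1.5, 5.0):
        probs = category_probabilities(theta, thresholds)
        print(theta, [round(p, 3) for p in probs])
```

At each threshold the two adjacent categories are equally likely, which is where neighbouring curves cross in the figure.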
Figure 2
(a–e) Rasch model category probability curves showing functional rating scales for items with five response categories that assess ‘difficulty’ in five different questionnaires:
(a) Visual Symptoms and Quality of Life Questionnaire, VSQ (Question numbers 1, 6, 8 and 9). Response categories of 1–5 correspond to ‘no difficulty’, ‘yes, a little difficulty’, ‘yes, some difficulty’, ‘yes, a great deal of difficulty’ and ‘I cannot perform the activity because of my eyesight’.
(b) Cataract Symptom Scale, CSS (all). Response categories of 0–4 correspond to ‘no’, ‘a little difficulty’, ‘a moderate difficulty’, ‘very difficult’ and ‘unable to do’.
(c) Technology of Patient Experiences (Question numbers 2–13). Response categories of 1–5 correspond to ‘not at all’, ‘a little bit’, ‘some’, ‘quite a lot’ and ‘totally disabled’.
(d) Visual Function-14, VF-14 (all). Response categories of 0–4 correspond to ‘unable to do the activity’, ‘a great deal’, ‘a moderate amount’, ‘a little’ and ‘no’.
(e) National Eye Institute Visual Function Questionnaire, NEI-VFQ (Question numbers 5–16). Response categories of 1–5 correspond to ‘no difficulty at all’, ‘a little difficulty’, ‘moderate difficulty’, ‘extreme difficulty’ and ‘stopped doing this because of eyesight’.
(f) Rasch model category probability curves showing disordered thresholds for five-response-category questions that assess ‘difficulty’ in the Activities of Daily Vision Scale (ADVS). The peaks of the two middle categories, 2 and 3, are submerged and the thresholds are disordered, indicating that respondents had difficulty discriminating between adjacent categories.
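A submerged peak can be illustrated with a small sketch. Under the rating scale form of the Rasch model (an assumption here, as is an item location of 0 logits), the most probable category at a given person measure is the one with the largest cumulative sum of (measure − threshold). The function `modal_categories` and both threshold sets below are hypothetical and are not estimates from any of the questionnaires named in the caption.

```python
def modal_categories(thresholds, lo=-6.0, hi=6.0, steps=241):
    """Return the categories that are most probable somewhere on a grid.

    Categories are indexed from 0. A category missing from the result
    never has the highest probability on the grid, i.e. its curve is
    submerged beneath its neighbours.
    """
    seen = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        # Cumulative log-weights; the most probable category has the largest.
        cumulative = [0.0]
        for tau in thresholds:
            cumulative.append(cumulative[-1] + (theta - tau))
        best = max(range(len(cumulative)), key=cumulative.__getitem__)
        seen.add(best)
    return sorted(seen)


if __name__ == "__main__":
    ordered = [-3.0, -1.0, 1.0, 3.0]     # hypothetical, ordered thresholds
    disordered = [-2.0, 1.0, 0.0, 2.0]   # hypothetical, second and third reversed
    print(modal_categories(ordered))
    print(modal_categories(disordered))
```

In this hypothetical disordered set only one middle category is submerged; the pattern described for panel (f), with two submerged categories, would need a different set of thresholds.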
