Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales
- PMID: 22855629
- PMCID: PMC4400798
- DOI: 10.1136/bmjopen-2012-001368
Abstract
Introduction: Quality assessment of included studies is an important component of systematic reviews.
Objective: The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters.
Design: Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned to five pairs, and each rater independently rated the quality of 13–20 articles randomly distributed from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy for major depressive disorder. Two months later, each rater re-assessed the quality of half of their assigned articles.
Setting: McMaster Integrative Neuroscience Discovery and Study Program.
Participants: 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses.
Main outcome measures: The authors measured inter-rater reliability using κ and the type (2,1) intraclass correlation coefficient, ICC(2,1), and test-retest reliability using ICC(2,1).
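For readers who want to reproduce these statistics, the sketch below shows one way to compute Cohen's κ for a pair of raters and the Shrout–Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) from a subjects × raters matrix. This is a minimal illustration under those standard definitions, not the authors' analysis code; the function names and sample data are hypothetical.

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical item scores."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    # Chance agreement: product of each rater's marginal proportions.
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_observed - p_expected) / (1.0 - p_expected)

def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement,
    single rater. `ratings` is an (n subjects, k raters) array."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_total = np.sum((x - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    # The (ms_cols - ms_err) term is what "accounts for systematic
    # differences between raters" in ICC(2,1), unlike ICC(3,1).
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical data: one pair of raters scoring five articles. In the study,
# kappa was computed per scale question and ICC(2,1) on overall scores.
rater_a = [3, 4, 2, 5, 3]
rater_b = [2, 4, 3, 5, 2]
print(cohens_kappa(rater_a, rater_b))                 # question-level agreement
print(icc_2_1(np.column_stack([rater_a, rater_b])))   # score-level agreement
```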
Results: Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95).
Conclusions: Inter-rater reliability was generally poor to fair and test-retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement.
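The qualitative labels above ("poor", "fair", "excellent") correspond to conventional reliability benchmarks; one widely used set of cut-offs is Cicchetti's (1994), sketched below. These may not be the exact thresholds the authors applied, so treat the mapping as illustrative.

```python
def cicchetti_label(icc):
    """Map a reliability coefficient to Cicchetti's (1994) benchmarks.
    Illustrative only; the abstract does not state the authors' cut-offs."""
    if icc < 0.40:
        return "poor"
    elif icc < 0.60:
        return "fair"
    elif icc < 0.75:
        return "good"
    return "excellent"

print(cicchetti_label(0.32))  # Jadad inter-rater ICC(2,1) -> "poor"
print(cicchetti_label(0.83))  # NOS case-control test-retest -> "excellent"
```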