Biochem Med (Zagreb). 2012;22(3):276-82.

Interrater reliability: the kappa statistic

Mary L McHugh¹

PMID: 23092060
PMCID: PMC3900052

Mary L McHugh. Biochem Med (Zagreb). 2012.

Affiliation

¹ Department of Nursing, National University, Aero Court, San Diego, California, USA. mchugh8688@gmail.com


Abstract

The kappa statistic is frequently used to test interrater reliability. Rater reliability matters because it represents the extent to which the data collected in a study are correct representations of the variables measured. Interrater reliability is the extent to which data collectors (raters) assign the same score to the same variable. Although a variety of methods have been used to measure interrater reliability, it was traditionally measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement because it cannot account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, kappa can range from -1 to +1. While kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels for both kappa and percent agreement that should be demanded in healthcare studies are suggested.
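
The two measures described in the abstract can be illustrated with a minimal sketch. Percent agreement is the number of matching scores divided by the total number of scores. Cohen's kappa is conventionally written as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. The function names and the ten ratings below are hypothetical and chosen only for illustration; they are not the data shown in the article's figures.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters assigned the same score."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings (1 = condition present, 0 = absent) for ten items.
rater_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement = {agreement:.2f}")
print(f"kappa = {kappa:.2f}")
```

With these invented ratings the raters agree on 8 of 10 items, yet kappa comes out noticeably lower than the percent agreement because roughly half of that agreement would be expected by chance alone.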

Figures

**Figure 1.**
Components of data in a research data set.

**Figure 2.**
Graphical representation of amount of correct data by % agreement or squared kappa value.

**Figure 3.**
Data for kappa calculation example.

**Figure 4.**
Calculation of the kappa statistic.

References

1. Bluestein D, Javaheri A. Pressure Ulcers: Prevention, Evaluation, and Management. Am Fam Physician. 2008;78:1186–94.
2. Kottner J, Halfens R, Dassen T. An interrater reliability study of the assessment of pressure ulcer risk using the Braden scale and the classification of pressure ulcers in a home care setting. Int J Nurs Stud. 2009;46:1307–12.
3. Fahey MT, Irwig L, Macaskill P. Meta-analysis of Pap Test Accuracy. Am J Epidemiol. 1995;141:680–9.
4. Bonnyman A, Webber C, Stratford P, MacIntire N. Intrarater reliability of dual-energy X-Ray absorptiometry–based measures of vertebral height in postmenopausal women. J Clin Densitom. 2012. doi: 10.1016/j.jocd.2012.03.005.
5. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
