2021 Aug;81(4):781-790.
doi: 10.1177/0013164420973080. Epub 2021 Feb 15.

Large-Sample Variance of Fleiss Generalized Kappa


Kilem L Gwet. Educ Psychol Meas. 2021 Aug.

Abstract

Cohen's kappa coefficient was originally proposed for two raters only; it was later extended to an arbitrarily large number of raters and became known as Fleiss' generalized kappa. Fleiss' generalized kappa and its large-sample variance are still widely used by researchers and have been implemented in several software packages, including, among others, SPSS and the R package "rel." The purpose of this article is to show that the large-sample variance of Fleiss' generalized kappa is systematically misused, is invalid as a measure of the precision of kappa, and cannot be used to construct confidence intervals. A general-purpose variance expression is proposed that can be used in any statistical inference procedure. A Monte Carlo experiment demonstrating the validity of the new variance estimation procedure is presented.
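As a concrete reference for the statistic under discussion, the following is a minimal sketch of Fleiss' generalized kappa (Fleiss, 1971). The function name and the example table are illustrative and not taken from the article; the sketch assumes every subject is rated by the same number of raters.

```python
def fleiss_kappa(counts):
    """Fleiss' generalized kappa from an N x q table, where counts[i][j]
    is the number of raters who classified subject i into category j.
    Assumes every subject is rated by the same number of raters r."""
    n = len(counts)                      # number of subjects
    q = len(counts[0])                   # number of categories
    r = sum(counts[0])                   # raters per subject
    total = n * r                        # total number of ratings
    # Overall proportion of ratings falling in each category.
    p = [sum(row[j] for row in counts) / total for j in range(q)]
    # Observed agreement: average over subjects of the proportion of
    # agreeing rater pairs.
    p_obs = sum(
        (sum(c * c for c in row) - r) / (r * (r - 1)) for row in counts
    ) / n
    # Chance agreement under the Fleiss model.
    p_exp = sum(pj * pj for pj in p)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical example: 4 subjects, 3 raters, 3 categories.
counts = [
    [3, 0, 0],   # all three raters chose category 1
    [0, 3, 0],
    [1, 1, 1],   # complete disagreement
    [2, 1, 0],
]
print(fleiss_kappa(counts))  # ≈ 0.268
```

Note that this computes only the point estimate; the variance attached to it in common implementations is the quantity whose misuse the article addresses.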

Keywords: Cohen kappa; Fleiss kappa; Gwet AC1; interrater reliability.


Conflict of interest statement

Declaration of Conflicting Interests: The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Monte Carlo simulation coverage rates.

References

    1. Cicchetti D. V., Feinstein A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43(6), 551-558. doi:10.1016/0895-4356(90)90159-M
    2. Cohen J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. doi:10.1177/001316446002000104
    3. Conger A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88(2), 322-328. doi:10.1037/0033-2909.88.2.322
    4. Feinstein A. R., Cicchetti D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6), 543-549. doi:10.1016/0895-4356(90)90158-L
    5. Fleiss J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382. doi:10.1037/h0031619
