Appl Psychol Meas. 2016 Jan;40(1):37-55.
doi: 10.1177/0146621615592718. Epub 2015 Jul 22.

Double Entropy Inter-Rater Agreement Indices

Andriy Olenko et al. Appl Psychol Meas. 2016 Jan.

Abstract

The proper application of the most frequently used inter-rater agreement indices can be problematic for the case of a single target, for example, a psychotherapy patient, a student's thesis, a grant proposal, and the lifestyle in a country. The majority of indices that can handle this case assess either the deviation of ranks from some central/average value or the pattern of ranks' distribution. Contrary to other approaches, this article defines disagreement rating results using the unpredictability/complexity of scores. The article discusses alternative entropy methods for measuring inter-rater agreement or consensus in survey responses for the case of a single target. A new inter-rater agreement index is proposed. Comparisons between this index and the known inter-rater agreement measures show some limitations of the most frequently used indices. Various important methodological issues such as disagreement assumptions, average sensitivity, adjustments to deal with outliers, and missing or incorrectly recorded data are discussed. Examples of applications to actual data are presented.

Keywords: agreement index; average sensitivity; psychological statistics; similarity measure.
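
The abstract's core idea, treating disagreement as the unpredictability of the scores given to a single target, can be illustrated with a plain normalized Shannon entropy. This is only a minimal sketch of the general entropy approach and is not the double entropy index proposed in the article, whose definition does not appear in this record. The function names `normalized_entropy` and `entropy_agreement` and the parameter `levels` (the number of points on the response scale) are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def normalized_entropy(ratings, levels):
    """Shannon entropy of the empirical score distribution, scaled to [0, 1].

    Hypothetical helper: `ratings` holds the scores given to one target,
    `levels` is the number of points on the response scale (assumed >= 2).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if levels < 2:
        raise ValueError("levels must be at least 2")
    n = len(ratings)
    counts = Counter(ratings)
    # Only observed scores contribute; unobserved levels add nothing.
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    # log(levels) is the entropy of a uniform spread over the whole scale.
    return h / math.log(levels)


def entropy_agreement(ratings, levels):
    """Illustrative agreement score: 1 minus the normalized entropy."""
    return 1.0 - normalized_entropy(ratings, levels)


if __name__ == "__main__":
    print(entropy_agreement([3, 3, 3, 3, 3], levels=5))  # all raters agree
    print(entropy_agreement([1, 2, 3, 4, 5], levels=5))  # scores spread evenly
    print(entropy_agreement([1, 1, 5, 5], levels=5))     # raters split at the extremes
    print(entropy_agreement([3, 3, 4, 4], levels=5))     # raters split on neighbours
```

Plain Shannon entropy ignores the ordering of the response scale, so the last two calls return the same value even though a split between 1 and 5 is intuitively a stronger disagreement than a split between 3 and 4. An index for ordinal ratings has to take the order of the scale into account.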

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Case of n=10 and R=(4,5,1,1,4,0,1,2).
Figure 2. Examples of most disagreement rating results.
Figure 3. Average sensitivity for 5-, 10-, and 15-level response scales.
Figure 4. Example patterns.
Figure 5. Case of outliers.
Figure 6. Case of incorrect records.
Figure 7. κ* versus H*(P) and H*(Q).
Figure 8. Distributions of life satisfaction scores.
Figure 9. Distributions of scores for two prompts.
