Interobserver agreement: Cohen's kappa coefficient does not necessarily reflect the percentage of patients with congruent classifications
- PMID: 9088995
Abstract
A widely accepted approach to evaluating interrater reliability for categorical responses involves the rating of n subjects by at least two raters. Frequently there are only two response categories, such as a positive or negative diagnosis. The same approach is commonly used to assess concordant classification by two diagnostic methods. Depending on whether one uses the percent agreement as such or corrects it for the agreement expected by chance, i.e. Cohen's kappa coefficient, one can obtain quite different values. This short communication demonstrates that Cohen's kappa coefficient of agreement between two raters or two diagnostic methods based on binary (yes/no) responses does not parallel the percentage of patients with congruent classifications. It may therefore be of limited value for assessing increases in interrater reliability due to an improved diagnostic method. The percentage of patients with congruent classifications is easier to interpret clinically but does not account for the agreement expected by chance. We therefore recommend presenting both the percentage of patients with congruent classifications and Cohen's kappa coefficient with 95% confidence limits.
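The divergence the abstract describes can be illustrated with a short sketch. The standard definitions are used: observed agreement p_o is the proportion of concordant classifications, chance agreement p_e follows from the marginal totals, and kappa = (p_o - p_e) / (1 - p_e). The 2x2 cell counts and the `agreement_stats` helper below are illustrative assumptions, not data from the paper:

```python
def agreement_stats(a, b, c, d):
    """Percent agreement and Cohen's kappa for a 2x2 table.

    a = both raters positive, d = both raters negative,
    b and c = the two discordant cells.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed (percent) agreement
    # chance agreement from the row/column marginals
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa

# Two hypothetical tables with identical 90% agreement but very
# different kappa, because the marginal prevalences differ:
print(agreement_stats(40, 5, 5, 50))  # balanced prevalence
print(agreement_stats(85, 5, 5, 5))   # skewed prevalence
```

With balanced marginals the first table yields kappa near 0.80, while the skewed second table yields kappa near 0.44 despite the same 90% concordance, which is exactly why the authors recommend reporting both measures.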
Similar articles
- [Analyzing interrater agreement for categorical data using Cohen's kappa and alternative coefficients]. Rehabilitation (Stuttg). 2007 Dec;46(6):370-7. doi: 10.1055/s-2007-976535. PMID: 18188809. German.
- Clinicians are right not to like Cohen's κ. BMJ. 2013 Apr 12;346:f2125. doi: 10.1136/bmj.f2125. PMID: 23585065.
- [VII: Diagnostic trials: Simple measures of validity and reliability]. Klin Monbl Augenheilkd. 2003 Apr;220(4):281-3. doi: 10.1055/s-2003-38633. PMID: 12695973. German.
- Observer variation in the diagnosis of thyroid disorders. Criteria for and impact on diagnostic decision-making. Dan Med Bull. 2000 Nov;47(5):328-39. PMID: 11155660. Review.
- An interrater reliability study of a new 'zonal' classification for reporting the location of retinal haemorrhages in childhood for clinical, legal and research purposes. Br J Ophthalmol. 2010 Jul;94(7):886-90. doi: 10.1136/bjo.2009.162271. Epub 2009 Oct 21. PMID: 19846410. Review.
Cited by
- Interrater reliability of Chinese medicine diagnosis in people with prediabetes. Evid Based Complement Alternat Med. 2013;2013:710892. doi: 10.1155/2013/710892. Epub 2013 May 9. PMID: 23762155. Free PMC article.
- Experimental Studies of Inter-Rater Agreement in Traditional Chinese Medicine: A Systematic Review. J Altern Complement Med. 2019 Nov;25(11):1085-1096. doi: 10.1089/acm.2019.0197. PMID: 31730402. Free PMC article.
- The role of telepathology in diagnosis of pre-malignant and malignant cervical lesions: Implementation at a tertiary hospital in Northern Tanzania. PLoS One. 2022 Apr 14;17(4):e0266649. doi: 10.1371/journal.pone.0266649. eCollection 2022. PMID: 35421156. Free PMC article.
- How do statistical properties influence findings of tracking (maintenance) in epidemiologic studies? An example of research in tracking of obesity. Eur J Epidemiol. 2003;18(11):1037-45. doi: 10.1023/a:1026196310041. PMID: 14620937.
- Assessing fidelity of delivery of smoking cessation behavioural support in practice. Implement Sci. 2013 Apr 4;8:40. doi: 10.1186/1748-5908-8-40. PMID: 23557119. Free PMC article.