Pathologists should probably forget about kappa. Percent agreement, diagnostic specificity and related metrics provide more clinically applicable measures of interobserver variability

Alberto M Marchevsky¹, Ann E Walts², Birgit I Lissenberg-Witte³, Erik Thunnissen⁴

Affiliations

¹ Department of Pathology & Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States of America. Electronic address: Alberto.Marchevsky@cshs.org.
² Department of Pathology & Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States of America.
³ Department of Epidemiology and Data Science, UMC, Vrije Universiteit Amsterdam, the Netherlands.
⁴ Department of Pathology, UMC, Vrije Universiteit Amsterdam, the Netherlands.

PMID: 32623312
DOI: 10.1016/j.anndiagpath.2020.151561

Review

Alberto M Marchevsky et al. Ann Diagn Pathol. 2020 Aug;47:151561. doi: 10.1016/j.anndiagpath.2020.151561. Epub 2020 Jun 28.

Abstract

Kappa statistics have been widely used in the pathology literature to compare interobserver diagnostic variability (IOV) among different pathologists but there has been limited discussion about the clinical significance of kappa scores. Five representative and recent pathology papers were queried using clinically relevant specific questions to learn how IOV was evaluated and how the clinical applicability of results was interpreted. The papers supported our anecdotal impression that pathologists usually assess IOV using Cohen's or Fleiss' kappa statistics and interpret the results using some variation of the scale proposed by Landis and Koch. The papers did not cite or propose specific guidelines to comment on the clinical applicability of results. The solutions proposed to decrease IOV included the development of better diagnostic criteria and additional educational efforts, but the possibility that the entities themselves represented a continuum of morphologic findings rather than distinct diagnostic categories was not considered in any of the studies. A dataset from a previous study of IOV reported by Thunnissen et al. was recalculated to estimate percent agreement among 19 international lung pathologists for the diagnosis of 74 challenging lung neuroendocrine neoplasms. Kappa scores and diagnostic sensitivity, specificity, positive and negative predictive values were calculated using the majority consensus diagnosis for each case as the gold reference diagnosis for that case. Diagnostic specificity estimates among multiple pathologists were > 90%, although kappa scores were considerably more variable. We explain why kappa scores are of limited clinical applicability in pathology and propose the use of positive and negative percent agreement and diagnostic specificity against a gold reference diagnosis to evaluate IOV among two and multiple raters, respectively.
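The contrast the abstract draws between kappa and agreement-based metrics can be illustrated with a minimal Python sketch. It is not the authors' analysis: the three raters, the labels "N" and "P", and all ratings below are hypothetical toy values, and the helper names are chosen only for this illustration. The sketch computes percent agreement and Cohen's kappa for two raters, then sensitivity and specificity for one rater against a majority consensus used as the reference diagnosis.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of cases on which two raters assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement expected from each rater's marginal label frequencies.
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def majority_consensus(ratings):
    """Most frequent label per case across raters.

    Assumption: ties are not handled specially; with an odd number of
    raters and two labels (as in the toy data) no tie can occur.
    """
    return [Counter(case).most_common(1)[0][0] for case in zip(*ratings)]


def sens_spec(rater, reference, positive):
    """Sensitivity and specificity of one rater against a reference.

    Assumes the reference contains at least one positive and one
    negative case; otherwise a denominator would be zero.
    """
    pairs = list(zip(rater, reference))
    tp = sum(r == positive and g == positive for r, g in pairs)
    fn = sum(r != positive and g == positive for r, g in pairs)
    tn = sum(r != positive and g != positive for r, g in pairs)
    fp = sum(r == positive and g != positive for r, g in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy ratings: 3 raters, 10 cases, one rare category "P".
r1 = ["N"] * 8 + ["P", "N"]
r2 = ["N"] * 8 + ["N", "P"]
r3 = ["N"] * 8 + ["P", "N"]

reference = majority_consensus([r1, r2, r3])

print("percent agreement r1 vs r2:", percent_agreement(r1, r2))
print("Cohen's kappa r1 vs r2:", cohen_kappa(r1, r2))
print("r2 sensitivity, specificity vs consensus:", sens_spec(r2, reference, "P"))
```

In this toy set, raters 1 and 2 agree on 8 of 10 cases, yet kappa is slightly negative because nearly all chance-expected agreement comes from the dominant category. Specificity against the consensus stays high for the same rater. This behavior is a property of the toy numbers only and says nothing about the values reported in the study.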

Keywords: Diagnostic accuracy; Evidence-based pathology; Interobserver variability; Kappa statistics.
