Pathologists should probably forget about kappa. Percent agreement, diagnostic specificity and related metrics provide more clinically applicable measures of interobserver variability
- PMID: 32623312
- DOI: 10.1016/j.anndiagpath.2020.151561
Abstract
Kappa statistics have been widely used in the pathology literature to compare interobserver diagnostic variability (IOV) among different pathologists, but there has been limited discussion of the clinical significance of kappa scores. Five recent, representative pathology papers were examined using specific, clinically relevant questions to learn how IOV was evaluated and how the clinical applicability of the results was interpreted. The papers supported our anecdotal impression that pathologists usually assess IOV using Cohen's or Fleiss' kappa statistics and interpret the results using some variation of the scale proposed by Landis and Koch. The papers neither cited nor proposed specific guidelines for commenting on the clinical applicability of their results. The solutions proposed to decrease IOV included the development of better diagnostic criteria and additional educational efforts, but none of the studies considered the possibility that the entities themselves represent a continuum of morphologic findings rather than distinct diagnostic categories. A dataset from a previous study of IOV reported by Thunnissen et al. was recalculated to estimate percent agreement among 19 international lung pathologists for the diagnosis of 74 challenging lung neuroendocrine neoplasms. Kappa scores and diagnostic sensitivity, specificity, and positive and negative predictive values were calculated using the majority consensus diagnosis for each case as the gold reference diagnosis for that case. Diagnostic specificity estimates among multiple pathologists were >90%, although kappa scores were considerably more variable. We explain why kappa scores are of limited clinical applicability in pathology and propose the use of positive and negative percent agreement and diagnostic specificity against a gold reference diagnosis to evaluate IOV among two and multiple raters, respectively.
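For illustration only (this code is not from the paper), the metrics the abstract contrasts can be sketched in Python: Cohen's kappa for two raters, and overall, positive, and negative percent agreement computed against a gold reference diagnosis. The function names and the toy data below are assumptions for the sketch, not the authors' implementation.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal frequencies.
    """
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (p_o - p_e) / (1 - p_e)

def agreement_metrics(rater, reference, positive):
    """Percent agreement of one rater against a gold reference diagnosis.

    PPA (positive percent agreement) plays the role of sensitivity and
    NPA (negative percent agreement) the role of specificity when the
    reference is treated as the gold standard.
    """
    tp = sum(r == positive and g == positive for r, g in zip(rater, reference))
    tn = sum(r != positive and g != positive for r, g in zip(rater, reference))
    fp = sum(r == positive and g != positive for r, g in zip(rater, reference))
    fn = sum(r != positive and g == positive for r, g in zip(rater, reference))
    return {
        "overall_pct_agreement": (tp + tn) / len(rater),
        "ppa_sensitivity": tp / (tp + fn),
        "npa_specificity": tn / (tn + fp),
    }
```

A small example shows the paper's point: two raters can agree on 80% of cases yet produce only a "moderate" kappa on the Landis–Koch scale, because kappa penalizes chance agreement driven by the raters' marginal frequencies.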
Keywords: Diagnostic accuracy; Evidence-based pathology; Interobserver variability; Kappa statistics.
Copyright © 2020 Elsevier Inc. All rights reserved.