Determination of the number of observers needed to evaluate a subjective test and its application in two PD-L1 studies
- PMID: 34897773
- PMCID: PMC10243718
- DOI: 10.1002/sim.9282
Abstract
In pathological studies, subjective assays, especially companion diagnostic tests, can dramatically affect treatment of cancer. Binary diagnostic test results (ie, positive vs negative) may vary between pathologists or observers who read the tumor slides. Some tests have clearly defined criteria resulting in highly concordant outcomes, even with minimal training. Other tests are more challenging, and observers may achieve poor concordance even with training. While there are many statistically rigorous methods for measuring concordance between observers, we are unaware of a method that can identify how many observers are needed to determine whether a test can reach an acceptable concordance, if at all. Here we introduce a statistical approach to the assessment of test performance when the test is read by multiple observers, as would occur in the real world. By plotting the number of observers against the estimated overall agreement proportion, we can obtain a curve that plateaus to the average observer concordance. Diagnostic tests that are well-defined and easily judged show high concordance and plateau with few interobserver comparisons. More challenging tests do not plateau until many interobserver comparisons are made, and typically reach a lower plateau or even 0. We further propose a statistical test of whether the overall agreement proportion will drop to 0 with a large number of pathologists. The proposed analytical framework can be used to evaluate the difficulty in the interpretation of pathological test criteria and platforms, and to determine how pathology-based subjective tests will perform in the real world. The method could also be used outside of pathology, where concordance of a diagnosis or decision point relies on the subjective application of multiple criteria.
We apply this method in two recent PD-L1 studies to test whether the curve of overall agreement proportion will converge to 0 and determine the minimal sufficient number of observers required to estimate the concordance plateau of their reads.
Keywords: Binomial distribution; concordance; inflated binomial distribution; overall agreement proportion; pathological tests.
© 2021 John Wiley & Sons Ltd.
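The curve described in the abstract, number of observers against overall agreement proportion, can be sketched in a few lines. The abstract does not state the estimator, so this sketch assumes one plausible definition: for k observers, the proportion of cases on which all k give the same binary call, averaged over every possible k-observer subset. The function name `overall_agreement_curve` and the toy `ratings` matrix are hypothetical and are not taken from the paper or from the PD-L1 studies.

```python
from math import comb


def overall_agreement_curve(ratings):
    """Map k -> assumed overall agreement proportion for k observers.

    ratings: list of cases; each case is a list of 0/1 calls, one per observer.
    Assumed definition (not from the paper): the fraction of cases on which
    all k observers agree, averaged over all k-observer subsets.
    """
    n_obs = len(ratings[0])
    curve = {}
    for k in range(2, n_obs + 1):
        total = 0.0
        for case in ratings:
            pos = sum(case)       # observers calling the case positive
            neg = n_obs - pos     # observers calling the case negative
            # A k-subset is unanimous only if drawn entirely from one group.
            # math.comb returns 0 when k exceeds the group size.
            total += (comb(pos, k) + comb(neg, k)) / comb(n_obs, k)
        curve[k] = total / len(ratings)
    return curve


# Toy data for illustration only: 4 cases read by 4 observers.
ratings = [
    [1, 1, 1, 1],  # unanimous positive
    [0, 0, 0, 0],  # unanimous negative
    [1, 1, 1, 0],  # one dissenting observer
    [1, 0, 1, 0],  # evenly split
]
curve = overall_agreement_curve(ratings)
for k, agreement in curve.items():
    print(k, round(agreement, 4))
```

Under this assumed definition the curve cannot increase as observers are added, since a case on which k + 1 observers agree is also one on which any k of them agree. A test with clear criteria would level off quickly at a high value, while a harder test would keep falling as more observers are included.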