Is an ROC-type response truly always better than a binary response in observer performance studies?
- PMID: 20236840
- PMCID: PMC2856622
- DOI: 10.1016/j.acra.2009.12.012
Abstract
Rationale and objectives: The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis.
Materials and methods: Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates.
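The link between the binary-scale AUC and Youden's index mentioned above can be illustrated with a minimal sketch. The ratings, the threshold of 50, and the function names below are hypothetical and are not taken from the study; the sketch only assumes the standard empirical (Mann-Whitney) AUC estimator, under which a binary rating gives AUC = (sensitivity + specificity) / 2 = (J + 1) / 2, where J is Youden's index.

```python
def empirical_auc(neg, pos):
    """Empirical AUC: P(pos > neg) + 0.5 * P(pos == neg) over all case pairs."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_auc(neg, pos, threshold):
    """AUC of the single binary operating point obtained by thresholding."""
    sens = sum(p >= threshold for p in pos) / len(pos)   # true-positive fraction
    spec = sum(n < threshold for n in neg) / len(neg)    # true-negative fraction
    youden = sens + spec - 1                             # Youden's index J
    return (youden + 1) / 2                              # equals (sens + spec) / 2


# Hypothetical 0-100 ratings for one reader (illustrative only).
neg = [5, 10, 20, 35, 60]    # ratings on actually negative cases
pos = [30, 55, 70, 80, 95]   # ratings on actually positive cases

full_scale = empirical_auc(neg, pos)

# Dichotomize the same ratings at an assumed threshold of 50.
threshold = 50
bin_neg = [int(n >= threshold) for n in neg]
bin_pos = [int(p >= threshold) for p in pos]
from_binary_ratings = empirical_auc(bin_neg, bin_pos)
from_youden = binary_auc(neg, pos, threshold)

print(full_scale, from_binary_ratings, from_youden)
```

In this sketch the empirical AUC computed on the dichotomized ratings coincides with the value derived from Youden's index, while the full-scale AUC generally differs because it reflects the whole curve rather than a single operating point.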
Results: The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also smaller under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points.
Conclusions: The human observer's operating point should be a primary consideration in designing an observer performance study. Although in general, the ROC-type rating paradigm provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages.
Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.
Comment in
- The importance of ROC data. Acad Radiol. 2011 Feb;18(2):257-8; author reply 259-61. doi: 10.1016/j.acra.2010.10.016. PMID: 21232688.