Is an ROC-type response truly always better than a binary response in observer performance studies?

David Gur¹, Andriy I Bandos, Howard E Rockette, Margarita L Zuley, Christiane M Hakim, Denise M Chough, Marie A Ganott, Jules H Sumkin

PMID: 20236840
PMCID: PMC2856622
DOI: 10.1016/j.acra.2009.12.012

David Gur et al. Acad Radiol. 2010 May;17(5):639-45. doi: 10.1016/j.acra.2009.12.012. Epub 2010 Mar 16.

Affiliation

¹ University of Pittsburgh, Department of Radiology, Radiology Imaging Research, Pittsburgh, PA 15213, USA. gurd@upmc.edu

Abstract

Rationale and objectives: The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis.

Materials and methods: Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates.
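The relationship between the binary-scale AUC and Youden's index mentioned above can be illustrated with a minimal sketch. The ratings below are hypothetical, not data from the study, and the binary calls are obtained by thresholding the 0-100 scores purely for illustration (in the study, binary and pseudocontinuous ratings were both collected from the readers). The function names `empirical_auc` and `binary_auc` are assumptions introduced here. With a single operating point, the empirical AUC reduces to (sensitivity + specificity) / 2, that is, (J + 1) / 2, where J is Youden's index.

```python
from itertools import product


def empirical_auc(neg_ratings, pos_ratings):
    """Empirical AUC: probability that a randomly chosen positive case is
    rated higher than a randomly chosen negative case (ties count 1/2)."""
    total = 0.0
    for n, p in product(neg_ratings, pos_ratings):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(neg_ratings) * len(pos_ratings))


def binary_auc(neg_calls, pos_calls):
    """AUC for binary (0/1) calls, written via Youden's index J."""
    sensitivity = sum(pos_calls) / len(pos_calls)
    specificity = 1 - sum(neg_calls) / len(neg_calls)
    youden = sensitivity + specificity - 1
    return (youden + 1) / 2


# Hypothetical 0-100 ratings for actually negative and actually positive cases.
neg_scores = [5, 10, 20, 35, 60]
pos_scores = [30, 55, 70, 80, 95]

# Hypothetical operating point: call "yes" when the rating is at least 50.
threshold = 50
neg_calls = [int(s >= threshold) for s in neg_scores]
pos_calls = [int(s >= threshold) for s in pos_scores]

auc_rating = empirical_auc(neg_scores, pos_scores)
auc_binary = empirical_auc(neg_calls, pos_calls)
auc_youden = binary_auc(neg_calls, pos_calls)

print(f"AUC from 0-100 ratings: {auc_rating:.2f}")
print(f"AUC from binary calls:  {auc_binary:.2f}")
print(f"(J + 1) / 2:            {auc_youden:.2f}")
```

In this toy example, the binary-scale AUC computed from pairwise comparisons agrees with (J + 1) / 2, while the AUC from the full rating scale differs because it uses information away from the chosen operating point.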

Results: The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also smaller under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points.

Conclusions: The human observer's operating point should be a primary consideration in designing an observer performance study. Although the ROC-type rating paradigm generally provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages.

Figures

**Figure 1**
Two hypothetical ROC curves representing the performance of two systems, with three hypothetical pairs of corresponding operating points along the same curves simulating binary response-type results.


Comment in

Samuelson F, Gallas BD, Myers KJ, Petrick N, Pinsky P, Sahiner B, Campbell G, Pennello GA. The importance of ROC data. Acad Radiol. 2011 Feb;18(2):257-8; author reply 259-61. doi: 10.1016/j.acra.2010.10.016. PMID: 21232688.

