Acad Radiol. 2010 May;17(5):639-45.
doi: 10.1016/j.acra.2009.12.012. Epub 2010 Mar 16.

Is an ROC-type response truly always better than a binary response in observer performance studies?

David Gur et al. Acad Radiol. 2010 May.

Abstract

Rationale and objectives: The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis.

Materials and methods: Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates.
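The link between the binary-scale AUC and Youden's index mentioned above can be checked numerically. For a single operating point, the empirical AUC equals (sensitivity + specificity) / 2, that is, (J + 1) / 2 with J = sensitivity + specificity - 1. The sketch below is a minimal illustration, not the authors' code; the function names and the yes/no calls are hypothetical and are not study data.

```python
def empirical_auc(neg_scores, pos_scores):
    """Empirical (Mann-Whitney) AUC: the fraction of positive/negative
    pairs ranked correctly, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def binary_auc_from_youden(sensitivity, specificity):
    """AUC of the two-segment ROC curve through one operating point,
    written in terms of Youden's index J."""
    youden_j = sensitivity + specificity - 1.0
    return (youden_j + 1.0) / 2.0


# Hypothetical binary calls (1 = "yes", 0 = "no"); NOT data from the study.
neg_calls = [0, 0, 0, 1]      # actually negative cases
pos_calls = [1, 1, 0, 1, 1]   # actually positive cases

sensitivity = sum(pos_calls) / len(pos_calls)
specificity = 1.0 - sum(neg_calls) / len(neg_calls)

auc_pairs = empirical_auc(neg_calls, pos_calls)
auc_youden = binary_auc_from_youden(sensitivity, specificity)
print(auc_pairs, auc_youden)
```

Both routes give the same value because, with only two response levels, every tied pair contributes one half, which is exactly what the trapezoid under a single operating point adds.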

Results: The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also larger under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points.

Conclusions: The human observer's operating point should be a primary consideration in designing an observer performance study. Although in general, the ROC-type rating paradigm provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages.

Figures

Figure 1
Two hypothetical ROC curves representing the performance of two systems, with three hypothetical pairs of corresponding operating points along the same curves simulating binary response-type results.
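The situation in the figure can be sketched numerically. The example below assumes two equal-variance binormal ROC curves, which is an assumption made only for illustration; the separation values and the three pairs of operating points are hypothetical and are not taken from the article. It compares the difference in full-curve AUC with the difference in binary-scale AUC at each pair of operating points.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tpf_on_curve(d_prime, fpf):
    """True-positive fraction at a given false-positive fraction on an
    equal-variance binormal ROC curve with separation d_prime."""
    return STD_NORMAL.cdf(d_prime + STD_NORMAL.inv_cdf(fpf))


def full_auc(d_prime):
    """Area under the whole equal-variance binormal ROC curve."""
    return STD_NORMAL.cdf(d_prime / sqrt(2.0))


def binary_auc(d_prime, fpf):
    """AUC implied by a single operating point: (sensitivity + specificity) / 2."""
    return (tpf_on_curve(d_prime, fpf) + (1.0 - fpf)) / 2.0


# Hypothetical separations for system A and system B (assumed values).
D_PRIME_A, D_PRIME_B = 1.0, 1.5

full_diff = full_auc(D_PRIME_B) - full_auc(D_PRIME_A)

# Three hypothetical pairs of operating points: (FPF under A, FPF under B).
operating_pairs = [(0.05, 0.05), (0.20, 0.10), (0.10, 0.30)]
binary_diffs = [
    binary_auc(D_PRIME_B, fpf_b) - binary_auc(D_PRIME_A, fpf_a)
    for fpf_a, fpf_b in operating_pairs
]

print(f"full-curve AUC difference: {full_diff:.3f}")
for (fpf_a, fpf_b), diff in zip(operating_pairs, binary_diffs):
    print(f"FPF A={fpf_a:.2f}, B={fpf_b:.2f}: binary AUC difference {diff:.3f}")
```

Under these assumptions the binary-scale difference can fall below or above the full-curve difference depending on where each operating point sits, even though the two underlying curves never change.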

Comment in

  • The importance of ROC data.
    Samuelson F, Gallas BD, Myers KJ, Petrick N, Pinsky P, Sahiner B, Campbell G, Pennello GA. Acad Radiol. 2011 Feb;18(2):257-8; author reply 259-61. doi: 10.1016/j.acra.2010.10.016. PMID: 21232688.

