Review

Acad Radiol. 2017 Nov;24(11):1436-1446. doi: 10.1016/j.acra.2017.05.007. Epub 2017 Jun 27.

The Reproducibility of Changes in Diagnostic Figures of Merit Across Laboratory and Clinical Imaging Reader Studies

Frank W Samuelson et al. Acad Radiol. 2017 Nov.

Abstract

Rationale and objectives: In this paper, we examine which comparisons of reading performance between diagnostic imaging systems made in controlled retrospective laboratory studies may be representative of what we observe in later clinical studies. The change in a meaningful diagnostic figure of merit between two diagnostic modalities should be qualitatively or quantitatively comparable across all kinds of studies.

Materials and methods: In this meta-study we examine the reproducibility of relative measures of sensitivity, false positive fraction (FPF), area under the receiver operating characteristic (ROC) curve, and expected utility across laboratory and observational clinical studies for several different breast imaging modalities, including screen film mammography, digital mammography, breast tomosynthesis, and ultrasound.

Results: Across studies of all types, the changes in the FPFs yielded very small probabilities of having a common mean value. The probabilities of relative sensitivity being the same across ultrasound and tomosynthesis studies were low. No evidence was found for different mean values of relative area under the ROC curve or relative expected utility within any of the study sets.

Conclusion: The comparison demonstrates that the ratios of areas under the ROC curve and expected utilities are reproducible across laboratory and clinical studies, whereas sensitivity and FPF are not.

Keywords: AUC; reproducibility; sensitivity; specificity.


Figures

Figure 1
The solid line is an ROC curve, giving the trade-off between the true and false positive fractions as a decision threshold is varied on the output of a diagnostic test. The shaded area below the curve is the AUC. The length of the dotted vertical line is the expected utility (EU): it is the intercept of the dashed line of constant utility that is tangent to the ROC curve at the optimal operating point (circle) [26].
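The two figures of merit in this caption can be sketched numerically. The operating points and the utility slope beta below are hypothetical, chosen only for illustration: AUC is the trapezoidal area under the empirical ROC curve, and EU is the TPF-axis intercept of the iso-utility line tangent to the curve, i.e. the maximum of TPF − beta·FPF over the operating points.

```python
# Hypothetical empirical ROC operating points (FPF, TPF), sorted by FPF.
fpf = [0.0, 0.1, 0.25, 0.5, 1.0]
tpf = [0.0, 0.55, 0.75, 0.9, 1.0]

# AUC: area under the empirical ROC curve via the trapezoidal rule.
auc = sum((f2 - f1) * (t1 + t2) / 2
          for f1, f2, t1, t2 in zip(fpf, fpf[1:], tpf, tpf[1:]))

# EU: with iso-utility lines of slope beta (set by disease prevalence and
# the relative utilities of the decision outcomes), the optimal operating
# point maximizes TPF - beta*FPF; EU is the TPF-axis intercept of the
# tangent line through that point.
beta = 0.8  # assumed slope, for illustration only
eu = max(t - beta * f for f, t in zip(fpf, tpf))

print(f"AUC = {auc:.3f}, EU = {eu:.3f}")
```

With a continuous ROC model the maximization would be over the whole curve; for a finite set of operating points the discrete maximum is the usual empirical estimate.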
Figure 2
The relative true positive fraction, rTPF, estimated from eight studies comparing full-field digital mammography and screen-film mammography, four studies comparing X-ray mammography with and without ultrasound, and eight studies comparing digital mammography with and without Hologic digital breast tomosynthesis imaging. These studies are listed in Tables 1, 2, and 3. Open circles or squares indicate observational clinical studies. Squares are European studies. Solid circles indicate controlled laboratory reader studies. Equal performance of the two modalities in each study is indicated by the vertical line at 1.0. Horizontal bars are approximate 95% confidence intervals on the mean value.
Figure 3
The relative false positive fraction, rFPF, estimated from the 20 studies listed in Tables 1, 2, and 3. See the caption of Figure 2 for other details. Error bars on some of the clinical studies are probably underestimated.
Figure 4
The relative area under the ROC curve, rAUC, estimated from the 20 studies listed in Tables 1, 2, and 3. The Berg et al. [65] study calculated AUC in two ways, and both are shown. See the caption of Figure 2 for other details.
Figure 5
The relative expected utility, rEU, estimated from the 20 studies listed in Tables 1, 2, and 3. See the caption of Figure 2 for other details.

References

    1. Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory. New York: Academic Press; 1982.
    2. Metz CE, Wagner RF, Doi K, Brown DG, Nishikawa RM, Myers KJ. Toward consensus on quantitative assessment of medical imaging systems. Medical Physics. 1995;22:1057–61. - PubMed
    3. Gur D, Rockette HE, Warfel T, Lacomis JM, Fuhrman CR. From the laboratory to the clinic: the prevalence effect. Academic Radiology. 2003;10:1324–1326. - PubMed
    4. Metz CE. ROC analysis in medical imaging: a tutorial review of the literature. Radiol Phys Technol. 2008;1(1):2–12. - PubMed
    5. Gallas BD, Chan HP, D'Orsi CJ, Dodd LE, Giger ML, Gur D, Krupinski EA, Metz CE, Myers KJ, Obuchowski NA, Sahiner B, Toledano AY, Zuley ML. Evaluating imaging and computer-aided detection and diagnosis devices at the FDA. Academic Radiology. 2012;19:463–477. - PMC - PubMed