Reliable evaluation of performance level for computer-aided diagnostic scheme

Qiang Li

Acad Radiol. 2007 Aug;14(8):985-91. doi: 10.1016/j.acra.2007.04.015.

Abstract

Rationale and objectives: Computer-aided diagnostic (CAD) schemes have been developed for assisting radiologists in the detection of various lesions in medical images. The reliable evaluation of CAD schemes is an important task in the field of CAD research.

Materials and methods: Many approaches have been proposed in the past for evaluating the performance of various CAD schemes. However, some important issues in the evaluation of CAD schemes have not been systematically analyzed. The first issue is the analysis and comparison of various evaluation methods in terms of certain characteristics. The second is the analysis of pitfalls arising from the incorrect use of various evaluation methods, and of effective approaches to reducing the bias and variance caused by these pitfalls. We address the first issue in detail in this article by conducting Monte Carlo simulation experiments, and discuss the second issue in the Discussion section.

Results: No single evaluation method is universally superior to the others; different situations of CAD applications require different evaluation methods, as recommended in this article. Bias and variance in the estimated performance levels caused by various pitfalls can be reduced considerably by the correct use of good evaluation methods.

Conclusions: This article should help researchers in the field of CAD research select appropriate evaluation methods and improve the reliability of the estimated performance of their CAD schemes.
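The four resampling methods compared in this article (resubstitution, leave-one-out, hold-out, and bootstrap) can be sketched as below. This is a minimal illustration, not the article's actual experimental setup: the one-dimensional nearest-class-mean classifier and all function names are hypothetical stand-ins for a real CAD scheme, and the hold-out split fraction and bootstrap repetition count are arbitrary choices.

```python
import random

def nearest_mean_error(train, test):
    """Error rate of a toy 1-D nearest-class-mean classifier.

    A hypothetical stand-in for a real CAD classifier. Each case is a
    (feature, label) pair with label 0 (normal) or 1 (lesion).
    """
    n0 = sum(1 for _, y in train if y == 0)
    n1 = sum(1 for _, y in train if y == 1)
    # max(1, n) guards against an empty class in a small training fold.
    m0 = sum(x for x, y in train if y == 0) / max(1, n0)
    m1 = sum(x for x, y in train if y == 1) / max(1, n1)
    wrong = sum(1 for x, y in test if (abs(x - m1) < abs(x - m0)) != (y == 1))
    return wrong / len(test)

def resubstitution(data):
    # Train and test on the same cases: optimistically biased,
    # especially for small samples.
    return nearest_mean_error(data, data)

def leave_one_out(data):
    # Train on n-1 cases, test on the held-out case, average over all n:
    # nearly unbiased, but higher variance.
    n = len(data)
    return sum(nearest_mean_error(data[:i] + data[i + 1:], [data[i]])
               for i in range(n)) / n

def hold_out(data, frac=0.5, rng=None):
    # Split the cases once into independent training and test sets.
    rng = rng or random.Random(0)
    d = data[:]
    rng.shuffle(d)
    cut = int(len(d) * frac)
    return nearest_mean_error(d[:cut], d[cut:])

def bootstrap(data, reps=100, rng=None):
    # Train on a bootstrap sample (drawn with replacement); test on the
    # cases left out of that sample, averaged over many repetitions.
    rng = rng or random.Random(0)
    n = len(data)
    errs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        sampled = set(idx)
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in sampled]
        if test:
            errs.append(nearest_mean_error(train, test))
    return sum(errs) / len(errs)
```

Running the four estimators on the same small synthetic data set makes the article's point concrete: each returns a different estimate of the same underlying performance level, and the spread between them shrinks as the sample size grows.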


Figures

Figure 1. The generalization performance levels obtained by use of the resubstitution (GP-RS), leave-one-out (GP-LOO), hold-out (GP-HO), and bootstrap (GP-BS) methods.

Figure 2(a). The generalization performance (GP-RS) and the mean estimated performance (EP-RS), with error bars of one standard deviation, for the 100 trials by use of the resubstitution method. There were large biases between the generalization performance levels and the estimated performance levels, particularly when the sample sizes were small.

Figure 2(b). The generalization performance (GP-LOO) and the mean estimated performance (EP-LOO), with error bars of one standard deviation, for the 100 trials by use of the leave-one-out method. There was nearly no bias between the generalization performance levels and the estimated performance levels.

Figure 2(c). The generalization performance (GP-HO) and the mean estimated performance (EP-HO), with error bars of one standard deviation, for the 100 trials by use of the hold-out method. There was nearly no bias between the generalization performance levels and the estimated performance levels.

Figure 2(d). The generalization performance (GP-BS) and the mean estimated performance (EP-BS), with error bars of one standard deviation, for the 100 trials by use of the bootstrap method. There were large biases between the generalization performance levels and the estimated performance levels, particularly when the sample sizes were small. However, the standard deviation of the estimated performance levels was comparable to that of the RS method, and was smaller than those of the LOO and HO methods.


