Reliable evaluation of performance level for computer-aided diagnostic scheme

Qiang Li

Acad Radiol. 2007 Aug;14(8):985-91. doi: 10.1016/j.acra.2007.04.015.

Abstract

Rationale and objectives: Computer-aided diagnostic (CAD) schemes have been developed for assisting radiologists in the detection of various lesions in medical images. The reliable evaluation of CAD schemes is an important task in the field of CAD research.

Materials and methods: Many approaches have been proposed in the past for evaluating the performance of various CAD schemes. However, some important issues in the evaluation of CAD schemes have not been systematically analyzed. The first issue is the analysis and comparison of various evaluation methods in terms of certain characteristics. The second is the analysis of pitfalls arising from the incorrect use of various evaluation methods, and of effective approaches to reducing the bias and variance caused by these pitfalls. We address the first issue in detail in this article by conducting Monte Carlo simulation experiments, and discuss the second issue in the Discussion section.

Results: No single evaluation method is universally superior to the others; different situations of CAD applications require different evaluation methods, as recommended in this article. Bias and variance in the estimated performance levels caused by various pitfalls can be reduced considerably by the correct use of good evaluation methods.

Conclusions: This article should help researchers in the field of CAD research select appropriate evaluation methods and improve the reliability of the estimated performance of their CAD schemes.
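The four resampling methods compared in this article (resubstitution, leave-one-out, hold-out, and bootstrap) can be sketched as below. This is a minimal illustration, not the article's actual experimental setup: the one-dimensional nearest-class-mean classifier and all function names are hypothetical stand-ins for a real CAD scheme, and the hold-out split fraction and bootstrap repetition count are arbitrary choices.

```python
import random

def nearest_mean_error(train, test):
    """Error rate of a toy 1-D nearest-class-mean classifier.

    A hypothetical stand-in for a real CAD classifier. Each case is a
    (feature, label) pair with label 0 (normal) or 1 (lesion).
    """
    n0 = sum(1 for _, y in train if y == 0)
    n1 = sum(1 for _, y in train if y == 1)
    # max(1, n) guards against an empty class in a small training fold.
    m0 = sum(x for x, y in train if y == 0) / max(1, n0)
    m1 = sum(x for x, y in train if y == 1) / max(1, n1)
    wrong = sum(1 for x, y in test if (abs(x - m1) < abs(x - m0)) != (y == 1))
    return wrong / len(test)

def resubstitution(data):
    # Train and test on the same cases: optimistically biased,
    # especially for small samples.
    return nearest_mean_error(data, data)

def leave_one_out(data):
    # Train on n-1 cases, test on the held-out case, average over all n:
    # nearly unbiased, but higher variance.
    n = len(data)
    return sum(nearest_mean_error(data[:i] + data[i + 1:], [data[i]])
               for i in range(n)) / n

def hold_out(data, frac=0.5, rng=None):
    # Split the cases once into independent training and test sets.
    rng = rng or random.Random(0)
    d = data[:]
    rng.shuffle(d)
    cut = int(len(d) * frac)
    return nearest_mean_error(d[:cut], d[cut:])

def bootstrap(data, reps=100, rng=None):
    # Train on a bootstrap sample (drawn with replacement); test on the
    # cases left out of that sample, averaged over many repetitions.
    rng = rng or random.Random(0)
    n = len(data)
    errs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        sampled = set(idx)
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in sampled]
        if test:
            errs.append(nearest_mean_error(train, test))
    return sum(errs) / len(errs)
```

Running the four estimators on the same small synthetic data set makes the article's point concrete: each returns a different estimate of the same underlying performance level, and the spread between them shrinks as the sample size grows.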


Figures

Figure 1. The generalization performance levels obtained by use of the resubstitution (GP-RS), leave-one-out (GP-LOO), hold-out (GP-HO), and bootstrap (GP-BS) methods.

Figure 2(a). The generalization performance (GP-RS) and the mean estimated performance (EP-RS), with error bars of one standard deviation, for the 100 trials by use of the resubstitution method. There were large biases between the generalization performance levels and the estimated performance levels, particularly when the sample sizes were small.

Figure 2(b). The generalization performance (GP-LOO) and the mean estimated performance (EP-LOO), with error bars of one standard deviation, for the 100 trials by use of the leave-one-out method. There was nearly no bias between the generalization performance levels and the estimated performance levels.

Figure 2(c). The generalization performance (GP-HO) and the mean estimated performance (EP-HO), with error bars of one standard deviation, for the 100 trials by use of the hold-out method. There was nearly no bias between the generalization performance levels and the estimated performance levels.

Figure 2(d). The generalization performance (GP-BS) and the mean estimated performance (EP-BS), with error bars of one standard deviation, for the 100 trials by use of the bootstrap method. There were large biases between the generalization performance levels and the estimated performance levels, particularly when the sample sizes were small. However, the standard deviation of the estimated performance levels was comparable to that of the RS method, and was smaller than those of the LOO and HO methods.


