Effect of finite sample size on feature selection and classification: a simulation study

Ted W Way et al. Med Phys. 2010 Feb;37(2):907-20. doi: 10.1118/1.3284974.

Abstract

Purpose: The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. Understanding these relationships will facilitate the development of effective CAD systems under the constraint of limited available samples.

Methods: Three feature selection techniques, stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and the support vector machine (SVM), were investigated. Samples were drawn from multidimensional multivariate Gaussian feature spaces with equal or unequal covariance matrices and unequal means, and from a Gaussian feature space with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic (ROC) curve, Az. The mean Az values obtained by the resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was 50, 100, or 200.
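The simulation setup described above can be illustrated with a short sketch. The following Python code is not from the paper; it is a minimal sketch assuming NumPy and scikit-learn, with illustrative parameter choices (M = 100 features, 50 training samples per class, 10 retained principal components). It draws two multivariate normal classes with equal covariance matrices and unequal means, reduces the dimensionality with PCA fitted on the training set, trains an LDA and a radial-kernel SVM, and reports the resubstitution and hold-out estimates of the area under the ROC curve (Az).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

M = 100        # features available for selection (the paper uses 50, 100, 200)
n_train = 50   # training samples per class (the paper varies this from 15 to 100)
n_test = 1000  # large independent set approximates the hold-out performance

# Two multivariate normal classes: equal (identity) covariance, unequal means.
# Only the first 10 features carry class information (illustrative choice).
mean1 = np.zeros(M)
mean1[:10] = 0.3

def draw(n_per_class):
    x0 = rng.multivariate_normal(np.zeros(M), np.eye(M), n_per_class)
    x1 = rng.multivariate_normal(mean1, np.eye(M), n_per_class)
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
    return np.vstack([x0, x1]), y

x_tr, y_tr = draw(n_train)
x_te, y_te = draw(n_test)

# Dimensionality reduction by PCA, fitted on the training set only.
pca = PCA(n_components=10).fit(x_tr)
z_tr, z_te = pca.transform(x_tr), pca.transform(x_te)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("SVM(rad)", SVC(kernel="rbf"))]:
    clf.fit(z_tr, y_tr)
    az_resub = roc_auc_score(y_tr, clf.decision_function(z_tr))  # optimistically biased
    az_hold = roc_auc_score(y_te, clf.decision_function(z_te))   # pessimistic at small n_train
    print(f"{name}: resubstitution Az = {az_resub:.3f}, hold-out Az = {az_hold:.3f}")
```

Repeating the draw-and-evaluate step over many independent trials and over a range of training sample sizes yields mean and standard-deviation curves of the resubstitution and hold-out Az analogous to those in the figures below; substituting a sequential feature selector for PCA, or unequal covariance matrices for the two classes, corresponds to the other conditions studied.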

Results: It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small sample sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available.

Conclusions: None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.


Figures

Figure 1. Dependence of the LDA classifier performance Az on training sample size. The two class distributions were multivariate normal with equal covariance matrices and unequal means. The effect of increasing dimensionality of the feature space available for selection (M) is shown in each column. The comparison of the SFS, SFFS, and PCA methods for feature selection is shown in each row.
Figure 2. Dependence of the performance Az of the SVM classifier with radial kernel on training sample size. The two class distributions were multivariate normal with equal covariance matrices and unequal means. The effect of increasing dimensionality of the feature space available for selection (M) is shown in each column. The comparison of the SFS, SFFS, and PCA methods for feature selection is shown in each row.
Figure 3. Dependence of the performance Az of the SVM classifier with polynomial kernel on training sample size. The two class distributions were multivariate normal with equal covariance matrices and unequal means. The effect of increasing dimensionality of the feature space available for selection (M) is shown in each column. The comparison of the SFS, SFFS, and PCA methods for feature selection is shown in each row.
Figure 4. Standard deviation of the hold-out performance as a function of 1/Ntrain for the SFS, SFFS, and PCA feature selection methods and the LDA classifier. The number of features available for selection was M=100 for the equal covariance matrices (first row) and unequal covariance matrices (second row) conditions, and M=61 for the condition with simulated equal covariance matrices estimated from a clinical data set.
Figure 5. Dependence of the performance Az of the SVM classifier with radial kernel on training sample size. The two class distributions were multivariate normal with unequal covariance matrices and unequal means. The effect of increasing dimensionality of the feature space available for selection (M) is shown in each column. The comparison of the SFS, SFFS, and PCA methods for feature selection is shown in each row.
Figure 6. Comparison of the LDA, SVM(rad), and SVM(poly) classifiers with the same input features obtained from SFS. The two class distributions were multivariate normal with unequal covariance matrices and unequal means.
Figure 7. Performance of the SFS, SFFS, and PCA feature selection methods and the LDA, SVM(rad), and SVM(poly) classifiers for simulated multivariate normal class distributions with equal covariance matrices estimated from a clinical data set (M=61).

