Mol Cell Proteomics. 2013 Jan;12(1):263-76. doi: 10.1074/mcp.M112.022566. Epub 2012 Oct 31.

A critical assessment of feature selection methods for biomarker discovery in clinical proteomics


Christin Christin et al. Mol Cell Proteomics. 2013 Jan.

Abstract

In this paper, we compare the performance of six feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon (mww) test, nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The methods were evaluated on human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features related to the spiked peptides without selecting unrelated features. Whereas many studies must rely on the classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly against the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation, determined by the concentration level of the spiked compounds. For each feature selection method and data set, the performance in selecting a set of features related to the spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size, up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets, independent of spiking level and number of samples. Linear SVM-RFE performs poorly at selecting features related to the spiked compounds, even though its classification error is relatively low.
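The two summary scores defined above can be made concrete with a short sketch. The helper below is illustrative (the function name and the toy counts are not from the paper); it derives recall, precision, and the true negative rate from a selected feature set and the known spiked set, then combines them into the f-score and g-score as defined in the abstract.

```python
import math

def evaluate_selection(selected, spiked, n_features):
    """Score a selected feature set against the known spiked features.

    selected, spiked: sets of feature indices; n_features: total feature count.
    (Hypothetical helper; the paper reports these scores per method/data set.)
    """
    tp = len(selected & spiked)          # spiked features that were selected
    fp = len(selected - spiked)          # unrelated features that were selected
    fn = len(spiked - selected)          # spiked features that were missed
    tn = n_features - tp - fp - fn       # unrelated features correctly rejected

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0

    # f-score: harmonic mean of recall and precision
    f = (2 * recall * precision / (recall + precision)
         if recall + precision else 0.0)
    # g-score: geometric mean of recall and the true negative rate
    g = math.sqrt(recall * tnr)
    return f, g

# Toy example: 4 of 5 spiked features recovered among 8 selections, 100 features total
f, g = evaluate_selection(set(range(8)), {0, 1, 2, 3, 50}, 100)
```

Note how the g-score stays high for a conservative selector (the true negative rate dominates when few features are selected), whereas the f-score punishes both missed spiked features and unrelated selections.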


Figures

Fig. 1.
Double cross validation scheme for the nearest shrunken centroid algorithm. In the inner loops of the double cross validation scheme, the sum of the true class probability scores is calculated at each shrinkage value; the maximum identifies the optimal shrinkage. The final optimal feature set is selected using the shrinkage at the maximum of the median of the summed true class probabilities in the shrinkage plot after the double cross validation procedure. Classification performance is measured with the optimal parameters in the outer loop by calculating the classification error rate on the outer loop training data set (double cross validation error).
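The nested scheme in this caption can be sketched in a few lines. Everything below is illustrative: `double_cv`, `fit`, and `score` are hypothetical names, and the toy model stands in for the actual NSC classifier. Only the control flow follows the caption: inner loops sum the true class probability per candidate shrinkage, and the outer loop estimates the classification error with the chosen shrinkage.

```python
import random

def k_folds(n, k, seed=0):
    """Partition indices 0..n-1 into k shuffled, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def double_cv(n, shrinkages, fit, score, k_outer=5, k_inner=4):
    """Double (nested) CV sketch: the inner loops sum the true class
    probability for each candidate shrinkage; the maximum picks the optimal
    value, which the outer loop then uses to estimate the error.
    fit(train_idx, s) and score(model, i) are user-supplied callables,
    not the paper's NSC implementation."""
    errors, picks = [], []
    for test in k_folds(n, k_outer):
        train = [i for i in range(n) if i not in test]
        totals = dict.fromkeys(shrinkages, 0.0)
        for val in k_folds(len(train), k_inner, seed=1):
            fit_idx = [train[j] for j in range(len(train)) if j not in val]
            for s in shrinkages:
                model = fit(fit_idx, s)
                totals[s] += sum(score(model, train[j]) for j in val)
        best = max(totals, key=totals.get)        # optimal shrinkage
        picks.append(best)
        model = fit(train, best)                  # refit on full outer train
        errors.append(sum(1.0 - score(model, i) for i in test) / len(test))
    return errors, picks

# Toy check: the "true class probability" peaks at shrinkage 0.5, so every
# inner loop should select it, and the outer error is then zero.
fit = lambda idx, s: s
score = lambda model, i: 1.0 - abs(model - 0.5)
errors, picks = double_cv(20, [0.1, 0.5, 0.9], fit, score)
```

The key design point the caption describes is that shrinkage selection happens entirely inside the inner loops, so the outer error estimate is not biased by the parameter search.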
Fig. 2.
Bar charts of the median (±IQR) of the number of selected features (top) and the number of true positives (bottom) for each combination of feature selection methods and data sets (see Table I for details concerning data sets). Results for the univariate tests (t test and mww test) on data sets 1a and 2a are denoted by **, and results of the univariate t test on data sets 0a and 0b are denoted by *, because these methods selected no features at a sample size of 6 or 5, respectively. Univariate t tests on data sets 0a and 0b (5 samples/class) and univariate tests (t test and mww test) on data sets 1c and 2c (6 samples/class) were performed once, including all available samples per class without repetition.
Fig. 3.
Bar charts (left) and scatter plot (right) of the median (±IQR) of recall (top) and precision (bottom) for each combination of feature selection methods and data sets. Recall and precision were not available for the t test and the mww test on data sets 1a and 2a, containing 6 samples (denoted by **), or for the mww test on data sets 0a and 0b, containing 5 samples (denoted by *). Univariate t tests on data sets 0a and 0b (5 samples/class) and univariate tests (t test and mww test) on data sets 1c and 2c (6 samples/class) were performed once including all available samples per class without repetition. Gray error bars in the scatter plot show the IQRs of recall and precision.
Fig. 4.
Bar charts (left) and scatter plot (right) of the median (±IQR) f-score (top) and g-score (bottom) for each combination of feature selection methods and data sets. The f-score and g-score were not available for the t test and the mww test on data sets containing 6 samples (denoted by **) or for the mww test on data sets 0a and 0b containing 5 samples (denoted by *). Univariate t tests on data sets 0a and 0b (5 samples/class) and univariate tests (t test and mww test) on data sets 1c and 2c (6 samples/class) were performed once including all available samples per class without repetition. Gray error bars in the scatter plot show the IQRs of recall and precision.
Fig. 5.
Overview of the two best performing feature selection statistical methods for data sets of different sample sizes and between- and within-class variability of spiked peptides based on the f-score. NSC shows the best performance for data sets with 6 samples independent of between- and within-class variability of spiked peptides, whereas univariate tests rank on top when the sample size increases to 15 samples per class or for low-sample-size data sets (0a and 0b) with low within-class variability of spiked peptides.

