The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
- PMID: 22205940
- PMCID: PMC3244389
- DOI: 10.1371/journal.pone.0028210
Abstract
Biomarker discovery from high-dimensional data is a crucial problem with numerous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, yet surprisingly few studies have investigated the relative strengths and weaknesses of the many existing feature selection methods. In this study we compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of the predictive performance, stability and functional interpretability of the signatures they produce. We observe that the choice of feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Surprisingly, complex wrapper and embedded methods generally do not outperform simple univariate feature selection methods, and ensemble feature selection generally has no positive effect. Overall, a simple Student's t-test seems to provide the best results.
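The abstract reports that ranking genes with a simple Student's t-test gave the best results, and that signatures were also compared on stability. As an illustration only, the sketch below shows univariate t-test filtering plus one simple way to score signature stability. The helper names (`t_statistic`, `rank_features`, `jaccard`, `selection_stability`), the stratified 80% subsampling, and the use of mean pairwise Jaccard overlap as the stability measure are assumptions for illustration, not the authors' exact protocol.

```python
import math
import random
import statistics
from itertools import combinations


def t_statistic(a, b):
    """Student's two-sample t statistic (pooled variance) comparing group a to group b."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    # statistics.variance uses the unbiased (n - 1) estimator; needs >= 2 values per group.
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        # Degenerate case: both groups constant. Identical means carry no signal;
        # different means are perfectly separated, so score them as infinitely strong.
        return 0.0 if m1 == m2 else math.copysign(math.inf, m1 - m2)
    return (m1 - m2) / se


def rank_features(X, y, k):
    """Indices of the k features with the largest |t| between labels 1 and 0."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(t_statistic(pos, neg)))
    # sorted() is stable even with reverse=True, so ties keep the lower index first.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def jaccard(s1, s2):
    """Overlap between two signatures: |intersection| / |union|."""
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2)


def selection_stability(X, y, k, n_runs=10, frac=0.8, seed=0):
    """Mean pairwise Jaccard overlap of top-k signatures selected on random subsamples.

    Assumption (illustrative, not the paper's protocol): stratified subsampling
    without replacement, keeping a fraction `frac` of each class but at least
    2 samples per class, so each class needs >= 2 samples and n_runs >= 2.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    signatures = []
    for _ in range(n_runs):
        idx = (rng.sample(pos, max(2, int(frac * len(pos))))
               + rng.sample(neg, max(2, int(frac * len(neg)))))
        signatures.append(rank_features([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(signatures, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: features 0 and 1 shift with the class label, 2..19 are noise.
    rng = random.Random(42)
    X, y = [], []
    for i in range(60):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(-2.0 * label, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(18)])
        y.append(label)
    print("top-2 signature:", rank_features(X, y, 2))
    print("stability (k=2):", round(selection_stability(X, y, k=2), 3))
```

A stability score near 1 means the same genes are selected regardless of which samples are used; scoring each gene independently, as the t-test filter does, is one plausible reason such simple filters can be comparatively stable, though the sketch does not reproduce the study's comparison.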
Similar articles
- An experimental comparison of feature selection methods on two-class biomedical datasets. Comput Biol Med. 2015 Nov 1;66:1-10. doi: 10.1016/j.compbiomed.2015.08.010. Epub 2015 Aug 24. PMID: 26327447
- Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics. 2010 Feb 1;26(3):392-8. doi: 10.1093/bioinformatics/btp630. Epub 2009 Nov 25. PMID: 19942583
- A novel feature selection approach for biomedical data classification. J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30. PMID: 19647098
- Research Techniques Made Simple: Feature Selection for Biomarker Discovery. J Invest Dermatol. 2019 Oct;139(10):2068-2074.e1. doi: 10.1016/j.jid.2019.07.682. PMID: 31543209. Review.
- Network-Assisted Disease Classification and Biomarker Discovery. Methods Mol Biol. 2016;1386:353-74. doi: 10.1007/978-1-4939-3283-2_16. PMID: 26677191. Review.
Cited by
- T-ReCS: stable selection of dynamically formed groups of features with application to prediction of clinical outcomes. Pac Symp Biocomput. 2015;20:431-42. PMID: 25592602. Free PMC article.
- A Fusion-Based Machine Learning Approach for Autism Detection in Young Children Using Magnetoencephalography Signals. J Autism Dev Disord. 2023 Dec;53(12):4830-4848. doi: 10.1007/s10803-022-05767-w. Epub 2022 Oct 3. PMID: 36192669. Free PMC article.
- Benchmark study of feature selection strategies for multi-omics data. BMC Bioinformatics. 2022 Oct 5;23(1):412. doi: 10.1186/s12859-022-04962-x. PMID: 36199022. Free PMC article.
- Feature and decision-level fusion for schizophrenia detection based on resting-state fMRI data. PLoS One. 2022 May 24;17(5):e0265300. doi: 10.1371/journal.pone.0265300. eCollection 2022. PMID: 35609033. Free PMC article.
- Applying a GAN-based classifier to improve transcriptome-based prognostication in breast cancer. PLoS Comput Biol. 2023 Apr 3;19(4):e1011035. doi: 10.1371/journal.pcbi.1011035. eCollection 2023 Apr. PMID: 37011102. Free PMC article.