Robust biomarker identification for cancer diagnosis with ensemble feature selection methods
- PMID: 19942583
- DOI: 10.1093/bioinformatics/btp630
Abstract
Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high-dimensional data. Surprisingly, the stability with respect to sampling variation or robustness of such selection processes has received attention only recently. However, robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validations. In addition, a more robust set of markers may strengthen the confidence of an expert in the results of a selection method.
Results: Our first contribution is a general framework for the analysis of the robustness of a biomarker selection algorithm. Secondly, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, where multiple feature selections are combined in order to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data. Their feature selection extensions also offered good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving upon classification performances. The proposed methodology is evaluated on four microarray datasets showing increases of up to almost 30% in robustness of the selected biomarkers, along with an improvement of approximately 15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which is most relevant for the design of a diagnosis or prognosis model from a gene signature.
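The two ideas in the Results paragraph, measuring the robustness of a selection and combining multiple selections into one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, since the abstract does not specify these details: robustness is scored as the mean pairwise Jaccard overlap between signatures, the ensemble aggregates by summed rank over stratified bootstrap resamples, and `mean_diff_ranker` is a hypothetical stand-in for an SVM-embedded selector. It is not the authors' implementation.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Overlap of two feature sets: |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(signatures):
    """Mean pairwise Jaccard overlap across signatures (assumed robustness score)."""
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def mean_diff_ranker(X, y):
    """Hypothetical stand-in selector: rank features by |mean(class 1) - mean(class 0)|.

    A real pipeline would plug in an SVM-based ranking here instead.
    Returns feature indices ordered from most to least relevant.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def stratified_bootstrap(y, rng):
    """Resample indices with replacement within each class, so no class vanishes."""
    idx = []
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        idx.extend(rng.choice(members) for _ in members)
    return idx


def ensemble_select(X, y, ranker, k, n_bootstraps=20, seed=0):
    """Run `ranker` on bootstrap resamples and keep the k features with lowest summed rank."""
    rng = random.Random(seed)
    rank_sum = [0] * len(X[0])
    for _ in range(n_bootstraps):
        idx = stratified_bootstrap(y, rng)
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for rank, j in enumerate(ranker(Xb, yb)):
            rank_sum[j] += rank
    return sorted(range(len(rank_sum)), key=lambda j: rank_sum[j])[:k]
```

To compare a single selector with its ensemble version under this sketch, one would draw several subsamples of the data, compute a signature on each with both approaches, and compare the two `stability` scores for the same signature size `k`.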
Supplementary information: Supplementary data are available at Bioinformatics online.