Incremental forward feature selection with application to microarray gene expression data
- PMID: 18781519
- DOI: 10.1080/10543400802277868
Abstract
In this study, the authors propose a new feature selection scheme, incremental forward feature selection, which is inspired by incremental reduced support vector machines. In their method, a new feature is added to the currently selected feature subset if it brings in the most extra information. This information is measured by the distance between the new feature vector and the column space spanned by the current feature subset. The incremental forward feature selection scheme can exclude highly linearly correlated features, which provide redundant information and might degrade the efficiency of learning algorithms. The method is compared with the weight score approach and the 1-norm support vector machine on two well-known microarray gene expression data sets, the acute leukemia and colon cancer data sets. These two data sets have very few observations but a huge number of genes. The linear smooth support vector machine was applied to the feature subsets selected by each of these three schemes, and slightly better classification results were obtained with the 1-norm support vector machine and incremental forward feature selection. Finally, the authors claim that the remaining genes still contain some useful information. The previously selected features are iteratively removed from the data sets, and the feature selection and classification steps are repeated for four rounds. The results show that there are many distinct feature subsets that can provide enough information for classification tasks in these two microarray gene expression data sets.
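The selection criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `tol` stopping threshold, the column centering and scaling, and the tie-broken choice of the first feature are all assumptions made for illustration. The distance from each candidate feature to the column space of the selected features is tracked with Gram-Schmidt residuals, and the second helper mimics the repeated rounds on the remaining features without the classification step.

```python
import numpy as np


def incremental_forward_selection(X, n_features, tol=1e-8):
    """Greedily pick columns of X that lie farthest from the span of the
    columns picked so far (hypothetical helper, not the paper's code)."""
    X = np.asarray(X, dtype=float)
    # Assumption: center and scale columns so distances are comparable.
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # constant columns stay at zero and are never picked
    R = Xc / norms  # residual of each column w.r.t. the current span
    available = np.ones(X.shape[1], dtype=bool)
    selected = []
    for _ in range(n_features):
        # Residual norm = distance from each column to the current column space.
        dist = np.where(available, np.linalg.norm(R, axis=0), 0.0)
        j = int(np.argmax(dist))
        if dist[j] <= tol:
            break  # every remaining column is (numerically) in the span
        selected.append(j)
        available[j] = False
        # Gram-Schmidt step: remove the new direction from all residuals.
        q = R[:, j] / dist[j]
        R = R - np.outer(q, q @ R)
    return selected


def selection_rounds(X, n_features, n_rounds=4):
    """Repeat the selection on the features left over from earlier rounds.
    The classification step of the abstract is omitted here."""
    X = np.asarray(X, dtype=float)
    remaining = np.arange(X.shape[1])
    rounds = []
    for _ in range(n_rounds):
        if remaining.size == 0:
            break
        local = incremental_forward_selection(X[:, remaining], n_features)
        if not local:
            break
        rounds.append(remaining[local].tolist())
        remaining = np.delete(remaining, local)
    return rounds


# Toy data (made up): column 2 duplicates column 0 up to scale,
# and column 3 is a linear combination of columns 0 and 1.
rng = np.random.default_rng(0)
a, b = rng.normal(size=6), rng.normal(size=6)
X_demo = np.column_stack([a, b, 2.0 * a, a + b])
picked = incremental_forward_selection(X_demo, n_features=4)
demo_rounds = selection_rounds(X_demo, n_features=2)
```

On the toy matrix, the redundant columns contribute no extra distance once their span is covered, so the selection stops early rather than returning all four features. On real microarray data, the number of features to select and the threshold would need to be chosen with care.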