Feature selection and classification of MAQC-II breast cancer and multiple myeloma microarray gene expression data
- PMID: 20011240
- PMCID: PMC2789385
- DOI: 10.1371/journal.pone.0008250
Abstract
Microarray data have a high-dimensional variable space, but available datasets usually contain only a small number of samples, which makes the study of such datasets both interesting and challenging. When microarray data are analyzed for purposes such as predicting gene-disease association, feature selection is very important because it handles the high dimensionality by exploiting the information redundancy induced by associations among genetic markers. Judicious feature selection can significantly reduce cost while maintaining or improving the classification or prediction accuracy of the learning machines applied to the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods: Support Vector Machine Recursive Feature Elimination (SVMRFE), Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS), and Gradient based Leave-one-out Gene Selection (GLGS). To evaluate these gene selection methods, we apply several popular learning classifiers to the MicroArray Quality Control phase II on predictive modeling (MAQC-II) breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that the performance of a gene selection method is tightly coupled to the learning classifier it is paired with. Overall, our approach outperforms the other compared methods. The biological functional analysis based on the MAQC-II breast cancer dataset supports applying our method to phenotype prediction. Additionally, learning classifiers also play important roles in the classification of microarray data, and our experimental results indicate that the Nearest Mean Scale Classifier (NMSC) is a good choice because of its prediction reliability and its stability across the three performance measurements: testing accuracy, MCC values, and AUC errors.
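The abstract names three performance measurements (testing accuracy, MCC values, and AUC errors) without defining them. The sketch below shows one common way to compute them for a two-class problem. It rests on several assumptions that are not taken from the paper: labels are coded as 0/1, "AUC error" is read as 1 minus the area under the ROC curve, and all function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1 (assumed coding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_error(y_true, scores):
    """AUC error, assumed here to mean 1 - AUC (the abstract does not define it)."""
    return 1.0 - auc(y_true, scores)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy :", accuracy(y_true, y_pred))
    print("MCC      :", mcc(y_true, y_pred))
    print("AUC error:", auc_error(y_true, scores))
```

Accuracy and MCC are computed from hard class labels, whereas AUC needs continuous classifier scores, so a classifier can rank well on one measure and poorly on another. That is one reason stability across all three is a meaningful criterion when comparing classifiers.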