Nonnegative principal component analysis for cancer molecular pattern discovery
- PMID: 20671323
- DOI: 10.1109/TCBB.2009.36
Abstract
As a well-established feature selection algorithm, principal component analysis (PCA) is often combined with state-of-the-art classification algorithms to identify cancer molecular patterns in microarray data. However, the algorithm's global feature selection mechanism prevents it from effectively capturing the latent structure of high-dimensional data. In this study, we investigate the benefit of imposing nonnegativity constraints on PCA and develop a nonnegative principal component analysis (NPCA) algorithm to overcome the global nature of PCA. A novel classification algorithm, NPCA-SVM, is proposed for microarray data pattern discovery. We report strong classification results for the NPCA-SVM algorithm on five benchmark microarray data sets in direct comparison with other related algorithms. We also prove mathematically, and interpret biologically, that an SVM or PCA-SVM learning machine with a Gaussian kernel will inevitably overfit microarray data. In addition, we demonstrate that nonnegative principal component analysis can be used to capture meaningful biomarkers effectively.
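The abstract does not specify how the nonnegativity constraint is enforced, so the following is only an illustrative sketch and not the authors' NPCA algorithm. It assumes a projected power iteration (a stand-in technique chosen here) to approximate a single leading component w with w >= 0 and unit norm that locally maximizes the captured variance. The function name `nonnegative_pc`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def nonnegative_pc(X, n_iter=200, tol=1e-10):
    """Approximate a leading nonnegative principal component of X.

    X is an (n_samples, n_features) array. Returns (w, variance), where w is a
    unit vector with w >= 0 that locally maximizes w^T S w and S is the sample
    covariance matrix. Assumed stand-in method: projected power iteration.
    """
    Xc = X - X.mean(axis=0)               # center each feature
    S = Xc.T @ Xc / (X.shape[0] - 1)      # sample covariance matrix
    d = S.shape[0]
    w = np.full(d, 1.0 / np.sqrt(d))      # feasible start: nonnegative, unit norm
    for _ in range(n_iter):
        g = np.maximum(S @ w, 0.0)        # power step, then clip to w >= 0
        norm = np.linalg.norm(g)
        if norm == 0.0:                   # no feasible ascent direction left
            break
        w_new = g / norm
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w, float(w @ S @ w)


# Toy "expression matrix" (synthetic, not real microarray data):
# 40 samples, 6 features, where features 0-2 share one latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
X = rng.normal(scale=0.3, size=(40, 6))
X[:, :3] += latent

w, explained = nonnegative_pc(X)
scores = (X - X.mean(axis=0)) @ w         # one projected feature per sample
```

Because every loading is nonnegative, the entries of `w` can be read as additive feature weights, which is one plausible reading of how a nonnegative component could localize on a subset of genes. In an NPCA-SVM style pipeline the projected `scores` (or several such components) would then be passed to a classifier; that step is omitted here.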