Genome Inform. 2008;21:200-11.

Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis

Xiaoxu Han¹

PMID: 19425159

Free article

Xiaoxu Han. Genome Inform. 2008.

Affiliation

¹ Department of Mathematics and Bioinformatics Program, Eastern Michigan University, Ypsilanti, MI 48197, USA. xiaoxu.han@emich.edu

Abstract

Robust identification of cancer molecular patterns from microarray data not only plays an essential role in modern clinical oncology but also presents a challenge for statistical learning. Although principal component analysis (PCA) is a widely used feature selection algorithm in microarray analysis, its holistic mechanism prevents it from capturing the latent local data structure needed for subsequent cancer molecular pattern identification. In this study, we investigate the benefit of enforcing non-negativity constraints on PCA and propose a nonnegative principal component analysis (NPCA) based classification algorithm for cancer molecular pattern analysis of gene expression data. This novel algorithm classifies meta-samples of the input cancer data using support vector machines (SVM) or other classic supervised learning algorithms. The meta-samples are low-dimensional projections of the original cancer samples in a purely additive meta-gene subspace generated from the NPCA-induced nonnegative matrix factorization (NMF). We report clearly leading classification results for the NPCA-SVM algorithm in cancer molecular pattern identification on five benchmark gene expression datasets, under 100 trials of 50% hold-out cross-validation and under leave-one-out cross-validation. We demonstrate the superiority of the NPCA-SVM algorithm by direct comparison with seven classification algorithms (SVM, PCA-SVM, KPCA-SVM, NMF-SVM, LLE-SVM, PCA-LDA and k-NN) on the five cancer datasets in terms of classification rates, sensitivities and specificities. Our NPCA-SVM algorithm overcomes the over-fitting problem associated with SVM-based classification of gene expression data under a Gaussian kernel. As a more robust high-performance classifier, NPCA-SVM can be used to replace the general SVM and k-NN classifiers in cancer biomarker discovery to capture more meaningful oncogenes.
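
The evaluation protocol named in the abstract (repeated 50% hold-out trials scored by classification rate, sensitivity and specificity) can be sketched roughly as below. This is only an illustration under stated assumptions, not the paper's implementation: the nearest-centroid classifier is a simple stand-in for NPCA-SVM, the per-class (stratified) split is an assumption the abstract does not state, and every function name and the toy data are hypothetical.

```python
import random


def stratified_half_split(y, rng):
    """Split indices 50/50 within each class (stratification is an assumption)."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return train, test


def fit_centroids(X, y):
    """Stand-in classifier (NOT NPCA-SVM): store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def rates(y_true, y_pred, positive=1):
    """Return (classification rate, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def repeated_holdout(X, y, trials=100, seed=0):
    """Average the three rates over `trials` random 50% hold-out splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        train, test = stratified_half_split(y, rng)
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        results.append(rates([y[i] for i in test], preds))
    return tuple(sum(r[k] for r in results) / trials for k in range(3))


# Hypothetical toy data: two well-separated classes in two dimensions.
X_toy = [[0.1 * i, 0.1 * i] for i in range(10)] + \
        [[5 + 0.1 * i, 5 + 0.1 * i] for i in range(10)]
y_toy = [0] * 10 + [1] * 10
print(repeated_holdout(X_toy, y_toy))
```

To evaluate a different classifier under the same protocol, one would replace `fit_centroids` and `predict` with that classifier's training and prediction steps, for example an SVM applied to low-dimensional meta-samples.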
