BMC Bioinformatics. 2013 Mar 24:14:107.

doi: 10.1186/1471-2105-14-107.

Non-negative matrix factorization by maximizing correntropy for cancer clustering

Jim Jing-Yan Wang¹, Xiaolei Wang, Xin Gao

Affiliation

¹ Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.

PMID: 23522344
PMCID: PMC3659102
DOI: 10.1186/1471-2105-14-107

Jim Jing-Yan Wang et al. BMC Bioinformatics. 2013.

Abstract

Background: Non-negative matrix factorization (NMF) has been shown to be a powerful tool for clustering gene expression data, which are widely used to classify cancers. NMF aims to find two non-negative matrices whose product closely approximates the original matrix. Traditional NMF methods minimize either the l₂ norm or the Kullback-Leibler distance between the product of the two matrices and the original matrix. Correntropy was recently shown to be an effective similarity measure because it is robust to outliers and noise.
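For reference, the objectives named in the Background can be written in their standard textbook forms. The abstract itself gives no formulas, so the notation here is an assumption: X (D × N) is the gene expression matrix, H (D × K) and W (K × N) are the non-negative factors, named as in the Figure 1 caption, and σ is the bandwidth of a Gaussian kernel whose normalizing constant is omitted.

```latex
\begin{align}
  \text{squared } l_2 \text{ norm:} \quad
    & \min_{H \ge 0,\; W \ge 0} \; \sum_{d=1}^{D} \sum_{n=1}^{N}
      \bigl( x_{dn} - (HW)_{dn} \bigr)^2 \\
  \text{Kullback-Leibler distance:} \quad
    & \min_{H \ge 0,\; W \ge 0} \; \sum_{d=1}^{D} \sum_{n=1}^{N}
      \Bigl( x_{dn} \log \frac{x_{dn}}{(HW)_{dn}} - x_{dn} + (HW)_{dn} \Bigr) \\
  \text{correntropy of } A \text{ and } B \text{:} \quad
    & V_\sigma(A, B) = \mathbb{E}\bigl[ k_\sigma(A - B) \bigr]
      \approx \frac{1}{M} \sum_{i=1}^{M} k_\sigma(a_i - b_i),
      \qquad k_\sigma(e) = \exp\!\Bigl( -\frac{e^2}{2\sigma^2} \Bigr)
\end{align}
```

Each kernel term lies in (0, 1], so a single very large residual can lower the correntropy estimate by at most 1/M, whereas the same residual enters the squared l₂ objective quadratically. This boundedness is the usual explanation for the robustness to outliers mentioned above.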

Results: We propose a maximum correntropy criterion (MCC)-based NMF method (NMF-MCC) for cancer clustering based on gene expression data. Instead of minimizing the l₂ norm or the Kullback-Leibler distance, NMF-MCC maximizes the correntropy between the product of the two matrices and the original matrix. The optimization problem can be solved by an expectation conditional maximization (ECM) algorithm.

Conclusions: Extensive experiments on six cancer benchmark sets demonstrate that the proposed method is significantly more accurate than the state-of-the-art methods in cancer clustering.

Figures

**Figure 1**
**The l₂ norm distance-based non-negative matrix factorization on the SRBCT dataset [29].** The gene expression data matrix, X, is factorized as the product of the meta-sample matrix, H, and the coding matrix, W.

**Figure 2**
Outline of the ECM-based NMF-MCC algorithm.

**Figure 3**
The boxplots of the clustering accuracies for NMF with different loss functions over 100 runs on the six gene expression datasets: (a) Leukemia, (b) Brain Tumor, (c) Lung Cancer, (d) 9 Tumors, (e) SRBCT, (f) DLBCL.

**Figure 4**
The boxplots of the clustering accuracies for different versions of NMF algorithms over 100 runs on the six gene expression datasets: (a) Leukemia, (b) Brain Tumor, (c) Lung Cancer, (d) 9 Tumors, (e) SRBCT, (f) DLBCL.

**Figure 5**
**The gene weight vector learned by NMF-MCC with −ρ on the SRBCT dataset.**

**Figure 6**
**The meta-sample matrix, H, weighted by diag(−ρ), and the corresponding coding matrix, W, obtained from the NMF-MCC algorithm for the SRBCT dataset.**

References

1. Shi F, Leckie C, MacIntyre G, Haviv I, Boussioutas A, Kowalczyk A. A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer. BMC Bioinformatics. 2010;11:477. doi: 10.1186/1471-2105-11-477.
2. de Souto MCP, Costa IG, de Araujo DSA, Ludermir TB, Schliep A. Clustering cancer gene expression data: a comparative study. BMC Bioinformatics. 2008;9:497. doi: 10.1186/1471-2105-9-497.
3. Gao Y, Church G. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics. 2005;21(21):3970–3975.
4. Liu W, Yuan K, Ye D. On alpha-divergence based nonnegative matrix factorization for clustering cancer gene expression data. Artif Intell Med. 2008;44(1):1–5. doi: 10.1016/j.artmed.2008.05.001.
5. Zheng CH, Ng TY, Zhang L, Shiu CK, Wang HQ. Tumor classification based on non-negative matrix factorization using gene expression data. IEEE Trans Nanobioscience. 2011;10(2):86–93.
