BMC Bioinformatics. 2013 Mar 24;14:107.
doi: 10.1186/1471-2105-14-107.

Non-negative matrix factorization by maximizing correntropy for cancer clustering


Jim Jing-Yan Wang et al. BMC Bioinformatics.

Abstract

Background: Non-negative matrix factorization (NMF) has been shown to be a powerful tool for clustering gene expression data, which are widely used to classify cancers. NMF aims to find two non-negative matrices whose product closely approximates the original matrix. Traditional NMF methods minimize either the l2 norm or the Kullback-Leibler divergence between the product of the two matrices and the original matrix. Correntropy has recently been shown to be an effective similarity measure owing to its robustness to outliers and noise.
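To make the robustness claim concrete, here is a minimal Python sketch (not taken from the paper) that compares the mean squared error, the per-entry form of the l2 loss, with sample correntropy under an unnormalized Gaussian kernel, V(a, b) = (1/N) Σ_i exp(−(a_i − b_i)²/(2σ²)). The kernel width σ = 1, the toy vectors, and the function names are illustrative assumptions.

```python
import math


def correntropy(a, b, sigma=1.0):
    """Sample correntropy with an unnormalized Gaussian kernel.

    Each term exp(-e^2 / (2 sigma^2)) lies in (0, 1], so one bad entry can
    lower the average by at most 1 / N.
    """
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2)) for x, y in zip(a, b)) / len(a)


def mean_squared_error(a, b):
    """Mean squared error, the per-entry form of the l2 loss."""
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy data (hypothetical): the last entry of `corrupted` is a gross outlier.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [1.1, 1.9, 3.0, 4.2, 4.9]
corrupted = [1.1, 1.9, 3.0, 4.2, 50.0]
print(mean_squared_error(target, clean), mean_squared_error(target, corrupted))
print(correntropy(target, clean), correntropy(target, corrupted))
```

On this toy input the single corrupted entry raises the MSE from about 0.014 to about 405, while the correntropy falls only from about 0.993 to about 0.794. Because each kernel term is bounded in (0, 1], one entry can move the average by at most 1/N, which is the bounded influence behind the robustness argument.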

Results: We propose a maximum correntropy criterion (MCC)-based NMF method (NMF-MCC) for cancer clustering based on gene expression data. Instead of minimizing the l2 norm or the Kullback-Leibler divergence, NMF-MCC maximizes the correntropy between the product of the two matrices and the original matrix. The resulting optimization problem can be solved with an expectation conditional maximization (ECM) algorithm.
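The following Python sketch illustrates one way such a criterion can be maximized; it is an illustration under stated assumptions, not the authors' ECM algorithm. It assumes X is a genes × samples matrix factorized as X ≈ HW and that the Gaussian kernel is applied to each gene's (row's) residual, so the objective is Σ_d exp(−‖x_d − (HW)_d‖²/(2σ²)). It then uses a half-quadratic alternation: fixing auxiliary variables ρ_d = −exp(−‖x_d − (HW)_d‖²/(2σ²)) turns the problem into an l2 NMF weighted by diag(−ρ), which is reduced by one multiplicative update. The kernel-width rule, the iteration count, and the name `nmf_mcc_sketch` are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Plain-Python product of A (m x k) and B (k x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf_mcc_sketch(X, k, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical sketch of correntropy-maximizing NMF, X (D x N) ~= H (D x k) W (k x N).

    Alternates (1) per-gene weights -rho_d = exp(-r_d / (2 sigma^2)), where r_d is
    the squared residual of gene (row) d, and (2) one multiplicative update of the
    weighted l2 objective sum_d (-rho_d) * r_d. With every weight fixed at 1,
    step (2) is the Lee-Seung update for plain l2 NMF.
    Returns H, W and the gene weights used in the final iteration.
    """
    rng = random.Random(seed)
    D, N = len(X), len(X[0])
    H = [[rng.random() + eps for _ in range(k)] for _ in range(D)]
    W = [[rng.random() + eps for _ in range(N)] for _ in range(k)]
    weights = [1.0] * D
    for _ in range(n_iter):
        # Step 1: squared residual of each gene under the current factors.
        HW = matmul(H, W)
        res = [sum((x - y) ** 2 for x, y in zip(X[d], HW[d])) for d in range(D)]
        # Hypothetical kernel-width heuristic: sigma^2 = mean per-gene residual.
        sigma2 = max(sum(res) / D, eps)
        weights = [math.exp(-r / (2.0 * sigma2)) for r in res]

        # Step 2a: H update. A row weight scales both numerator and denominator
        # of row d, so it cancels and the update equals the unweighted one.
        Wt = transpose(W)
        num_H = matmul(X, Wt)
        den_H = matmul(HW, Wt)
        H = [[H[d][j] * num_H[d][j] / (den_H[d][j] + eps) for j in range(k)]
             for d in range(D)]

        # Step 2b: W update with rows of X and HW scaled by the gene weights.
        HW = matmul(H, W)
        Ht = transpose(H)
        num_W = matmul(Ht, [[weights[d] * v for v in X[d]] for d in range(D)])
        den_W = matmul(Ht, [[weights[d] * v for v in HW[d]] for d in range(D)])
        W = [[W[j][n] * num_W[j][n] / (den_W[j][n] + eps) for n in range(N)]
             for j in range(k)]
    return H, W, weights
```

Genes whose residual is large relative to σ receive weights near 0 and therefore pull less on the shared coding matrix W. Because σ is re-estimated every iteration, this sketch is not guaranteed to increase the objective monotonically.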

Conclusions: Extensive experiments on six cancer benchmark datasets demonstrate that the proposed method is significantly more accurate than state-of-the-art methods in cancer clustering.
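The abstract does not spell out how clusters or accuracy are computed. A common convention for NMF-based clustering, assumed here, assigns each sample to the meta-sample with the largest coefficient in its column of the coding matrix W, and scores the result as the fraction of samples labelled correctly under the best one-to-one matching of clusters to classes. The Python sketch below uses hypothetical function names and toy labels.

```python
from itertools import permutations


def assign_clusters(W):
    """Assign sample n (column n of the k x N coding matrix W) to the
    meta-sample with the largest coefficient."""
    k, N = len(W), len(W[0])
    return [max(range(k), key=lambda j: W[j][n]) for n in range(N)]


def clustering_accuracy(true_labels, cluster_labels):
    """Fraction of samples labelled correctly under the best one-to-one
    mapping of cluster ids onto class labels (brute force, fine for small k)."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must be non-empty and equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Pad with None so every cluster gets a target even if clusters outnumber classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


# Toy coding matrix and class labels (hypothetical values).
W = [[0.9, 0.8, 0.1, 0.2, 0.6],
     [0.1, 0.3, 0.7, 0.9, 0.4]]
labels = assign_clusters(W)
print(labels, clustering_accuracy(["ALL", "ALL", "AML", "AML", "AML"], labels))
```

The brute-force search over k! mappings is only practical for small k; with more classes, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) finds the same best matching in polynomial time.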


Figures

Figure 1
Non-negative matrix factorization based on the l2 norm distance, applied to the SRBCT dataset [29]. The gene expression data matrix, X, is factorized as the product of the meta-sample matrix, H, and the coding matrix, W.
Figure 2
Outline of the ECM-based NMF-MCC algorithm.
Figure 3
The boxplots of the clustering accuracies for NMF with different loss functions over 100 runs on the six gene expression datasets: (a) Leukemia, (b) Brain Tumor, (c) Lung Cancer, (d) 9 Tumors, (e) SRBCT, (f) DLBCL.
Figure 4
The boxplots of the clustering accuracies for different versions of NMF algorithms over 100 runs on the six gene expression datasets: (a) Leukemia, (b) Brain Tumor, (c) Lung Cancer, (d) 9 Tumors, (e) SRBCT, (f) DLBCL.
Figure 5
The gene weight vector, ρ, learned by NMF-MCC on the SRBCT dataset.
Figure 6
The meta-sample matrix, H, weighted by diag(−ρ), and the corresponding coding matrix, W, obtained from the NMF-MCC algorithm for the SRBCT dataset.

References

    1. Shi F, Leckie C, MacIntyre G, Haviv I, Boussioutas A, Kowalczyk A. A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer. BMC Bioinformatics. 2010;11:477. doi: 10.1186/1471-2105-11-477.
    2. de Souto MCP, Costa IG, de Araujo DSA, Ludermir TB, Schliep A. Clustering cancer gene expression data: a comparative study. BMC Bioinformatics. 2008;9:497. doi: 10.1186/1471-2105-9-497.
    3. Gao Y, Church G. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics. 2005;21(21):3970–3975.
    4. Liu W, Yuan K, Ye D. On alpha-divergence based nonnegative matrix factorization for clustering cancer gene expression data. Artif Intell Med. 2008;44(1):1–5. doi: 10.1016/j.artmed.2008.05.001.
    5. Zheng CH, Ng TY, Zhang L, Shiu CK, Wang HQ. Tumor classification based on non-negative matrix factorization using gene expression data. IEEE Trans Nanobioscience. 2011;10(2):86–93.
