PLoS One. 2016 Oct 14;11(10):e0164880.
doi: 10.1371/journal.pone.0164880. eCollection 2016.

Impact of the Choice of Normalization Method on Molecular Cancer Class Discovery Using Nonnegative Matrix Factorization

Haixuan Yang et al. PLoS One.

Abstract

Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. Under the nonnegativity constraint, NMF decomposes the data matrix into two factor matrices, which are then used for clustering. However, the decomposition is not unique, so different clustering results can be obtained, leading to different interpretations of the decomposition. To alleviate this problem, some existing methods enforce a degree of uniqueness directly by adding regularization terms to the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effect of the choice of normalization has not been carefully investigated. Here we investigate the performance of NMF for cancer class discovery under a wide range of normalization choices. After extensive evaluation, we observe that the maximum norm gives the best performance, although it has not previously been used for NMF. MATLAB code is freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.
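
To make the non-uniqueness concrete, the following is a minimal Python sketch, not the authors' MATLAB code. It assumes the data matrix is factored as A ≈ WH with W of shape genes × k and H of shape k × samples, that normalization rescales each column of W to maximum norm 1 while the matching row of H absorbs the scale, and that each sample is assigned to the cluster with the largest entry in its column of H. The names `normalize_factors_max_norm` and `assign_clusters` are hypothetical, and the abstract does not state which factor is normalized.

```python
import numpy as np

def normalize_factors_max_norm(W, H, eps=1e-12):
    """Rescale W (genes x k) and H (k x samples) so every column of W has max norm 1.

    Assumption for illustration: the scale removed from column j of W is
    moved into row j of H, so the product W @ H is unchanged.
    """
    W = np.asarray(W, dtype=float)
    H = np.asarray(H, dtype=float)
    # For a nonnegative matrix the maximum norm of a column is its largest entry.
    scale = np.maximum(W.max(axis=0), eps)
    return W / scale, H * scale[:, None]

def assign_clusters(H):
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    return np.argmax(H, axis=0)

# Toy factors (made-up numbers, not data from the paper).
W = np.array([[4.0, 0.10],
              [2.0, 0.20],
              [0.0, 0.05]])
H = np.array([[1.0, 0.3],
              [5.0, 9.0]])

Wn, Hn = normalize_factors_max_norm(W, H)

print(np.allclose(W @ H, Wn @ Hn))  # the reconstruction is identical
print(assign_clusters(H))           # assignments before normalization
print(assign_clusters(Hn))          # assignments after normalization
```

In this toy case both factorizations reproduce the same matrix, yet the first sample moves from the second cluster to the first once the factors are rescaled, which is why the choice of normalization can change the clustering result.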


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Clustering errors as a function of the number of features (genes) for datasets Leukemia (k = 2), Leukemia (k = 3), CNS (k = 4) and Medulloblastoma (k = 2) respectively.
On each of these datasets, the Basic NMF and the post-processing method (the maximum norm together with the filter) are run on subsets of the full data containing the 1000 + 100d most highly varying genes (d = 0, 1, 2, 3, …). Results are shown as continuous lines for clarity. Clustering error is the number of samples improperly clustered by an algorithm. Here the Basic NMF is the one minimizing the KL divergence in Eq (1).
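
The gene subsets and the clustering error described in this caption could be computed roughly as follows. This is a sketch under stated assumptions: "most highly varying" is taken to mean largest variance across samples, cluster labels are integers 0..k-1, and predicted clusters are matched to true classes by trying every label permutation (reasonable for the small k used here). The function names are hypothetical.

```python
from itertools import permutations

import numpy as np

def top_varying_genes(A, d, base=1000, step=100):
    """Keep the base + step*d rows (genes) of A with the largest variance.

    A has shape genes x samples. Variance as the ranking criterion is an
    assumption; the caption only says "most highly varying".
    """
    A = np.asarray(A, dtype=float)
    n_keep = min(base + step * d, A.shape[0])
    order = np.argsort(A.var(axis=1))[::-1]  # most variable genes first
    keep = np.sort(order[:n_keep])           # restore the original gene order
    return A[keep], keep

def clustering_error(true_labels, pred_labels, k):
    """Number of misclustered samples, minimized over relabelings of the clusters."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    best = true_labels.size
    for perm in permutations(range(k)):
        mapped = np.array(perm)[pred_labels]  # rename cluster i to perm[i]
        best = min(best, int(np.sum(mapped != true_labels)))
    return best

# Synthetic stand-in for an expression matrix (not data from the paper).
rng = np.random.default_rng(0)
A = rng.random((1500, 20))
subset, kept = top_varying_genes(A, d=2)
print(subset.shape)  # (1200, 20), since 1000 + 100 * 2 = 1200 genes are kept

print(clustering_error([0, 0, 1, 1], [1, 1, 0, 1], k=2))  # 1
```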
Fig 2. Accuracy as a function of noise levels for datasets Leukemia (k = 2), Leukemia (k = 3), CNS (k = 4) and Medulloblastoma (k = 2) respectively.
For each noise level $\mu$, the NMF methods are run 100 times on disturbed matrices. In each run, a disturbed matrix $A'$ is generated by adding independent uniform noise: $A'_{ij} = A_{ij} + \mu r_{ij}$, where $r_{ij}$ is a random number drawn from a uniform distribution on the interval [0, max], and max is the maximum expression value in $A$. Plotted is the mean clustering accuracy over the 100 runs, with error bars representing the standard error of the mean. The post-processing method uses the maximum norm together with the filter.
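
The perturbation scheme in this caption can be sketched in a few lines of Python. This is an illustration rather than the authors' implementation: `accuracy_fn` is a hypothetical callable standing in for "run NMF on the disturbed matrix and score the clustering", and NumPy's `Generator.uniform` samples from the half-open interval [0, max) rather than the closed one.

```python
import numpy as np

def perturb(A, mu, rng):
    """Return A' with A'_ij = A_ij + mu * r_ij, where r_ij ~ Uniform[0, max(A))."""
    A = np.asarray(A, dtype=float)
    r = rng.uniform(0.0, A.max(), size=A.shape)  # independent noise for every entry
    return A + mu * r

def mean_and_sem(values):
    """Mean and standard error of the mean (sample std divided by sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)

def noise_experiment(A, mu, accuracy_fn, n_runs=100, seed=0):
    """Evaluate accuracy_fn on n_runs independently disturbed copies of A.

    accuracy_fn is a placeholder supplied by the caller; it should take a
    matrix and return a clustering accuracy in [0, 1].
    """
    rng = np.random.default_rng(seed)
    accuracies = [accuracy_fn(perturb(A, mu, rng)) for _ in range(n_runs)]
    return mean_and_sem(accuracies)
```

Calling `noise_experiment` once per noise level gives the mean and error-bar height for one point on a curve like those in the figure.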
Fig 3. Cophenetic correlation.
(a) Leukemia. (b) CNS. (c) Medulloblastoma.

