Bioinformatics. 2016 Jan 1;32(1):1-8.
doi: 10.1093/bioinformatics/btv544. Epub 2015 Sep 15.

A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data

Zi Yang et al. Bioinformatics.

Abstract

Motivation: Recent advances in high-throughput omics technologies have enabled biomedical researchers to collect large-scale genomic data. As a consequence, there has been growing interest in developing methods to integrate such data to obtain deeper insights regarding the underlying biological system. A key challenge for integrative studies is the heterogeneity present in the different omics data sources, which makes it difficult to discern the coordinated signal of interest from source-specific noise or extraneous effects.

Results: We introduce a novel method of multi-modal data analysis that is designed for heterogeneous data based on non-negative matrix factorization. We provide an algorithm for jointly decomposing the data matrices involved that also includes a sparsity option for high-dimensional settings. The performance of the proposed method is evaluated on synthetic data and on real DNA methylation, gene expression and miRNA expression data from ovarian cancer samples obtained from The Cancer Genome Atlas. The results show the presence of common modules across patient samples linked to cancer-related pathways, as well as previously established ovarian cancer subtypes.
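To make the idea of jointly decomposing several data matrices concrete, the following is a minimal sketch of a generic joint non-negative matrix factorization, in which every source shares one factor over the common observations. It is not the authors' algorithm: it omits the heterogeneity handling and the sparsity option described above, and the function name `joint_nmf`, its parameters, the multiplicative-update scheme, and the layout (rows as observations, columns as variables) are all assumptions made for illustration.

```python
import numpy as np

def joint_nmf(Xs, rank, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: factor each X_k ~ W @ H_k with one shared W.

    Xs is a list of non-negative arrays with the same number of rows
    (observations) and possibly different numbers of columns (variables).
    Uses standard multiplicative updates for the summed squared error.
    """
    rng = np.random.default_rng(seed)
    n_obs = Xs[0].shape[0]
    W = rng.random((n_obs, rank)) + eps                    # shared factor
    Hs = [rng.random((rank, X.shape[1])) + eps for X in Xs]  # per-source factors
    losses = []
    for _ in range(n_iter):
        # Update each source-specific factor with W held fixed.
        for k, X in enumerate(Xs):
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + eps)
        # Update the shared factor using all sources at once.
        numer = sum(X @ H.T for X, H in zip(Xs, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
        # Track the total squared reconstruction error.
        losses.append(sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(Xs, Hs)))
    return W, Hs, losses

# Toy data (assumed, not from the paper): two sources over the same 30 observations.
rng = np.random.default_rng(1)
W_true = rng.random((30, 3))
Xs = [W_true @ rng.random((3, 20)), W_true @ rng.random((3, 15))]
W, Hs, losses = joint_nmf(Xs, rank=3)
```

In this sketch, observations that load heavily on the same column of `W` would be read as belonging to a common module, and the corresponding rows of each `H_k` indicate which variables in source `k` participate.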

Availability and implementation: The source code repository is publicly available at https://github.com/yangzi4/iNMF.

Contact: gmichail@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
(a) An example of multi-dimensional modules across three different data sources. Three modules are distinguishable in Scenario 1 as strong associations between subsets of variables across sources and a common subset of observations. Scenario 2 contains the same data with added random noise and confounding effects. (b) Low-dimensional representations of the data (X2), jNMF approximations (W) and iNMF approximations (W). The modules are clearly detected by both methods in Scenario 1 but only by iNMF in Scenario 2. (Color version of this figure is available at Bioinformatics online.)
Fig. 2.
Average ratios (iNMF:jNMF) of detection performance (S) over 25 trials (with standard errors) under four data and module dimensions, with three types of perturbations (uniform, scattered, heterogeneous). The leftmost common point in each subplot represents the error scenario σu=σs=σh=0.01, while each trajectory represents raising the level of a single type of error. (a) Two sources of 40 × 40, four modules of 8 × 8; (b) two sources of 80 × 80, eight modules of 8 × 8; (c) two sources of 72 × 72, four modules of 16 × 16; and (d) four sources of 40 × 40, four modules of 8 × 8. (Color version of this figure is available at Bioinformatics online.)
Fig. 3.
Module memberships of genes (from iNMF), arranged according to pathways derived from BioCarta and relevant literature. The pathways include processes of DNA repair (top right), cell cycle regulation (bottom), cell survival and proliferation (left) and cell migration (top left). (Color version of this figure is available at Bioinformatics online.)
