Nucleic Acids Res. 2018 Jul 6;46(12):5967-5976.
doi: 10.1093/nar/gky440.

Discovery of two-level modular organization from matched genomic data via joint matrix tri-factorization

Jinyu Chen et al. Nucleic Acids Res.

Abstract

With the rapid development of biotechnology, multi-dimensional genomic data have become available for studying the regulatory associations among multiple molecular levels. It is therefore essential to develop a tool that identifies not only the modular patterns at each level, but also the relationships among these modules. In this study, we adopt a novel non-negative matrix factorization framework (NetNMF) to integrate pairwise genomic data in a network manner. NetNMF can reveal the modules of each dimension and the connections within and between both types of modules. We first demonstrated the effectiveness of NetNMF using a set of simulated data and compared it with two typical NMF methods. We then applied it to two different types of pairwise genomic datasets: microRNA (miRNA) and gene expression data from The Cancer Genome Atlas, and gene expression and pharmacological data from the Cancer Genome Project. From these we identified a two-level miRNA-gene module network and a two-level gene-drug module network, respectively. Not only do the majority of the identified modules have significant functional implications, but the three types of module pairs also have close biological associations. This module discovery tool provides comprehensive insights into how the two levels of molecules cooperate with each other.


Figures

Figure 1.
Overview of NetNMF for discovering a two-level module network by integrating pairwise genomic data. Three matrices R11, R12 and R22 are computed via Pearson correlation, representing the similarities within and between the two types of features in the pairwise input data matrices X1 and X2. NetNMF simultaneously decomposes R11, R12 and R22 to obtain the underlying co-modules and their associations. The ith co-module is identified based on the ith column vectors of the factored matrices G1 and G2; the association degree between the ith and jth modules is determined by S11(i, j) (or S22(i, j)), where S(i, j) denotes the element in the ith row and jth column of the corresponding matrix. Thus, a two-layer module network can be constructed in which each node represents a module.
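The first step in the caption, building R11, R12 and R22 from X1 and X2, can be sketched as follows. This is a minimal illustration on made-up toy data, assuming that rows are features and columns are matched samples; the helper names `pearson` and `similarity` are hypothetical and not taken from the paper. How the authors handle negative correlations before the non-negative factorization is not stated in this excerpt, so the sketch leaves the raw coefficients untouched.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)  # raises ZeroDivisionError for a constant row

def similarity(A, B):
    """Correlate every feature (row) of A with every feature (row) of B."""
    return [[pearson(a, b) for b in B] for a in A]

# Toy inputs (assumption): 2 features in X1, 3 features in X2, 4 matched samples.
X1 = [[1.0, 2.0, 3.0, 4.0],
      [2.0, 1.0, 4.0, 3.0]]
X2 = [[2.0, 4.0, 6.0, 8.0],
      [4.0, 3.0, 2.0, 1.0],
      [1.0, 3.0, 2.0, 5.0]]

R11 = similarity(X1, X1)  # within-type similarity, 2 x 2
R12 = similarity(X1, X2)  # between-type similarity, 2 x 3
R22 = similarity(X2, X2)  # within-type similarity, 3 x 3
```

The shapes show why the three matrices can share factors: R11 and R12 have the same number of rows, and R12 and R22 have the same number of columns, so one factor per feature type (G1, G2) can in principle appear in every decomposition that involves that type.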
Figure 2.
Performance comparison of NetNMF, NMF and TriNMF in terms of purity as well as AUC in simulated datasets. (A and B) The boxplots of purity scores for identified co-modules in 30 realizations on the simulated data with respect to different noise levels. Different thresholds (T = 1 for (A) and T = 1.5 for (B)) are used for selecting features from both factored matrices G1 and G2. (C) The boxplots of AUC scores without any pre-defined parameters in the same 30 realizations.
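The caption mentions thresholds T = 1 and T = 1.5 for selecting features from G1 and G2 but does not spell out the selection rule. One common convention in NMF-based module detection, assumed here and not confirmed by this excerpt, is to keep the features whose z-score within a factor column exceeds T. The function name `select_members` and the toy column are hypothetical.

```python
import statistics

def select_members(column, T):
    """Return indices whose z-score within `column` exceeds T.

    Assumption: membership is decided by z-scoring one column of a
    factor matrix; the paper's exact rule may differ.
    """
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)  # population standard deviation
    return [i for i, x in enumerate(column) if (x - mean) / sd > T]

# Toy column of a factor matrix (made-up values): two features stand out.
g = [0.1, 0.0, 0.2, 3.0, 0.1, 2.0, 0.0, 0.1]

members_T1 = select_members(g, 1.0)    # looser threshold
members_T15 = select_members(g, 1.5)   # stricter threshold
```

Under this rule a larger T yields smaller modules, which is one plausible reason for reporting purity at more than one threshold.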
Figure 3.
Illustration of the two-layer module network using the TCGA breast cancer dataset. (A) The miRNA–gene module network consists of 69 miRNA modules in the top layer, 69 gene modules in the bottom layer, 69 edges (dashed lines with equal weights) for the one-to-one matched miRNA–gene co-modules, and 99 edges between gene modules and 88 edges between miRNA modules, weighted by the corresponding values in the factored matrices S11 and S22, respectively. gMx (or mMx) indicates a gene (or miRNA) module with index x. (B) The module 48-centered subnetwork. (C) The detailed network for each module in (B). Two miRNAs in the same miRNA module are linked if they share at least one target. The gene network for each gene module is constructed based on GeneMANIA (41). (D) Heat map of co-module 48, consisting of 171 genes and 11 miRNAs (squared boxes), based on the input similarity matrices of NetNMF. We extended the heat map to cover more variables by randomly selecting a further 171 genes and 11 miRNAs for contrast. (E and F) Top biological terms enriched in the gene modules (E) and miRNA modules (F) in (B). The enrichment ratio indicates the functional significance of a module as −log10(P-value), using the Bonferroni-corrected P-value. A similar setting is used in Figure 5.
Figure 4.
Comparison of all the enriched GO BPs of modules detected by NetNMF, TriNMF and NMF using the TCGA breast cancer dataset. NetNMF performs better than TriNMF and NMF according to one-sided Wilcoxon signed-rank tests. (A) Comparison for gene modules. For each GO BP, we compute enrichment scores (−log10(P-value)), and the highest score among all modules is taken as the final score of this GO BP for each method. The scores for NetNMF are plotted against those of TriNMF and NMF. The majority of terms lie above the central diagonal line, that is, they are more significantly enriched using NetNMF than TriNMF (59%) or NMF (66%). (B) Comparison for the target gene sets of miRNA modules, with the same setting as in (A). About 60% and 57% of terms lie above the central diagonal line when comparing NetNMF with TriNMF and NMF, respectively.
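The scoring described in (A) can be sketched as below: each term's final score is the highest −log10(P-value) over all modules of a method, and two methods are compared by the fraction of terms on which the first scores higher. The function names, term labels and P-values are made up for illustration, and treating a term that a method never reports as having score 0 is an assumption of this sketch. The significance test named in the caption (one-sided Wilcoxon signed-rank) is not reproduced here.

```python
import math

def term_scores(pvalues_by_module):
    """Final score per term: the max of -log10(P) over all modules.

    `pvalues_by_module` is a list with one {term: P-value} dict per module.
    """
    scores = {}
    for module in pvalues_by_module:
        for term, p in module.items():
            scores[term] = max(scores.get(term, 0.0), -math.log10(p))
    return scores

def fraction_above_diagonal(scores_a, scores_b):
    """Fraction of terms scored strictly higher by method A than by method B.

    Assumption: a term missing from one method counts as score 0 there.
    """
    terms = set(scores_a) | set(scores_b)
    above = sum(1 for t in terms
                if scores_a.get(t, 0.0) > scores_b.get(t, 0.0))
    return above / len(terms)

# Made-up P-values for two hypothetical methods.
method_a = [{"term_A": 1e-6, "term_B": 1e-2},
            {"term_B": 1e-4, "term_C": 1e-3}]
method_b = [{"term_A": 1e-3, "term_B": 1e-5, "term_C": 1e-2}]

a = term_scores(method_a)
b = term_scores(method_b)
frac = fraction_above_diagonal(a, b)
```

In a scatter plot of method A's scores against method B's, `frac` corresponds to the share of points on method A's side of the diagonal.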
Figure 5.
Illustration of the two-layer module network using the CGP dataset. (A) The gene–drug module network includes 88 gene–drug co-modules, 122 edges between drug modules in the top layer and 113 edges between gene modules in the bottom layer. (B) The module 37-centered subnetwork. (C) Drug modules 37 and 84 each contain only one drug; drug module 10 includes two drugs targeting the same pathway (F). (D) Heat map of co-module 37, consisting of 44 genes and one drug (squared boxes). (E) Top enriched biological terms in the gene modules in (B). (F) Details of the drug modules in (B).

