Bioinformatics. 2011 Jul 1;27(13):i401-9.
doi: 10.1093/bioinformatics/btr206.

A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules

Shihua Zhang et al. Bioinformatics.

Abstract

Motivation: It is well known that microRNAs (miRNAs) and genes work cooperatively to form a key part of gene regulatory networks. However, the specific functional roles of most miRNAs and their combinatorial effects in cellular processes are still unclear. The availability of multiple types of functional genomic data provides unprecedented opportunities to study miRNA-gene regulation. A major challenge is how to integrate these diverse genomic data to identify the regulatory modules of miRNAs and genes.

Results: Here we propose an effective data integration framework to identify miRNA-gene regulatory comodules. The miRNA and gene expression profiles are jointly analyzed in a multiple non-negative matrix factorization framework, and additional network data are simultaneously integrated as regularization terms. We also impose sparsity penalties on the variables to achieve modular solutions. The resulting formulation can be solved effectively by an iterative multiplicative updating algorithm. We apply the proposed method to a set of heterogeneous data sources: the expression profiles of miRNAs and genes in 385 human ovarian cancer samples, computationally predicted miRNA-gene interactions, and gene-gene interactions. We demonstrate that the miRNAs and genes in 69% of the regulatory comodules are significantly associated. Moreover, the comodules are significantly enriched in known functional sets such as miRNA clusters, GO biological processes and KEGG pathways. Furthermore, many miRNAs and genes in the comodules are related to various cancers, including ovarian cancer. Finally, we show that comodules can stratify patients (samples) into groups with significant clinical characteristics.

Availability: The program and supplementary materials are available at http://zhoulab.usc.edu/SNMNMF/.

Contact: xjzhou@usc.edu; zsh@amss.ac.cn

Figures

Fig. 1.
Overview of the proposed method for identifying miRNA-gene regulatory comodules. A miRNA-gene comodule is defined as the union of a set of miRNAs (a miRNA module) and a set of genes (a gene module). The inputs are (i) two sets of expression profiles (represented by the matrices X1 and X2) for miRNAs and genes, measured on the same set of samples; (ii) a gene–gene interaction network (represented by the matrix A), including protein–protein interactions and DNA–protein interactions; and (iii) a list of predicted miRNA–gene regulatory interactions (represented by the matrix B) based on sequence data. We simultaneously factor the miRNA and gene expression matrices into a common basis W and two coefficient matrices H1 and H2. At the same time, additional knowledge is incorporated into this framework with network-regularized constraints. Sparsity constraints are also imposed on this framework so as to obtain easily interpretable solutions. The decomposed matrix components provide information about miRNA-gene regulatory comodules. Then the comodules are identified based on shared components (a column in W) with significant association values in the corresponding rows of H1 and H2.
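The shared-basis factorization described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it keeps only the two reconstruction terms ||X1 − W·H1||² + ||X2 − W·H2||² and uses standard multiplicative updates, leaving out the network-regularized terms for A and B and the sparsity penalties. The function names, the z-score threshold used to pick module members, and the random demo data are all assumptions made for illustration.

```python
import numpy as np


def joint_nmf(X1, X2, k, n_iter=200, eps=1e-9, seed=0):
    """Factor X1 ~ W @ H1 and X2 ~ W @ H2 with a shared non-negative basis W.

    Simplified sketch: no network regularization and no sparsity penalties.
    X1 is samples x miRNAs, X2 is samples x genes; both must be non-negative.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X1.shape[0], k))
    H1 = rng.random((k, X1.shape[1]))
    H2 = rng.random((k, X2.shape[1]))
    losses = []
    for _ in range(n_iter):
        # Both data sets contribute to the update of the shared basis W.
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
        WtW = W.T @ W
        H1 *= (W.T @ X1) / (WtW @ H1 + eps)
        H2 *= (W.T @ X2) / (WtW @ H2 + eps)
        # Track the summed squared reconstruction error after each sweep.
        loss = np.sum((X1 - W @ H1) ** 2) + np.sum((X2 - W @ H2) ** 2)
        losses.append(float(loss))
    return W, H1, H2, losses


def select_members(H, component, z_threshold=2.0):
    """Return indices whose coefficient in one row of H has a high z-score.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    row = H[component]
    z = (row - row.mean()) / (row.std() + 1e-12)
    return np.flatnonzero(z > z_threshold)


if __name__ == "__main__":
    # Random non-negative stand-ins for the two expression matrices.
    demo_rng = np.random.default_rng(1)
    X1_demo = demo_rng.random((30, 12))   # 30 samples, 12 miRNAs
    X2_demo = demo_rng.random((30, 40))   # 30 samples, 40 genes
    W, H1, H2, losses = joint_nmf(X1_demo, X2_demo, k=4)
    print(W.shape, H1.shape, H2.shape)
    print("gene members of component 0:", select_members(H2, component=0))
```

In this sketch a comodule for component c would be the union of `select_members(H1, c)` and `select_members(H2, c)`, mirroring the caption's use of the corresponding rows of H1 and H2.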
Fig. 2.
About 44.4% of the miRNAs in identified comodules have previously been reported to be cancer related (hypergeometric test, P=1.1×10⁻⁶). Of these, 21 miRNAs were specifically related to ovarian cancers (hypergeometric test, P=7.2×10⁻⁶).
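For readers unfamiliar with the hypergeometric test named in this caption, the sketch below computes the upper-tail probability of seeing at least a given number of annotated items in a sample drawn without replacement. The function name and the counts in the demo are illustrative assumptions; the caption does not give the underlying counts, so none of the numbers here come from the paper.

```python
from math import comb


def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric random variable X.

    population: total number of items (e.g. all miRNAs considered)
    successes:  items carrying the annotation (e.g. cancer-related miRNAs)
    draws:      size of the selected set (e.g. miRNAs in comodules)
    observed:   annotated items found in the selected set
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(observed, 0), upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return tail / comb(population, draws)


if __name__ == "__main__":
    # Toy counts chosen for illustration only.
    p = hypergeom_pvalue(population=500, successes=100, draws=45, observed=20)
    print(f"enrichment p-value: {p:.3g}")
```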
Fig. 3.
Network analysis of comodule 40. (A) The highly connected network consists mainly of genes in comodule 40 (orange nodes), but also includes 6 genes identified using the IPA system (white nodes). Two miRNAs (miR-222, miR-99a, green nodes) are also shown. Based on the MicroCosm Targets V5.0 dataset, miR-222 targets two genes (solid line). Significant anti-correlations between miRNAs and genes are shown with dashed lines. (B) Anti-correlations between miR-222 and gene expression profiles (Pearson's correlation coefficients <−0.21, P-value <5.0×10⁻⁵).
Fig. 4.
Kaplan–Meier survival analysis for three patient groups defined using their signals in a column vector of W. The curves are plotted for comodules 39 (A) and 40 (B).
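The Kaplan–Meier curves in this caption rest on the product-limit estimator, which the sketch below implements in plain Python. The caption does not say how the three patient groups were derived from the column of W, so the rank-based split shown here is purely an assumption, as are the function names and the toy follow-up data.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


def split_into_groups(signal, n_groups=3):
    """Assign each sample to a rank-based group (hypothetical grouping rule)."""
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    groups = [0] * len(signal)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(signal)
    return groups


if __name__ == "__main__":
    # Toy data: six patients, a made-up signal from one column of W.
    w_column = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
    times = [5, 12, 8, 3, 10, 7]
    events = [1, 0, 1, 1, 1, 0]
    groups = split_into_groups(w_column)
    for g in range(3):
        t_g = [t for t, grp in zip(times, groups) if grp == g]
        e_g = [e for e, grp in zip(events, groups) if grp == g]
        print(g, kaplan_meier(t_g, e_g))
```

A full analysis would also compare the group curves with a significance test such as the log-rank test, which this sketch leaves out.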
