BMC Genomics. 2016 Nov 4;17(1):874.
doi: 10.1186/s12864-016-3198-9.

Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems

Le Shu et al. BMC Genomics. 2016.

Abstract

Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies.

Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools, which are mostly dedicated to a specific data type or setting, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package.
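The first step, asking whether a biological process carries more disease association signal than expected by chance, can be illustrated with a generic permutation test. This sketch is a simplified stand-in and not the statistic Mergeomics implements: it compares a gene set's mean association score (for example -log10 p-values, a hypothetical input) against random gene sets of the same size. The function name `permutation_enrichment` and its parameters are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_enrichment(gene_scores, gene_set, n_perm=1000, seed=0):
    """Empirical p-value that a gene set's mean score is unusually high.

    gene_scores: dict gene -> association score (hypothetical input, e.g. -log10 p)
    gene_set:    iterable of gene names; genes without a score are ignored
    """
    members = [g for g in gene_set if g in gene_scores]
    if not members:
        raise ValueError("no scored genes in the set")
    observed = mean(gene_scores[g] for g in members)
    universe = list(gene_scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Random gene set of the same size drawn from all scored genes
        sample = rng.sample(universe, len(members))
        if mean(gene_scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

A real analysis would also have to account for correlated markers and for multiple testing across gene sets; those steps are omitted here.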

Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.

Keywords: Blood glucose; Cholesterol; Functional genomics; Gene networks; Integrative genomics; Key drivers; Mergeomics; Multidimensional data integration.

Figures

Fig. 1
Main modules, data flow between them and examples of data types that can be integrated by Mergeomics
Fig. 2
Schematic illustration of the concept of a key driver gene (a) and local hubs with overlapping neighborhoods (b)
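The key driver idea in panel (a), a node whose network neighbourhood is enriched for members of a disease-associated subnetwork, can be sketched as a one-step, unweighted test. This is an illustration under stated assumptions, not wKDA: it ignores edge weights, looks only at direct neighbours, and scores enrichment with a hypergeometric upper-tail probability. The names `key_driver_pvalues` and `hypergeom_sf` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def key_driver_pvalues(adjacency, members):
    """One-step unweighted key driver test (illustrative sketch).

    adjacency: dict node -> set of neighbour nodes (undirected, hypothetical input)
    members:   set of nodes in a disease-associated subnetwork
    Returns dict node -> p-value that the node's neighbourhood is enriched for members.
    """
    N = len(adjacency)                    # all nodes in the network
    K = len(members & set(adjacency))     # subnetwork members present in it
    result = {}
    for node, nbrs in adjacency.items():
        n = len(nbrs)
        if n == 0:
            continue                      # isolated nodes cannot be hubs
        k = len(nbrs & members)           # neighbours that are members
        result[node] = hypergeom_sf(k, N, K, n)
    return result
```

In a star where one hub touches every subnetwork member, the hub receives a very small p-value, while nodes with no member neighbours receive 1.0.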
Fig. 3
Comparison of three pathway enrichment methods across three GWAS. Performance is evaluated by sensitivity (a), specificity (b), positive likelihood ratio (sensitivity/(1 − specificity)) (c) and receiver operating characteristic curves (d–f). Sensitivity was defined as the proportion of positive control pathways detected at FDR < 25 %. Specificity was defined as the proportion of negative controls rejected at FDR ≥ 25 %. Error bars denote the standard error of simulation results
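The three metrics in panels (a) to (c) follow directly from the definitions in this caption. A minimal sketch, assuming each positive-control and negative-control pathway already has an FDR value; the function name `detection_metrics` is hypothetical:

```python
def detection_metrics(pos_fdr, neg_fdr, cutoff=0.25):
    """Sensitivity, specificity and positive likelihood ratio at an FDR cutoff.

    pos_fdr: FDR values of positive-control pathways (should be detected)
    neg_fdr: FDR values of negative-control pathways (should be rejected)
    """
    sens = sum(q < cutoff for q in pos_fdr) / len(pos_fdr)    # detected at FDR < cutoff
    spec = sum(q >= cutoff for q in neg_fdr) / len(neg_fdr)   # rejected at FDR >= cutoff
    # Positive likelihood ratio = sensitivity / (1 - specificity)
    plr = sens / (1 - spec) if spec < 1 else float("inf")
    return sens, spec, plr
```

When specificity is 1 the likelihood ratio is unbounded, so the sketch returns infinity rather than dividing by zero.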
Fig. 4
Performance comparison of SNP-level and pathway-level meta-analysis using simulated gene sets. Results were produced with the same workflow described in Table 1. a Sensitivity. b Specificity. c Positive likelihood ratio (sensitivity/(1 − specificity)). d Receiver operating characteristic curve. Error bars denote the standard error of simulation results
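A receiver operating characteristic curve such as the one in panel (d) can be traced by sweeping the significance cutoff instead of fixing it at a single value. A minimal sketch, assuming lower scores (such as FDR) mean stronger evidence and that both control lists are non-empty; `roc_points` and `auc` are hypothetical names, and the area is computed with the trapezoidal rule:

```python
def roc_points(pos_scores, neg_scores):
    """(FPR, TPR) points for every distinct cutoff; lower score = stronger evidence."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    pts = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s <= t for s in pos_scores) / len(pos_scores)  # positives called
        fpr = sum(s <= t for s in neg_scores) / len(neg_scores)  # negatives called
        pts.append((fpr, tpr))
    return pts


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Tied scores move the curve diagonally, so a method that cannot separate positives from negatives at all scores an area of 0.5.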
Fig. 5
Performance comparison between wKDA and the unweighted key driver analysis. Two empirical subnetworks (Lipid I & II) were obtained from a previous publication [23], and the canonical metabolism of lipids and lipoproteins pathway was obtained from the Reactome database (R-HSA-556833). The methods were tested by projecting the three functional subnetworks onto two independent adipose networks (a–c) and two independent liver regulatory networks (d–f). The adipose and liver networks were constructed from a collection of Bayesian tissue-specific network models (Additional file 1: Table S3). Overlap between the tissue-specific key driver signals across two independent regulatory networks was measured with the Jaccard index. The overlap ratio was calculated for both the original networks and networks with 25, 50, 75 or 100 % of edges rewired
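The overlap measure and the rewiring control can be sketched as follows. The Jaccard index of two key driver sets A and B is |A ∩ B| / |A ∪ B|. The caption does not say how edges were rewired, so `rewire` uses one simple assumed scheme (move one endpoint of a randomly chosen fraction of edges while avoiding self-loops and duplicate edges); both function names are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rewire(edges, fraction, seed=0):
    """Return a copy of an undirected edge list with a fraction of edges rewired.

    Assumed scheme (hypothetical): each selected edge (u, v) keeps u and gets a new
    partner w chosen uniformly among nodes that are not u and not already linked to u.
    Input edges are assumed unique; the edge count is preserved.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    nodes = sorted({n for e in edges for n in e})
    current = {frozenset(e) for e in edges}  # edges present at each step
    chosen = set(rng.sample(range(len(edges)), round(fraction * len(edges))))
    out = []
    for i, (u, v) in enumerate(edges):
        if i in chosen:
            candidates = [w for w in nodes
                          if w != u and frozenset((u, w)) not in current]
            if candidates:  # leave the edge unchanged if u is linked to everyone
                w = rng.choice(candidates)
                current.discard(frozenset((u, v)))
                current.add(frozenset((u, w)))
                v = w
        out.append((u, v))
    return out
```

Running key driver analysis on `rewire(edges, f)` for increasing `f` and comparing the resulting key driver sets with `jaccard` gives a reference for how much overlap is expected once network structure is degraded.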
Fig. 6
Visualization of adipose (a) and liver (b) networks around top key drivers that were identified for cholesterol-associated subnetworks. Top key drivers (nodes with the largest size) are selected as the top five independent key regulatory genes (genes whose neighbourhood has less than 25 % overlap with the neighbourhood of other independent hubs) for subnetwork 2 and subnetwork 6. Subnetwork member genes are denoted as medium-sized nodes and non-member genes as small nodes. Top co-hubs (co-hubs with FDR < 10⁻¹⁰ in wKDA) are highlighted by yellow circles. Only edges that were supported by at least two studies are drawn
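Selecting independent key drivers, as described in this caption, can be sketched as a greedy pass over hubs ranked by significance. The caption does not define the overlap measure, so this sketch assumes overlap is the fraction of the candidate's neighbourhood shared with an already selected hub; `independent_hubs` and its parameters are hypothetical.

```python
def independent_hubs(ranked_hubs, neighbourhoods, max_overlap=0.25, top=5):
    """Greedily pick hubs whose neighbourhoods overlap little with already picked hubs.

    ranked_hubs:    hub names ordered from strongest to weakest (e.g. by wKDA FDR)
    neighbourhoods: dict hub -> set of neighbour genes
    Overlap (assumed definition) = |N(candidate) & N(selected)| / |N(candidate)|.
    """
    selected = []
    for hub in ranked_hubs:
        nbrs = neighbourhoods[hub]
        if not nbrs:
            continue  # a hub without neighbours has nothing to compare
        if all(len(nbrs & neighbourhoods[s]) / len(nbrs) < max_overlap
               for s in selected):
            selected.append(hub)
            if len(selected) == top:
                break
    return selected
```

Because hubs are visited in rank order, a weaker hub is dropped whenever it mostly repeats the neighbourhood of a stronger one, which keeps the selected top five from describing the same network region.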

References

    1. Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005;6(4):287–298. doi: 10.1038/nrg1578.
    2. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39(10):1181–1186. doi: 10.1038/ng1007-1181.
    3. Barrett T, Edgar R. Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol. 2006;411:352–369. doi: 10.1016/S0076-6879(06)11019-8.
    4. Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, Holloway E, Kolesnykov N, Lilja P, Lukk M, et al. ArrayExpress--a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 2007;35(Database issue):D747–D750. doi: 10.1093/nar/gkl995.
    5. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447(7146):799–816. doi: 10.1038/nature05874.
