Genome Res. 2019 Dec;29(12):2034-2045.
doi: 10.1101/gr.251983.119. Epub 2019 Nov 21.

Identifying gene function and module connections by the integration of multispecies expression compendia

Hao Li et al. Genome Res. 2019 Dec.

Abstract

The functions of many eukaryotic genes are still poorly understood. Here, we developed and validated a new method, termed GeneBridge, which is based on two linked approaches to impute gene function and bridge genes with biological processes. First, Gene-Module Association Determination (G-MAD) allows the annotation of gene function. Second, Module-Module Association Determination (M-MAD) allows predicting connectivity among modules. We applied the GeneBridge tools to large-scale multispecies expression compendia (1700 data sets with over 300,000 samples from human, mouse, rat, fly, worm, and yeast) collected in this study. G-MAD identifies novel functions of genes (for example, DDT in mitochondrial respiration and WDFY4 in T cell activation) and also suggests novel components for modules, such as for cholesterol biosynthesis. By applying G-MAD on data sets from respective tissues, tissue-specific functions of genes were identified, for instance, the roles of EHHADH in liver and kidney, as well as SLC6A1 in brain and liver. Using M-MAD, we identified a list of module-module associations, such as those between mitochondria and proteasome, mitochondria and histone demethylation, as well as ribosomes and lipid biosynthesis. The GeneBridge tools together with the expression compendia are available as an open resource, which will facilitate the identification of connections linking genes, modules, phenotypes, and diseases.

Figures

Figure 1.
Gene-Module Association Determination (G-MAD). (A) G-MAD methodology. See text and Methods for detailed description. (B) Module similarity network showing the composition similarities across all module pairs. Modules were detected using a community detection algorithm embedded in Gephi and indicated in different colors. The 10 most frequent words of the module terms in each module were used to represent the module and can be found in Supplemental Table S2. (C) Influence of the GMAS threshold (t) on the true positive rate (TPR) and false positive rate (FPR) of G-MAD. Using a threshold of 0.268, G-MAD identified 10% of true positives and 0.24% of false positives (reflected by the red lines intersecting the x- and y-axes). (D) G-MAD revealed the potential role of WDFY4 in T cell activation and immune response. The threshold of significant gene-module association is indicated by the red dashed line. Modules are organized by the module similarities. Known modules connected to WDFY4 from annotations are shown in red dots (there is no known connected module for WDFY4) and other modules with GMAS over the threshold are shown in black dots. Dot sizes reflect the GMAS of WDFY4 against the respective modules. Detailed information of all the modules is available at www.systems-genetics.org/modules_by_gene/WDFY4?organism=human. (E) G-MAD identified the involvement of known as well as 20 novel genes in cholesterol biosynthesis. The threshold of significant gene-module association is indicated by the red dashed line. Genes are organized by the genetic positions across chromosomes. Genes annotated to be involved in cholesterol biosynthesis are shown in red dots and novel genes with GMAS over the threshold are shown in black dots. Novel genes conserved in human, mouse, and rat are highlighted in red bold text.
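The threshold trade-off described in panel C can be illustrated with a minimal sketch. It assumes a list of association scores and a parallel list of flags marking annotated (known) gene-module pairs; the function name `rates_at_threshold` and the toy data are hypothetical and are not taken from GeneBridge. A pair is called significant when its absolute score exceeds the threshold `t`.

```python
def rates_at_threshold(scores, is_known, t):
    """Return (TPR, FPR) when pairs with |score| > t are called significant.

    scores   -- association scores, one per gene-module pair (toy stand-in for GMAS)
    is_known -- True where the pair is already annotated (treated as a true positive)
    t        -- significance threshold on the absolute score
    """
    tp = fp = fn = tn = 0
    for score, known in zip(scores, is_known):
        called = abs(score) > t
        if called and known:
            tp += 1
        elif called:
            fp += 1
        elif known:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy data only: three annotated pairs and five unannotated pairs.
scores = [0.9, 0.3, 0.1, -0.5, 0.05, 0.2, 0.0, -0.1]
is_known = [True, True, True, False, False, False, False, False]
tpr, fpr = rates_at_threshold(scores, is_known, t=0.268)
```

Sweeping `t` over a range of values and plotting the resulting pairs would trace out a curve of the kind shown in the panel; raising `t` lowers both rates.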
Figure 2.
Predicting tissue specificity of modules. (A) Heat map showing the correlation coefficient averages of genes (ρ¯) in modules from expression data of a subset of human data sets. Data sets from different tissues are arranged and colored (top bar). Modules are clustered in rows using hierarchical clustering. ρ¯ values for each module are centered and scaled. (B) Coexpressions among genes of pancreatic secretion module across tissues in human. The average correlation coefficients across the genes in the pancreatic secretion module in human data sets are used to illustrate the coexpressions of this module across tissues. Genes in the pancreatic secretion module have higher coexpression in data sets from the pancreas compared to those from other tissues. (C) Heat map showing the tissue specificity of modules inferred from the correlation coefficient of respective tissues against the other tissues. Modules are clustered in rows using hierarchical clustering. The −log10(P-values) obtained from the K–S test are centered and scaled for each module. (D) The tissue-specificity of pancreatic secretion in pancreas (left) and blood (right) is illustrated by the empirical cumulative distribution function (ECDF). The red dotted lines indicate the K–S statistic, which is based on the maximum distance between the two curves. Curves shifting toward the right indicate that data sets from the respective tissue have a higher correlation coefficient and, therefore, greater specificity for this tissue. In this case, the steeply rising part of the ECDF, also shown as the peak of the density of the correlations in B, is shifted toward higher correlations.
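The K–S statistic mentioned in panel D, the maximum distance between two empirical cumulative distribution functions, can be sketched as follows. This is a plain-Python illustration under stated assumptions: the inputs are per-data-set average correlation coefficients for one tissue and for all other tissues, the names `ecdf` and `ks_statistic` are hypothetical, and the P-value computation used in the paper is not reproduced.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function giving the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest vertical gap between the two ECDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # The gap can only change at observed values, so checking those is enough.
    return max(abs(cdf_a(x) - cdf_b(x)) for x in set(sample_a) | set(sample_b))


# Toy average correlations for one module (illustrative values, not real data).
pancreas = [0.6, 0.7, 0.8, 0.9]
other_tissues = [0.1, 0.2, 0.3, 0.65]
d = ks_statistic(pancreas, other_tissues)
```

A large statistic together with an ECDF shifted to the right for the tissue of interest corresponds to the tissue-specific coexpression the caption describes.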
Figure 3.
G-MAD identifies tissue-specific associated modules for EHHADH by using data sets from different tissues. (A) Expression patterns of EHHADH across tissues. The figure was adapted from the Human Protein Atlas (www.proteinatlas.org/). (B–D) G-MAD of EHHADH in human using data sets from all tissues (B), from liver (C), or from kidney (D). The threshold of significant gene-module association is indicated by the red dashed line. Modules are organized by their similarities. Known modules connected to EHHADH from gene annotations are shown in red dots and other modules with GMAS over the threshold are shown by black dots. (E) Comparison of G-MAD results of EHHADH in liver and kidney. Known modules connected to EHHADH are shown in red dots. The threshold of significant gene-module association is indicated by the red dashed line. Modules significantly associated with EHHADH only in one specific tissue are highlighted. The comparison of the association results of EHHADH in liver and kidney can be found in Supplemental Table S3.
Figure 4.
G-MAD predicts novel genes linked to mitochondria. (A) G-MAD Manhattan plot of the respiratory electron transport (Reactome: R-HSA-611105) module in human. Genes are arranged based on their genetic positions and genes annotated to be involved in the module are colored red. Genes with absolute GMAS over 0.268 are considered significantly associated. DDT, BOLA3, and ARID1A are labeled. (B) Venn diagram of novel genes associated with respiratory electron transport module in human, mouse, and rat; 707 genes were predicted to be mito-proteins by G-MAD in all three species, and 351 genes, including DMAC1, NDUFAF8, FMC1, and BOLA3, were recently annotated to be involved in mitochondrial respiration in at least one species, whereas 356 genes, including DDT, C16orf91, C15orf61, FLAD1, and GRHPR, have not been previously linked with mitochondria based on the current annotations. The association results for all genes in human, mouse, and rat can be found in Supplemental Table S5. (C) DDT associates with mitochondrial respiratory chain modules in human. The threshold of significant gene-module association is indicated by the red dashed line. Modules are organized by module similarities. Known modules connected to DDT from annotations are highlighted in red and other modules with GMAS over the threshold are colored in black. Dot sizes reflect the GMAS of DDT against the respective modules. (D) Module similarity network showing the modules associated with DDT. Modules are plotted based on their layout in Figure 1B and colored based on their GMAS against DDT. (E) Mitochondrial localization of DDT in mouse embryonic fibroblasts (MEFs). DDT expression overlaps with the Mitotracker red label. (F) DDT knockdown leads to the reduction of oxygen consumption rate (OCR) as a reflection of mitochondrial respiration in human HEK293 cells. Addition of specific mitochondrial inhibitors, including oligomycin (ATPase inhibitor), FCCP (uncoupling agent), and rotenone/antimycin A (electron transport chain inhibitors), is indicated by arrows. (G) ARID1A negatively associates with the mitochondrial respiratory chain in human. The threshold of significant gene-module association is indicated by the red dashed line. Modules are organized by the module similarities. Known modules connected to ARID1A from extant annotations are highlighted in red and other modules with GMAS over the threshold are colored in black. Dot sizes are proportional to GMAS of the respective modules. (H) Module similarity network showing the modules associated with ARID1A. Modules are colored based on their GMAS against ARID1A. (I) Mice with the uterine-specific Arid1a knockout showed positive enrichment in mitochondrial respiration modules. Nominal P-values from the GSEA results are used to plot against normalized enrichment scores (NESs), with dot sizes indicating the number of genes in the modules and transparencies indicating the false discovery rate (FDR). (J) Enrichment plot showing the enrichment of genes included in respiratory electron transport in uterus-specific Arid1a knockout mice compared to wild-type controls. Genes are ranked based on the fold change between Arid1a knockout and wild-type mice, and the ranking positions of genes in respiratory electron transport are labeled as vertical black bars. (NES) Normalized enrichment score.
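The ranked-list enrichment shown in panel J can be sketched as a running sum over genes ordered by fold change. The version below is the simple unweighted form, assumed here for illustration; standard GSEA usually weights hits by the ranking metric and normalizes the score against permutations to obtain the NES, and neither step is shown. The function name `enrichment_score` and the gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list.

    Walk down the ranking: step up by 1/n_hit at each gene in the set and
    down by 1/n_miss otherwise. The score is the running-sum value that
    deviates furthest from zero (positive means enrichment near the top).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (highest fold change first); labels are placeholders, not real genes.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
module_genes = {"g1", "g2"}
es = enrichment_score(ranked, module_genes)
```

With both set members at the top of the ranking, the running sum peaks immediately, which mirrors the cluster of black bars toward one end of an enrichment plot.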
Figure 5.
Module-Module Association Determination (M-MAD) reveals module connections. (A) Scheme of the M-MAD methodology in detecting module connections. Intermediate results of G-MAD for all modules are further processed and used as the basis of M-MAD. The −log10(P) values of G-MAD for the target module against all genes in each data set are used as the gene statistic for the module, and connections between the target module and all modules are calculated using CAMERA. The results are then meta-analyzed by taking the sample sizes and inter-gene correlations of all data sets to compute the module-module association score (MMAS) between modules. (B) Module association network showing the connections across all modules. Colors of nodes represent the modules defined in the global module similarity network in Figure 1B. Module clusters with respective colors are identified and labeled. Modules used as examples in the following figure panels are highlighted with a circle. (C) Comparison of pairwise module connections derived from module similarities in Figure 1B and associations (from M-MAD) in Figure 5B. A red dashed line is plotted when the pairwise module similarity equals association. The distributions of module similarity and association scores are illustrated in the top and at the right of the plot and are colored in red and blue, respectively. Two examples of novel module connections are encircled. (D,E) Subnetworks showing the association between mitochondrial and proteasomal modules (D), and mitochondrial and histone demethylation modules (E). Edge colors indicate the significance of module connections, with red as positive and blue as negative.
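Panel A states that per-data-set results are meta-analyzed using sample sizes and inter-gene correlations, but the exact formula is not given in this caption. As a rough illustration only, the sketch below combines per-data-set scores with a sample-size-weighted mean and ignores inter-gene correlation entirely. It is an assumed stand-in rather than the MMAS computation itself, and `weighted_mean_score` is a hypothetical name.

```python
def weighted_mean_score(scores, sample_sizes):
    """Combine per-data-set scores, weighting each by its sample size.

    This is a generic weighted average, assumed here for illustration;
    it does not account for inter-gene correlations.
    """
    if not scores or len(scores) != len(sample_sizes):
        raise ValueError("scores and sample_sizes must be non-empty and equal length")
    total = sum(sample_sizes)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    return sum(s * n for s, n in zip(scores, sample_sizes)) / total


# Toy per-data-set association calls (+1 positive, -1 negative) and sample sizes.
dataset_scores = [1.0, -1.0, 0.5]
dataset_sizes = [100, 50, 50]
combined = weighted_mean_score(dataset_scores, dataset_sizes)
```

Larger data sets pull the combined score toward their own result, which is the general idea behind weighting by sample size.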
Figure 6.
M-MAD reveals a negative association between the ribosome and lipid biosynthetic modules. (A) Subnetwork for the ribosome and lipid biosynthetic modules. The colors of the edges indicate the significance of module connections, with red as positive and blue as negative. (B) Lipid biosynthetic process negatively connected with ribosomal modules in human. The threshold of significant module-module connection is indicated by the red dashed line. Modules are organized by the module similarities. Dot sizes are proportional to MMASs of the respective modules. (C,D) Transcripts of genes encoding for ribosomal proteins in the liver negatively correlate with metabolic traits, such as body weight, fat mass, plasma glucose and cholesterol levels, in the BXD (C) and CTB6F2 (D) mouse cohorts. (*) P < 0.05, (**) P < 0.01, (***) P < 0.001. (E) Feeding adult C. elegans with RNAi clones of ribosomal proteins, including rps-10, rpl-14, and rpl-26, results in the accumulation of lipids, as reflected by Oil Red O staining. Experimental scheme and additional examples are shown in Supplemental Fig. S12. (***) P < 0.001. (ev) Empty vector. n = 3.

