Nat Methods. 2019 Sep;16(9):843-852.
doi: 10.1038/s41592-019-0509-5. Epub 2019 Aug 30.

Assessment of network module identification across complex diseases

Sarvenaz Choobdar et al. Nat Methods. 2019 Sep.

Abstract

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The Disease Module Identification DREAM Challenge.
a, Network types included in the challenge. Throughout the paper, boxplot center lines show the median, box limits show upper and lower quartiles, whiskers show 1.5× interquartile range and points show outliers. b, Outline of the challenge. c, Outline of the scoring.
Fig. 2. Assessment of module identification methods.
a, Main types of module identification approach used in the challenge. b, Final scores of the 42 module identification methods applied in Sub-challenge 1 for each of the six networks, as well as the overall score summarizing performance across networks (evaluated using the holdout GWAS set at 5% FDR; method IDs are defined in Supplementary Table 2). Ranks are indicated for the top ten methods. The last row shows the mean performance of 17 random modularizations of the networks (error bars show the standard deviation). c, Robustness of the overall ranking was evaluated by subsampling the GWAS set used for evaluation 1,000 times. For each method, the resulting distribution of ranks is shown as a boxplot. d, Number of trait-associated modules per network. Boxplots show the number of trait-associated modules across the 42 methods, normalized by the size of the respective network.
Fig. 3. Complementarity of module predictions from different methods and networks.
a, Similarity of module predictions from different methods (color) and networks (shape). The closer two points are in the plot, the more similar the corresponding module predictions (multidimensional scaling, see Methods). The top two methods are highlighted for each network. b, Total number of predicted modules versus average module size for each method (same color scheme as in a). The top five methods (numbered) produced modular decompositions of varying granularity. c, Challenge score (number of trait-associated modules) versus modularity is shown for each of the 42 methods (same color scheme as in a). Modularity is a topological quality metric for modules based on the fraction of within-module edges. d, Final scores of multi-network module identification methods in Sub-challenge 2 (evaluated using the holdout GWAS set at 5% FDR). For comparison, the overall best-performing method from Sub-challenge 1 is also shown (method K1, purple). Teams used different combinations of the six challenge networks for their multi-network predictions (shown on the left). The difference between the top single-network module predictions and the top multi-network module predictions is not significant when subsampling the GWASs (Bayes factor < 3, Supplementary Fig. 5). The last row shows the mean performance of 17 random modularizations of the networks (error bars show standard deviation).
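The caption for Fig. 3c describes modularity only as a metric based on the fraction of within-module edges. As a minimal sketch, assuming the standard Newman-Girvan definition for an undirected, unweighted graph (the caption does not confirm which variant the challenge used), modularity can be computed as below. The function name, the toy graph and the module labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Newman-Girvan modularity Q of a partition (assumed definition).

    edges     -- list of (u, v) pairs, undirected and unweighted
    module_of -- dict mapping each node to its module label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    within = defaultdict(int)      # edges with both endpoints in the module
    degree_sum = defaultdict(int)  # summed degree of the module's nodes
    for u, v in edges:
        cu, cv = module_of[u], module_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            within[cu] += 1
    # Observed within-module edge fraction minus the fraction expected
    # if edges were placed at random while preserving node degrees.
    return sum(within[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

q_two = modularity(toy_edges, two_modules)
q_one = modularity(toy_edges, one_module)
```

On this toy graph, splitting the two triangles gives a positive modularity, while placing every node in a single module gives zero, which is why modularity rewards partitions with many within-module edges.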
Fig. 4. Overlap between modules associated with different traits and diseases.
a, Average number of trait-associated modules identified by challenge methods for each trait in Sub-challenge 1. For traits where multiple GWASs were available, results for the best-powered study are shown. HDL, high-density lipoprotein; LDL, low-density lipoprotein. b, Histograms showing the number of distinct traits per trait-associated module (brown) and gene (gray). c, Trait network showing similarity between GWAS traits based on overlap of associated modules (force-directed graph layout). Node size corresponds to the number of genes in trait-associated modules and edge width corresponds to the degree of overlap (Jaccard index, only edges for which the overlap is significant are shown (Bonferroni-corrected hypergeometric P < 0.05, see Methods)). Traits without any edges are not shown.
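The edge weights and the edge filter described for Fig. 4c can be illustrated with a short sketch. It assumes that each trait is represented by the set of its associated modules, that the hypergeometric test is the upper tail (probability of at least the observed overlap), and that the Bonferroni correction multiplies the P value by the number of trait pairs tested. None of these details is stated in the caption; the function names, the universe size and the example sets are hypothetical.

```python
from math import comb


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hypergeom_upper_tail(k, universe, n_a, n_b):
    """P(overlap >= k) when n_b items are drawn without replacement
    from `universe` items, of which n_a are marked."""
    lo = max(k, 0)
    hi = min(n_a, n_b)
    total = comb(universe, n_b)
    return sum(comb(n_a, i) * comb(universe - n_a, n_b - i)
               for i in range(lo, hi + 1)) / total


def bonferroni(p, n_tests):
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical example: module IDs associated with two traits,
# out of an assumed universe of 20 trait-associated modules.
trait_a = {1, 2, 3, 4}
trait_b = {3, 4, 5}
overlap = len(trait_a & trait_b)

edge_width = jaccard(trait_a, trait_b)
p_raw = hypergeom_upper_tail(overlap, 20, len(trait_a), len(trait_b))
p_adj = bonferroni(p_raw, n_tests=10)  # 10 trait pairs assumed
draw_edge = p_adj < 0.05
```

In this toy case the two traits share two modules, but the corrected P value stays above 0.05, so the edge would not be drawn even though the Jaccard index is nonzero.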
Fig. 5. Support for trait-module genes in diverse datasets.
a, Example module from the consensus analysis in the STRING protein–protein interaction network (force-directed graph layout). The module is associated to height (n = 25 genes, FDR-corrected Pascal P = 0.005, see Methods). Color indicates Pascal GWAS gene scores (Methods). The module includes genes that are genome-wide significant (magenta and pink) as well as genes that do not reach the genome-wide significance threshold, but are predicted to be involved in height due to their module membership (blue and gray). b, Member genes of the height-associated module are supported by independent datasets: 24% of module genes are implicated in monogenic skeletal growth disorders (red squares, enrichment P = 7.5 × 10−4 (one-sided Fisher’s exact test)) and 28% of module genes have coding variants associated to height in an ExomeChip study published after the challenge (black diamonds, enrichment P = 1.9 × 10−6). The form of this module follows its function: two submodules comprise proteins involved in collagen fibril (yellow) and elastic fiber formation (green), while the proteins that link these submodules (orange) indeed have the biological function of crosslinking collagen fibril and elastic fibers.
Fig. 6. Example trait modules comprising therapeutically relevant pathways.
a–c, The modules are from the STRING protein–protein interaction networks and were generated using the consensus method. Node colors correspond to Pascal gene scores in the respective GWAS (Methods). For the two inflammatory disorders (a,b), red squares indicate genes causing monogenic immunodeficiency disorders (enrichment P values of 4.1 × 10−8 and 1.2 × 10−6, respectively (one-sided Fisher’s exact test)). a, Module associated with rheumatoid arthritis (n = 25 genes, FDR-corrected Pascal P = 0.04) that is involved in T cell activation. A costimulatory pathway is highlighted green: T cell response is regulated by activating (CD28) and inhibitory (CTLA4) surface receptors, which bind B7 family ligands (CD80 and CD86) expressed on the surface of activated antigen-presenting cells. The therapeutic agent CTLA4-Ig binds and blocks B7 ligands, thus inhibiting T cell response. b, Cytokine signaling module associated with inflammatory bowel disease (n = 42 genes, FDR-corrected Pascal P = 0.0006). The module includes the four known Janus kinases (JAK1-3 and TYK2, highlighted green), which are engaged by cytokine receptors to mediate activation of specific transcription factors (STATs). Inhibitors of JAK–STAT signaling are being tested in clinical trials for both ulcerative colitis and Crohn’s disease. c, Module associated with myocardial infarction (n = 36 genes, FDR-corrected Pascal P = 0.0001) comprising two main components of the NO/cGMP signaling pathway (endothelial nitric oxide synthases (NOS1-3) and soluble guanylate cyclases (GUCY1A2, GUCY1A3 and GUCY1B3), highlighted green), a key therapeutic target for cardiovascular disease.
