Nat Methods. 2012 Jul 15;9(8):796-804.
doi: 10.1038/nmeth.2016.

Wisdom of crowds for robust gene network inference

Daniel Marbach et al. Nat Methods. 2012.

Abstract

Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
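The abstract does not say how predictions from multiple methods were integrated. As an illustration only, the sketch below assumes a simple average-rank (Borda-style) scheme: each method submits a ranked list of regulator-target edges, and edges are re-ranked by their mean position across methods. The function name `integrate_by_average_rank`, the toy gene names, and the rule that a missing edge is placed just below a method's last prediction are all assumptions made for this example.

```python
from collections import defaultdict


def integrate_by_average_rank(predictions):
    """Combine ranked edge lists into one community ranking.

    predictions: list of lists of (regulator, target) tuples,
    each ordered from most to least confident.
    Assumption: an edge a method did not predict gets rank len(list) + 1.
    """
    all_edges = set()
    for ranked in predictions:
        all_edges.update(ranked)

    rank_sums = defaultdict(float)
    for ranked in predictions:
        position = {edge: i + 1 for i, edge in enumerate(ranked)}
        missing_rank = len(ranked) + 1
        for edge in all_edges:
            rank_sums[edge] += position.get(edge, missing_rank)

    n_methods = len(predictions)
    # Sort by mean rank; ties are broken alphabetically for determinism.
    return sorted(all_edges, key=lambda e: (rank_sums[e] / n_methods, e))


# Toy predictions from three hypothetical methods.
method_a = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g3")]
method_b = [("tf1", "g2"), ("tf1", "g1"), ("tf2", "g4")]
method_c = [("tf1", "g1"), ("tf2", "g3"), ("tf1", "g2")]

community = integrate_by_average_rank([method_a, method_b, method_c])
print(community)
```

In this toy case the edge that two of the three methods rank first ends up at the top of the community list, while an edge predicted by only one method falls to the bottom.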

Figures

Figure 1. The DREAM5 network inference challenge
Assessment involved the following steps (from left to right). (1) Participants were challenged to infer the genome-wide transcriptional regulatory networks of E. coli, S. cerevisiae, and S. aureus, as well as an in silico (simulated) network. (2) Gene expression datasets for a wide range of experimental conditions were compiled. Anonymized datasets were released to the community, hiding the identities of the genes. (3) 29 participating teams inferred gene regulatory networks. In addition, we applied 6 “off-the-shelf” inference methods. (4) Network predictions from individual teams were integrated to form community networks. (5) Network predictions were assessed using experimentally supported interactions from E. coli and S. cerevisiae, as well as the known in silico network.
Figure 2. Evaluation of network inference methods
Inference methods are indexed according to Table 1. (a) The plots depict the performance for the individual networks (area under precision-recall curve, AUPR) and the overall score summarizing the performance across networks (Methods). R indicates performance of random predictions. C indicates performance of the integrated community predictions. (b) Methods are grouped according to the similarity of their predictions via principal component analysis. Shown are the 2nd vs. 3rd principal components; the 1st principal component accounts mainly for the overall performance (Supplementary Note 4). (c) The heatmap depicts method-specific biases in predicting network motifs. Rows represent individual methods and columns represent different types of regulatory motifs. Red and blue show interactions that are easier and harder to detect, respectively.
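To make the AUPR metric in panel (a) concrete, here is a minimal sketch that walks down a ranked edge list, records a (recall, precision) point after each prediction, and approximates the area with the trapezoidal rule. The exact interpolation and gold-standard filtering used in the challenge are not described in this caption, so the function names, the starting point of the curve, and the toy data are assumptions.

```python
def precision_recall_points(ranked_edges, gold_standard):
    """Return (recall, precision) after each prediction in the ranking."""
    if not ranked_edges or not gold_standard:
        raise ValueError("need at least one prediction and one true edge")
    points = []
    true_positives = 0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold_standard:
            true_positives += 1
        points.append((true_positives / len(gold_standard), true_positives / k))
    return points


def aupr(ranked_edges, gold_standard):
    """Trapezoidal approximation of the area under the precision-recall curve."""
    points = precision_recall_points(ranked_edges, gold_standard)
    area = 0.0
    # Assumption: the curve starts at recall 0 with the first observed precision.
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


# Toy example: two true edges among four predictions.
gold = {("tf1", "g1"), ("tf2", "g3")}
perfect = [("tf1", "g1"), ("tf2", "g3"), ("tf1", "g2"), ("tf2", "g4")]
mixed = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g3"), ("tf2", "g4")]

print(aupr(perfect, gold), aupr(mixed, gold))
```

A ranking that places every true edge ahead of every false one reaches an area of 1.0 under this approximation; pushing a true edge down the list lowers the score.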
Figure 3. Analysis of community networks vs. individual inference methods
(a) The plot shows the overall score, which summarizes performance across the E. coli, S. cerevisiae, and in silico networks, for individual inference methods or various combinations of integrated methods. The first boxplot depicts the performance distribution of individual inference methods (K=1). Subsequent boxplots show the performance when integrating K>1 randomly sampled methods. The red bar shows the performance when integrating all methods (K=29). Boxplots depict performance distributions with respect to the minimum, the maximum and the three quartiles. (b) The probability that the community network ranks among the top x% of the K individual methods used to construct the community network. The diagonal shows the expected performance when choosing an individual method (K=1). (c) The integration of complementary methods is particularly beneficial. The first boxplot shows the performance of individual methods from clusters 1–3 (as defined in Fig. 2b). The second and third boxplots show performance of community networks obtained by integrating three randomly selected inference methods: (i) from the same cluster, or (ii) from different clusters. (d) The plots show the overall score for an initial community network formed by integrating all individual methods (open circles, blue) except for the best five and worst five. One by one, the worst five (left panel) and best five (right panel) methods are added to form additional community networks (filled circles, red).
Figure 4. E. coli and S. aureus community networks
(a, b) At a cutoff of 1,688 edges, the (a) E. coli community network connects 1,505 genes (including 204 transcription factors, shown as diamonds), and the (b) S. aureus network connects 1,084 genes (85 transcription factors). Network modules were identified and tested for Gene Ontology term enrichment, as indicated (grey colored genes do not show enrichment). A network module enriched for Gene Ontology terms related to pathogenesis is highlighted in the S. aureus network. (c) The schematics depict newly predicted E. coli regulatory interactions that were experimentally tested. The pie chart depicts the breakdown of strongly and weakly supported targets (Methods). The positive controls were six known interactions from RegulonDB.
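The edge cutoff in panels (a, b) can be read as keeping the top-ranked predictions and counting the genes they connect. The sketch below shows that bookkeeping on toy data. It assumes each edge is a (regulator, target) pair and that a transcription factor is counted whenever it appears as a regulator in a kept edge; the function name `summarize_network` and the gene names are hypothetical.

```python
def summarize_network(ranked_edges, cutoff):
    """Keep the top `cutoff` edges and count the genes they connect.

    Assumption: any gene appearing as a regulator in a kept edge is
    counted as a transcription factor.
    """
    kept = ranked_edges[:cutoff]
    regulators = {regulator for regulator, _ in kept}
    genes = regulators | {target for _, target in kept}
    return {
        "edges": len(kept),
        "genes": len(genes),
        "transcription_factors": len(regulators),
    }


# Toy ranked community network, most confident edge first.
ranked = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g1"), ("tf3", "tf1")]
summary = summarize_network(ranked, cutoff=3)
print(summary)
```

With real data, `cutoff` would be set to the chosen number of edges and the ranked list would come from the community integration step.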

