Towards a rigorous assessment of systems biology models: the DREAM3 challenges

Robert J Prill et al. PLoS One. 2010 Feb 23;5(2):e9202. doi: 10.1371/journal.pone.0009202.

Erratum in

  • PLoS One. 2010;5(3). doi: 10.1371/annotation/f633213a-dc4f-4bee-b6c5-72d50e7073b8

Abstract

Background: Systems biology has embraced computational modeling in response to the quantitative nature and increasing scale of contemporary data sets. The onslaught of data is accelerating as molecular profiling technology evolves. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) is a community effort to catalyze discussion about the design, application, and assessment of systems biology models through annual reverse-engineering challenges.

Methodology and principal findings: We describe our assessments of the four challenges associated with the third DREAM conference which came to be known as the DREAM3 challenges: signaling cascade identification, signaling response prediction, gene expression prediction, and the DREAM3 in silico network challenge. The challenges, based on anonymized data sets, tested participants in network inference and prediction of measurements. Forty teams submitted 413 predicted networks and measurement test sets. Overall, a handful of best-performer teams were identified, while a majority of teams made predictions that were equivalent to random. Counterintuitively, combining the predictions of multiple teams (including the weaker teams) can in some cases improve predictive power beyond that of any single method.

Conclusions: DREAM provides valuable feedback to practitioners of systems biology modeling. Lessons learned from the predictions of the community provide much-needed context for interpreting claims of efficacy of algorithms described in the scientific literature.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The objective of the signaling cascade identification challenge was to identify some of the molecular species in this diagram from single-cell flow cytometry measurements.
The upstream binding of a ligand to a receptor and the downstream phosphorylation of a protein are illustrated.
Figure 2. The objective of the signaling response prediction challenge was to predict the concentrations of phosphoproteins and cytokines in response to combinatorial perturbations of environmental cues (stimuli) and of the signaling network (inhibitors).
(a) A compendium of phosphoprotein and cytokine measurements was provided as a training set. (b) Histograms (log scale) of the scoring metric (normalized squared error) for 100,000 random predictions were approximately Gaussian (fitted blue points). Significance of the predictions of the teams (black points) was assessed with respect to the empirical probability densities embodied by these histograms. Scores of the best-performer teams are denoted with arrows.
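The null-model scoring described in this caption can be sketched in a few lines: score many random predictions, fit a Gaussian to the resulting normalized-squared-error distribution, and read off a one-sided p-value for a team's (lower-is-better) score. The function names and toy data below are illustrative, not taken from the paper's supplementary code.

```python
import math
import random

def normalized_squared_error(pred, actual, scale):
    # Sum of squared errors, each normalized by a per-measurement scale.
    return sum(((p - a) / s) ** 2 for p, a, s in zip(pred, actual, scale))

def empirical_p_value(score, null_scores):
    # Fit a Gaussian to the empirical null and return the lower-tail
    # probability P(null <= score); lower error scores are better.
    n = len(null_scores)
    mu = sum(null_scores) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in null_scores) / (n - 1))
    z = (score - mu) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))
```

The Gaussian fit matters in the far tail: a score better than every sampled random prediction still receives a finite p-value from the fitted density rather than an uninformative zero.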
Figure 3. Overlay of the assignment tables from the seven teams in the signaling cascade identification challenge.
The number of teams making each assignment and the associated p-value are indicated. The p-value expresses the probability of such a concentration of random guesses landing in the same table entry. Highlighted entries are correct. Five teams correctly identified species x1 as the kinase, a significant event for the community even though no team achieved a significant individual performance.
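As a rough illustration of the p-value in this caption: if each team's table entry is treated as an independent uniform guess over the possible assignments, the chance that at least k of n teams coincide is a binomial tail. This simplification ignores the constraints of a real assignment table, so it is only a sketch, and the function name is mine.

```python
from math import comb

def guess_concentration_p(n_teams, k, n_options):
    # Binomial tail: probability that k or more of n_teams independently
    # make the same assignment when each picks uniformly from n_options.
    p = 1.0 / n_options
    return sum(comb(n_teams, j) * p ** j * (1 - p) ** (n_teams - j)
               for j in range(k, n_teams + 1))
```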
Figure 4. The objective of the gene expression prediction challenge was to predict temporal expression of 50 genes that were withheld from a training set consisting of 9285 genes.
(a) Clustered heatmaps of the predicted genes (columns) reveal that two best-performer teams predicted substantially similar gene expression values, though different methods were employed. Results for the 60 minute time-point are shown. (b) The benefits of combining the predictions of multiple teams into a consensus prediction are illustrated by the rank sum prediction (triangles). Some rank sum predictions score higher than the best-performer, depending on the teams that are included. The highest score is achieved by a combination of the predictions of the best four teams.
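The rank-sum consensus in panel (b) can be sketched as follows: convert each team's predicted values to within-team ranks, then sum those ranks per gene to obtain the consensus ordering. The helper names below are illustrative.

```python
def ranks(values):
    # Rank of each value within its list (1 = smallest); ties broken by position.
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def rank_sum_consensus(team_predictions):
    # Sum each item's per-team rank; the rank-sums define the consensus ordering.
    per_team = [ranks(p) for p in team_predictions]
    return [sum(col) for col in zip(*per_team)]
```

Because ranks discard each team's scale and calibration, a weak but directionally informative team can still sharpen the consensus, which is consistent with the observation that including weaker teams sometimes improves the combined prediction.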
Figure 5. The objective of the in silico network inference challenge was to infer networks of various sizes (10, 50, and 100 nodes) from steady-state and time-series “measurements” of simulated gene regulation networks.
Predicted networks were evaluated on the basis of two scoring metrics, (a) area under the ROC curve and (b) area under the precision-recall curve. ROC and precision-recall curves of the five best teams in the 100-node sub-challenge. (a) Dotted diagonal line is the expected value of a random prediction. (b) Note that the best and second-best performers have different precision-recall characteristics. (c) The histogram (log scale) of the AUROC scoring metric for 100,000 random predictions was approximately Gaussian (fitted blue points), whereas the histogram of the AUPR metric was not (inset). Significance of the predictions of the teams (black points) was assessed with respect to the empirical probability densities embodied by these histograms. Scores of the best-performer team are denoted with arrows. All plots are analyses of the gold standard network called InSilico_Size100_Yeast2.
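Both metrics in this figure can be computed directly from a ranked edge list. A minimal sketch, with AUROC computed via the Mann-Whitney statistic and AUPR as average precision; the function names are mine.

```python
def auroc(scores, labels):
    # Probability that a randomly chosen true edge outscores a randomly
    # chosen non-edge (Mann-Whitney U / (n_pos * n_neg)); ties count half.
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def aupr(scores, labels):
    # Average precision: mean of the precision values observed at the
    # rank of each true edge in the descending-score ordering.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap = 0, 0.0
    for k, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += tp / k
    return ap / tp  # tp equals the total number of true edges here
```

The sparsity of real regulatory networks is why the two metrics can rank teams differently: AUPR penalizes false positives among the top-ranked edges much more heavily than AUROC does.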
Figure 6. Analysis of the community of teams reveals characteristics of identifiable and unidentifiable network edges.
The number of teams that identify an edge at a specified cutoff is a measure of how easy or difficult an edge is to identify. In this analysis we use a cutoff of 2P (i.e., twice the number of actual positive edges in the gold standard network). (a) Histograms indicate the number of teams that correctly identified the edges of the gold standard network called InSilico_Size100_Ecoli1. The ten worst teams in the 100-node sub-challenge identified about the same number of edges as expected by chance. By contrast, the ten best teams identified more edges than expected by chance, and this sub-community has a markedly different identifiability distribution than random. Still, some edges were not identified by even the ten best teams (see bin corresponding to zero teams). Unidentified edges are characterized by (b) a property of the measurement data and (c) a topological property of the network. (b) Unidentified edges have a lower null-mutant absolute z-score than those that were identified by at least one of the ten best teams. This metric is a measure of the information content of the measurements. (c) Unidentified edges belong to target nodes with a higher in-degree than edges that were identified by at least one of the ten best teams. Circles denote the median and bars denote upper and lower quartiles. Statistics were not computed for bins containing fewer than four edges. (d) The benefits of combining the predictions of multiple teams into a consensus prediction are illustrated by the rank sum prediction (triangles). Though no rank sum prediction scored higher than the best-performer, a consensus of the predictions of the second and third place teams boosted the score of the second place team. Rank sum analysis shown for the 100-node sub-challenge.
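The identifiability count behind panel (a) reduces to a membership test: for each gold-standard edge, count the teams that ranked it within the top 2P of their predicted edge list. A minimal sketch with made-up edge names; the function name is mine.

```python
def edge_identifiability(team_edge_lists, gold_edges, cutoff):
    # For each true edge, count teams that placed it within their top
    # `cutoff` predicted edges (the caption's cutoff is 2 * number of
    # true edges). Each team's list is ordered by descending confidence.
    tops = [set(t[:cutoff]) for t in team_edge_lists]
    return {edge: sum(edge in top for top in tops) for edge in gold_edges}
```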
Figure 7. Community analysis of systematic false positives.
Systematic false positive (FP) edges are the top one percent of edges that were predicted by the most teams to exist, yet are actually absent from the gold standard (i.e., negative). Rare false positive edges are the remaining 99 percent of edges that are absent from the gold standard network. The entries of each two-by-two contingency table sum to the total number of negative edges (i.e., those not present) in the gold standard network. There is a relative concentration of FP errors in the shortcut and co-regulated topologies, as evidenced by the A-to-B ratio. P-values for each contingency table were computed by Fisher's exact test, which expresses the probability that a random partitioning of the data will result in such a contingency table.
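Fisher's exact test on a two-by-two table like those in this figure is a hypergeometric tail sum. A minimal one-sided sketch using only the standard library; the function name is mine, and published analyses typically use a library routine instead.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    # One-sided Fisher's exact test on the table [[a, b], [c, d]]:
    # probability of a count of `a` or more in the top-left cell when
    # the row and column margins are fixed and the partition is random.
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / denom
```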
Figure 8. Survey of in silico network methods.
No clear correlation between methods and scores is apparent, implying that success depends more on the details of implementation than on the choice of general methodology.
