Bioinformatics. 2014 Sep 1;30(17):i445-52.
doi: 10.1093/bioinformatics/btu451.

Experimental design schemes for learning Boolean network models


Nir Atias et al. Bioinformatics.

Abstract

Motivation: A holy grail of biological research is a working model of the cell. Current modeling frameworks, especially in the protein-protein interaction domain, are mostly topological in nature, calling for stronger and more expressive network models. One promising alternative is logic-based or Boolean network modeling, which has been successfully applied to model signaling regulatory circuits in humans. Learning such models requires observing the system under a sufficient number of different conditions. To date, the amount of measured data is the main bottleneck in learning informative Boolean models, underscoring the need for efficient experimental design strategies.

Results: We developed novel design approaches that greedily select the next experiment to perform so as to maximize either the difference or the entropy of the results it induces with respect to the current best-fit models. Unique to our maximum difference approach is the ability to account for all (possibly exponentially many) Boolean models displaying high fit to the available data. We applied both approaches to simulated and real data from the EGFR and IL1 signaling systems in humans. We demonstrate the utility of the developed strategies in substantially improving on a random selection approach. Our design schemes highlight the redundancy in these datasets, leading to up to 11-fold savings in the number of experiments to be performed.

Availability and implementation: Source code will be made available upon acceptance of the manuscript.
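The maximum entropy criterion described under Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the current best-fit models are available as an explicit list of Python callables mapping an experiment (a tuple of stimulus bits) to a tuple of predicted readout bits, and it scores each candidate experiment by the Shannon entropy of the readout vectors those models predict. The names `readout_entropy`, `pick_max_entropy`, `toy_models` and `toy_candidates` are hypothetical.

```python
from collections import Counter
from math import log2


def readout_entropy(models, experiment):
    """Shannon entropy (in bits) of the readout vectors predicted by the models."""
    counts = Counter(model(experiment) for model in models)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def pick_max_entropy(models, candidates):
    """Greedy step: return the candidate experiment with the highest entropy."""
    return max(candidates, key=lambda e: readout_entropy(models, e))


# Toy setting (assumed for illustration): two stimuli a, b and one readout.
# Each model is a different candidate Boolean function for that readout.
toy_models = [
    lambda e: (e[0] & e[1],),  # readout = a AND b
    lambda e: (e[0] | e[1],),  # readout = a OR b
    lambda e: (e[0],),         # readout = a
]
toy_candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]

best = pick_max_entropy(toy_models, toy_candidates)
print(best, readout_entropy(toy_models, best))
```

In this toy setting the experiments (0, 0) and (1, 1) have zero entropy because all three models agree on them, so performing them could not discriminate between the models. The estimate depends on which models are in the list, so a small sample of the optimal models may give a poor ranking.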


Figures

Fig. 1.
Overview of the experiment learning algorithm. We start from an ILP (Sharan and Karp, 2012) that learns a Boolean network model $M$ whose readouts have OPT disagreements with the experimental data. In our formulation, this program is duplicated so that two models, $M_1$ and $M_2$, are learned simultaneously. The models are further constrained so that (i) both have at most OPT disagreements with the experimental data, and are therefore optimal, and (ii) both simulate an unknown experiment $\hat{e}$. The objective maximizes the difference between the readouts of the two models ($r_{M_1,\hat{e}}$ and $r_{M_2,\hat{e}}$, respectively) as per Equation (2)
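The idea behind the maximum difference objective in Fig. 1 can be sketched by brute force on a toy example. This is a stand-in for illustration only, not the ILP of the caption: the ILP searches implicitly over all optimal models and experiments, whereas the sketch below enumerates a small explicit list of models and candidate experiments. Equation (2) is not reproduced here, so the difference between two readout vectors is assumed to be their Hamming distance. All names (`hamming`, `pick_max_difference`, `pair_models`, `pair_candidates`) are hypothetical.

```python
from itertools import combinations


def hamming(r1, r2):
    """Number of readouts on which two predicted readout vectors disagree."""
    return sum(x != y for x, y in zip(r1, r2))


def pick_max_difference(models, candidates):
    """Return (experiment, difference) maximizing disagreement over model pairs.

    Requires at least two models; ties keep the first candidate found.
    """
    best_exp, best_diff = None, -1
    for e in candidates:
        diff = max(hamming(m1(e), m2(e)) for m1, m2 in combinations(models, 2))
        if diff > best_diff:
            best_exp, best_diff = e, diff
    return best_exp, best_diff


# Toy setting (assumed for illustration): two stimuli a, b and two readouts.
pair_models = [
    lambda e: (e[0] & e[1], e[0]),  # readouts: a AND b, a
    lambda e: (e[0] | e[1], e[0]),  # readouts: a OR b,  a
    lambda e: (e[0] | e[1], e[1]),  # readouts: a OR b,  b
]
pair_candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]

experiment, difference = pick_max_difference(pair_models, pair_candidates)
print(experiment, difference)
```

Explicit enumeration is only feasible for small model sets, which is why the caption's formulation encodes the two models and the experiment as variables of a single optimization program instead.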
Fig. 2.
Runtime comparison. A comparison of the running times of the maximum difference and maximum entropy approaches. Running times are given on two datasets (EGFR and IL1) when computing a single experiment to be conducted as a function of the number of available optimal models
Fig. 3.
Sensitivity to the number of available models. The estimation of entropy was dependent on the number of available models. In contrast, the maximum difference learning algorithm optimized over all candidate models. In both panels, the x-axis denotes the number of available models for entropy estimation, and the y-axis denotes the third quartile of the number of experiments required to obtain an optimal model (lower is better). Error bars denote standard error. (A) Simulation with EGFR signaling. (B) Simulation with IL1 signaling
Fig. 4.
Sensitivity to the number of unknown functions. Increasing the number of unknown functions increased the number of experiments required to uncover the underlying model for every strategy except maximum difference, whose performance remained almost constant. In both panels, the x-axis denotes the number of unknown functions, and the y-axis denotes the third quartile of the number of experiments required to obtain an optimal model (lower is better). Error bars denote standard error. (A) Simulation with EGFR signaling. (B) Simulation with IL1 signaling
Fig. 5.
Performance evaluation on real data. The x-axis denotes the number of initial experiments, and the y-axis denotes the third quartile of the number of additional experiments required to reconstruct a model that fits the data as well as a model obtained from all the available experimental data. (A) Results on the EGFR system. (B) Results on the IL1 system

