BMC Bioinformatics. 2007 Jul 18:8:258.

doi: 10.1186/1471-2105-8-258.

Accelerated search for biomolecular network models to interpret high-throughput experimental data

Suman Datta¹, Bahrad A Sokhansanj

Affiliation

¹ School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104, USA. sdatta@merrimackpharma.com

PMID: 17640351
PMCID: PMC1940030
DOI: 10.1186/1471-2105-8-258

Suman Datta et al. BMC Bioinformatics. 2007.

Abstract

Background: The functions of human cells are carried out by biomolecular networks, which include proteins, genes, and regulatory sites within DNA that encode and control protein expression. Models of biomolecular network structure and dynamics can be inferred from high-throughput measurements of gene and protein expression. We build on our previously developed fuzzy logic method for bridging quantitative and qualitative biological data to address the challenges of noisy, low-resolution high-throughput measurements, i.e., those from gene expression microarrays. We employ an evolutionary search algorithm to accelerate the search for hypothetical fuzzy biomolecular network models consistent with a biological data set. We also develop a method to estimate the probability that a potential network model fits a set of data by chance. The resulting metric provides an estimate of both model quality and data set quality, flagging data that are too noisy to reveal meaningful correlations between the measured variables.

Results: Optimal parameters for the evolutionary search were identified based on artificial data, and the algorithm showed scalable and consistent performance for as many as 150 variables. The method was tested on previously published human cell cycle gene expression microarray data sets. The evolutionary search method was found to converge to the results of exhaustive search. The randomized evolutionary search converged on a set of similar best-fitting network models on different training data sets after 30 generations of 30 models each. Consistent results were found regardless of which of the published data sets was used to train or verify the quantitative predictions of the best-fitting models for cell cycle gene dynamics.

Conclusion: Our results demonstrate the capability of scalable evolutionary search for fuzzy network models to address the problem of inferring models based on complex, noisy biomolecular data sets. This approach yields multiple alternative models that are consistent with the data, providing a constrained set of hypotheses that can be used to optimally design subsequent experiments.
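The evolutionary search described in the abstract can be illustrated with a generic genetic algorithm. The following is a minimal sketch under stated assumptions, not the authors' implementation: the rule encoding (one of -1, 0, +1 per input), truncation selection, single-point crossover, elitism, and the default values of `p_c` and `p_m` are all illustrative choices. Only the population size of 30 and the 30 generations come from the abstract. The `error` callable is a hypothetical stand-in for the fit error of a rule set against a data set.

```python
import random

# Assumed encoding: each input acts on the output with a negative (N),
# zero (Z), or positive (P) rule, stored as -1, 0, or +1.
RULES = (-1, 0, 1)

def evolve(error, n_inputs, pop_size=30, generations=30,
           p_c=0.8, p_m=0.1, seed=0):
    """Search for a low-error rule set; returns (best, best-error history)."""
    rng = random.Random(seed)
    pop = [tuple(rng.choice(RULES) for _ in range(n_inputs))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=error)
        history.append(error(pop[0]))
        parents = pop[:pop_size // 2]      # truncation selection (assumption)
        children = [pop[0]]                # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = list(a)
            # Single-point crossover with probability p_c.
            if n_inputs > 1 and rng.random() < p_c:
                cut = rng.randrange(1, n_inputs)
                child = list(a[:cut] + b[cut:])
            # Each rule is redrawn at random with probability p_m.
            for i in range(n_inputs):
                if rng.random() < p_m:
                    child[i] = rng.choice(RULES)
            children.append(tuple(child))
        pop = children
    best = min(pop, key=error)
    history.append(error(best))
    return best, history
```

With a toy error such as the number of rules that differ from a hidden target, `evolve(err, n_inputs=6)` returns the best rule set found and a best-error history that never increases, because the elite individual is always carried into the next generation.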

Figures

**Figure 1**
Schematic of fuzzification (triangular sets; above) and defuzzification (point-centroid; below), where x is the normalized ratiometric data point (i.e., the arctangent divided by π/2 of the base 2 logarithm of gene expression ratio) and μ is the fuzzy set membership function.
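The normalization and the fuzzification/defuzzification steps named in the Figure 1 caption can be sketched as follows. The normalization formula is taken from the caption. The choice of three triangular sets centered at -1, 0, and +1 with unit half-width is an assumption made for illustration, as are all function names.

```python
import math

def normalize(ratio):
    """Map an expression ratio to (-1, 1): arctan(log2(ratio)) / (pi/2)."""
    return math.atan(math.log2(ratio)) / (math.pi / 2)

def denormalize(x):
    """Invert normalize(): recover the expression ratio from x in (-1, 1)."""
    return 2.0 ** math.tan(x * math.pi / 2)

# Assumed set centers: low = -1, medium = 0, high = +1.
CENTERS = (-1.0, 0.0, 1.0)

def fuzzify(x):
    """Triangular memberships (low, medium, high) for x in [-1, 1]."""
    return tuple(max(0.0, 1.0 - abs(x - c)) for c in CENTERS)

def defuzzify(mu):
    """Point-centroid: membership-weighted mean of the set centers."""
    total = sum(mu)
    if total == 0.0:
        return 0.0  # no set is active; fall back to the neutral value
    return sum(m * c for m, c in zip(mu, CENTERS)) / total
```

Under these assumptions a two-fold expression ratio normalizes to 0.5, which belongs equally to the medium and high sets, and defuzzifying those memberships returns 0.5 again.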

**Figure 2**
Sample contour plots for the errors (E, in the range as shown in the accompanying legend) where mutation probability is plotted against the crossover probability. The top plot is for p_C (horizontal axis) and p_M (vertical axis) from 0.1 to 1.0, and the bottom is for p_C and p_M greater than 0.6.

**Figure 3**
Sample contour plots for the errors (E, in the range as shown in the accompanying legend) where the population size (number of rule sets tested in each generation) is plotted against the number of generations (algorithm iterations). The top plot is for 10 genes and the bottom is for 100 genes.

**Figure 4**
Estimation of gamma function parameters for the error probability distribution of gene TOP2A, with the TN data set. (Bottom) The mean a and b parameters (solid line with white squares and grey line with black diamonds, respectively) estimated for increasing sample sizes uniformly drawn from the space of all possible rule sets. (Top) Coefficients of variation (standard deviation divided by mean) versus sample size (based on 10 samplings).

**Figure 5**
Time series data (left) and corresponding error distribution functions for specific genes: the solid line is data set TT3 for gene CCNB1 (a = 22.8, b = 0.0458, a/b = 498), the grey line is TT1 for CCNE1 (a = 9.3, b = 0.0120, a/b = 77), and the dashed line is Shake for CDKN3 (a = 197, b = 0.00517, a/b = 38 092).
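The chance-fit probability implied by Figures 4 and 5 can be sketched as: sample random rule sets, fit a gamma distribution to their errors, and evaluate its cumulative distribution at the error of the candidate model. The sketch below assumes that a is a shape parameter and b a rate parameter (so that a/b is the mean) and uses a method-of-moments fit. Both are assumptions, since the source does not state the parameterization or the estimation procedure, and the function names are hypothetical.

```python
import math
import statistics

def fit_gamma_moments(errors):
    """Method-of-moments estimate of gamma shape a and rate b (assumed form)."""
    mean = statistics.fmean(errors)
    var = statistics.variance(errors)
    return mean * mean / var, mean / var

def gamma_cdf(x, a, b, tol=1e-12, max_terms=10_000):
    """P(X <= x) for X ~ Gamma(shape a, rate b).

    Uses the power series of the regularized lower incomplete gamma
    function; adequate for moderate b * x, not tuned for extreme values.
    """
    if x <= 0:
        return 0.0
    z = b * x
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def chance_probability(model_error, sampled_errors):
    """Estimated probability that a random rule set fits at least this well."""
    a, b = fit_gamma_moments(sampled_errors)
    return gamma_cdf(model_error, a, b)
```

A small value of `chance_probability` for a candidate model suggests that few randomly drawn rule sets would fit the data as well, which is the sense in which the metric reflects both model and data set quality.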

**Figure 6**
Rules obtained from the TT3 data set for genes listed in Table 2. Outputs are in columns in the same sequence as the labeled inputs in rows. The majority of rules (zero, positive or negative based on the rule definitions in Table 1) are given (Z, P, and N respectively) along with the number of rule sets (out of 10) with that rule. In the case of a tie, 5 each, both are given in the appropriate cell.

**Figure 7**
Plots of experimental [18] and simulated data for the PCNA gene using the fuzzy rule set model found for the TT3 training data set. Shown here are (top) a typically reasonable fit, E = 0.249 and P = 3.47 × 10^-7 on the training data set, and (bottom) a typically poorer fit (for a rule found using the evolutionary algorithm), E = 0.745 and P = 1.37 × 10^-2 on the TN data set. The plots show the base 2 logarithm of the gene expression ratios (so the simulated results are transformed back from the normalized form) versus points in the experimental time series (arbitrary units).

Cited by

Borges dilemma, fundamental laws, and systems biology.
Ao P. Bioinform Biol Insights. 2008 Apr 10;2:201-2. PMID: 19812776.
An integrated framework to model cellular phenotype as a component of biochemical networks.
Gormley M, Akella VU, Quong JN, Quong AA. Adv Bioinformatics. 2011;2011:608295. doi: 10.1155/2011/608295. Epub 2011 Nov 29. PMID: 22190923.
Estimation of Parameters Subject to Order Restrictions on a Circle With Application to Estimation of Phase Angles of Cell Cycle Genes.
Rueda C, Fernández MA, Peddada SD. J Am Stat Assoc. 2009 Mar 1;104(485):338-347. doi: 10.1198/jasa.2009.0120. PMID: 19750145.
Identifying functional gene regulatory network phenotypes underlying single cell transcriptional variability.
Park J, Ogunnaike B, Schwaber J, Vadigepalli R. Prog Biophys Mol Biol. 2015 Jan;117(1):87-98. doi: 10.1016/j.pbiomolbio.2014.11.004. Epub 2014 Nov 27. PMID: 25433230.

References

1. Csete ME, Doyle JC. Reverse engineering of biological complexity. Science. 2002;295:1664–1669. doi: 10.1126/science.1069981.
2. Gianchandani EP, Brautigan DL, Papin JA. System analyses characterize integrated functions of biochemical networks. Trends in Biochemical Sciences. 2006;31:284–291. doi: 10.1016/j.tibs.2006.03.007.
3. Arita M, Robert M, Tomita M. All systems go: launching cell simulation fueled by integrated experimental biology data. Current Opinion in Biotechnology. 2005;16:344–349. doi: 10.1016/j.copbio.2005.04.004.
4. Friedman N. Inferring cellular networks using probabilistic graphical models. Science. 2004;303:799–805. doi: 10.1126/science.1094068.
5. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing. 2000;3:18–29.
