BMC Bioinformatics. 2007 Jul 18;8:258.
doi: 10.1186/1471-2105-8-258.

Accelerated search for biomolecular network models to interpret high-throughput experimental data

Suman Datta et al. BMC Bioinformatics.

Abstract

Background: The functions of human cells are carried out by biomolecular networks, which include proteins, genes, and regulatory sites within DNA that encode and control protein expression. Models of biomolecular network structure and dynamics can be inferred from high-throughput measurements of gene and protein expression. We build on our previously developed fuzzy logic method for bridging quantitative and qualitative biological data to address the challenges of noisy, low-resolution high-throughput measurements, such as those from gene expression microarrays. We employ an evolutionary search algorithm to accelerate the search for hypothetical fuzzy biomolecular network models consistent with a biological data set. We also develop a method to estimate the probability that a potential network model fits a set of data by chance. The resulting metric provides an estimate of both model quality and data set quality, flagging data that are too noisy to reveal meaningful correlations between the measured variables.

Results: Optimal parameters for the evolutionary search were identified based on artificial data, and the algorithm showed scalable and consistent performance for as many as 150 variables. The method was tested on previously published human cell cycle gene expression microarray data sets. The evolutionary search method was found to converge to the results of exhaustive search. The randomized evolutionary search converged on a set of similar best-fitting network models on different training data sets after 30 generations of 30 models each. Results were consistent regardless of which of the published data sets was used to train or verify the quantitative predictions of the best-fitting models for cell cycle gene dynamics.

Conclusion: Our results demonstrate that scalable evolutionary search for fuzzy network models can address the problem of inferring models from complex, noisy biomolecular data sets. This approach yields multiple alternative models that are consistent with the data, providing a constrained set of hypotheses that can be used to optimally design subsequent experiments.
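
The kind of evolutionary search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a rule set is a tuple of negative, zero, or positive rules (coded -1, 0, 1), that selection keeps the better half of each generation plus the single best model, and that the error function is supplied by the caller. The function name `evolve`, the defaults `p_c = 0.8` and `p_m = 0.1`, and the toy error function are all hypothetical; only the 30 models per generation and 30 generations come from the abstract.

```python
import random

RULES = (-1, 0, 1)  # assumed coding: negative, zero, positive rule

def evolve(error, n_inputs, pop_size=30, generations=30,
           p_c=0.8, p_m=0.1, seed=0):
    """Search for a low-error rule set with a simple genetic algorithm.

    p_c and p_m are illustrative crossover and mutation probabilities,
    not values reported in the paper.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.choice(RULES) for _ in range(n_inputs))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)               # best (lowest error) first
        parents = pop[: pop_size // 2]    # truncation selection (assumed)
        children = [pop[0]]               # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if n_inputs > 1 and rng.random() < p_c:
                cut = rng.randrange(1, n_inputs)   # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a
            # Each rule is redrawn at random with probability p_m.
            child = tuple(rng.choice(RULES) if rng.random() < p_m else g
                          for g in child)
            children.append(child)
        pop = children
    return min(pop, key=error)

# Toy error: number of rules that differ from a hidden target rule set.
target = (1, 0, -1, 1, 0, 0, -1, 1)

def toy_error(rule_set):
    return sum(a != b for a, b in zip(rule_set, target))

best = evolve(toy_error, len(target))
```

In a real application the toy error would be replaced by the mismatch between simulated and measured expression time series. Because the search is randomized, repeated runs with different seeds can return different rule sets of similar quality, which matches the abstract's description of a set of alternative models rather than a single answer.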

Figures

Figure 1
Schematic of fuzzification (triangular sets; above) and defuzzification (point-centroid; below), where x is the normalized ratiometric data point (i.e., the arctangent divided by π/2 of the base 2 logarithm of gene expression ratio) and μ is the fuzzy set membership function.
Figure 2
Sample contour plots for the errors (E, in the range as shown in the accompanying legend) where mutation probability is plotted against the crossover probability. The top plot is for pC (horizontal axis) and pM (vertical axis) from 0.1 to 1.0, and the bottom is for pC and pM greater than 0.6.
Figure 3
Sample contour plots for the errors (E, in the range as shown in the accompanying legend) where the population size (number of rule sets tested in each generation) is plotted against the number of generations (algorithm iterations). The top plot is for 10 genes and the bottom is for 100 genes.
Figure 4
Estimation of gamma function parameters for the error probability distribution of gene TOP2A, with the TN data set. (Bottom) The mean a and b parameters (solid line with white squares and grey line with black diamonds, respectively) estimated for increasing sample sizes uniformly drawn from the space of all possible rule sets. (Top) Coefficients of variation (standard deviation divided by mean) versus sample size (based on 10 samplings).
Figure 5
Time series data (left) and corresponding error distribution functions for specific genes: the solid line is data set TT3 for gene CCNB1 (a = 22.8, b = 0.0458, a/b = 498), the grey line is TT1 for CCNE1 (a = 9.3, b = 0.0120, a/b = 77), and the dashed line is Shake for CDKN3 (a = 197, b = 0.00517, a/b = 38 092).
Figure 6
Rules obtained from the TT3 data set for genes listed in Table 2. Outputs are in columns in the same sequence as the labeled inputs in rows. The majority of rules (zero, positive or negative based on the rule definitions in Table 1) are given (Z, P, and N respectively) along with the number of rule sets (out of 10) with that rule. In the case of a tie, 5 each, both are given in the appropriate cell.
Figure 7
Plots of experimental [18] and simulated data for the PCNA gene using the fuzzy rule set model found for the TT3 training data set. Shown here are (top) a typically reasonable fit, E = 0.249 and P = 3.47 × 10⁻⁷ on the training data set, and (bottom) a typically poorer fit (for a rule found using the evolutionary algorithm), E = 0.745 and P = 1.37 × 10⁻² on the TN data set. The plots show the base 2 logarithm of the gene expression ratios (so the simulated results are transformed back from the normalized form) versus points in the experimental time series (arbitrary units).
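
The normalization, fuzzification, and defuzzification steps named in the Figure 1 and Figure 7 captions can be sketched as follows. The normalization formula (arctangent of the base 2 logarithm of the expression ratio, divided by π/2) is taken from the Figure 1 caption. Everything else is an assumption made for illustration: three triangular sets centred at -1, 0, and +1 with unit half-width, a centroid computed as the membership-weighted mean of those centres, and the function names themselves.

```python
import math

def normalize(ratio):
    """Map an expression ratio to (-1, 1): arctan(log2(ratio)) / (pi/2)."""
    return math.atan(math.log2(ratio)) / (math.pi / 2)

def denormalize(x):
    """Inverse of normalize: recover log2(ratio) from x in (-1, 1)."""
    return math.tan(x * math.pi / 2)

# Assumed centres of three triangular fuzzy sets: low, medium, high.
CENTERS = (-1.0, 0.0, 1.0)

def fuzzify(x):
    """Triangular memberships (mu_low, mu_med, mu_high) for x in [-1, 1]."""
    return tuple(max(0.0, 1.0 - abs(x - c)) for c in CENTERS)

def defuzzify(mu):
    """Point-centroid: membership-weighted mean of the set centres."""
    return sum(m * c for m, c in zip(mu, CENTERS)) / sum(mu)

x = normalize(2.0)        # two-fold up-regulation, log2 ratio = 1
mu = fuzzify(x)           # degrees of membership in low, medium, high
x_back = defuzzify(mu)    # crisp value recovered from the memberships
log_ratio = denormalize(x_back)
```

With these assumed sets, memberships for any x in [-1, 1] sum to 1, so defuzzifying an unmodified fuzzified value returns the original x. Applying `denormalize` afterwards gives the base 2 logarithm of the expression ratio, the quantity plotted in Figure 7.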

References

    1. Csete ME, Doyle JC. Reverse engineering of biological complexity. Science. 2002;295:1664–1669. doi: 10.1126/science.1069981.
    2. Gianchandani EP, Brautigan DL, Papin JA. System analyses characterize integrated functions of biochemical networks. Trends in Biochemical Sciences. 2006;31:284–291. doi: 10.1016/j.tibs.2006.03.007.
    3. Arita M, Robert M, Tomita M. All systems go: launching cell simulation fueled by integrated experimental biology data. Current Opinion in Biotechnology. 2005;16:344–349. doi: 10.1016/j.copbio.2005.04.004.
    4. Friedman N. Inferring cellular networks using probabilistic graphical models. Science. 2004;303:799–805. doi: 10.1126/science.1094068.
    5. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing. 2000;3:18–29.
