Genome Res. 2004 Oct;14(10A):1967-74.
doi: 10.1101/gr.2589004.

Decoding human regulatory circuits


William Thompson et al. Genome Res. 2004 Oct.

Abstract

Clusters of transcription factor binding sites (TFBSs) that direct gene expression constitute cis-regulatory modules (CRMs). We present a novel algorithm, based on Gibbs sampling, that locates de novo the cis features of these CRMs: their component TFBSs and the properties of their spatial distribution. The algorithm finds 69% of experimentally reported TFBSs and 85% of the CRMs in a reference data set of regions upstream of genes differentially expressed in skeletal muscle cells. A discriminant procedure based on the model's output specifically discriminated the regulatory sequences of muscle-specific genes in an independent test set. Applying the method to 2,710 10-kb fragments upstream of annotated human genes identified 17 novel candidate modules with a false discovery rate ≤ 0.05, demonstrating that the method is applicable to genome-scale data.
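The abstract names Gibbs sampling as the core of the search but gives no procedure. For orientation, here is a minimal sketch of a basic single-motif Gibbs site sampler, the building block such methods extend. It is not the authors' module sampler: it finds one motif type with one site per sequence and ignores module structure, site order, and spacing. The names `gibbs_site_sampler` and `site_profile`, the +1 pseudocounts, and the background model are all illustrative assumptions.

```python
import math
import random

ALPHABET = "ACGT"


def site_profile(seqs, positions, width, skip, pseudo=1.0):
    """Position-specific base frequencies from all current sites except sequence `skip`."""
    counts = [[pseudo] * len(ALPHABET) for _ in range(width)]
    for k, (seq, pos) in enumerate(zip(seqs, positions)):
        if k == skip:
            continue
        for j in range(width):
            counts[j][ALPHABET.index(seq[pos + j])] += 1
    return [[c / sum(col) for col_c in [col] for c in col_c] for col in counts]


def gibbs_site_sampler(seqs, width, iters=200, seed=0):
    """Sample one motif site per sequence (hypothetical helper); returns start positions."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Background base frequencies (with +1 pseudocounts) over all input sequences.
    joined = "".join(seqs)
    background = [(joined.count(b) + 1) / (len(joined) + len(ALPHABET)) for b in ALPHABET]
    for _ in range(iters):
        for k, seq in enumerate(seqs):
            # Motif model built from every other sequence's current site.
            profile = site_profile(seqs, positions, width, skip=k)
            log_ratios = []
            for p in range(len(seq) - width + 1):
                score = 0.0
                for j in range(width):
                    b = ALPHABET.index(seq[p + j])
                    score += math.log(profile[j][b] / background[b])
                log_ratios.append(score)
            # Resample this sequence's site in proportion to the motif/background likelihood ratio.
            top = max(log_ratios)
            weights = [math.exp(s - top) for s in log_ratios]
            positions[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Calling `gibbs_site_sampler(seqs, width=8)` on a list of ACGT strings returns one candidate start position per sequence. Subtracting the maximum log ratio before exponentiating keeps the sampling weights numerically stable for wide motifs.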


Figures

Figure 1
Sequence logos (Schneider and Stephens 1990) of the motif models predicted by the module sampler for the 24 pairs of human-mouse sequences in the positive training set. The logos for the reported sites were produced by aligning the reported human sites for each motif type.
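In a Schneider and Stephens sequence logo, each column's total height is its information content (at most 2 bits for DNA), and each letter's height is its frequency times that value. The sketch below computes both from a set of aligned sites. The function names are hypothetical, and it omits the small-sample correction that logo tools usually apply, so values for small site sets will be somewhat inflated.

```python
import math
from collections import Counter


def column_information(sites):
    """Information content (bits) per aligned column: 2 - H, without small-sample correction."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    info = []
    for j in range(width):
        counts = Counter(s[j] for s in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


def letter_heights(sites):
    """Height of each base at each position: its frequency times the column information."""
    n = len(sites)
    heights = []
    for j, info in enumerate(column_information(sites)):
        counts = Counter(s[j] for s in sites)
        heights.append({base: (c / n) * info for base, c in counts.items()})
    return heights
```

For four aligned sites `["ACGT", "ACGA", "ACGC", "ACGG"]`, the first three columns are fully conserved (2 bits each) and the last column, with all four bases equally frequent, carries no information.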
Figure 2
Histogram of the Bayes ratios, plotted on a log10 scale, for the 688 intergenic pairs from the human and mouse genomes that had predicted modules. The asterisk at log10(194.5) ≈ 2.3 marks the Bayes ratio cutoff; sequences above this point have a q-value ≤ 0.05. The line shows the robust fit to the Bayes ratio distribution.
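The caption does not say how q-values are derived from the robust fit. Assuming each sequence's Bayes ratio has first been converted to a p-value under some fitted null distribution, the sketch below shows one standard way to turn p-values into q-values: Benjamini-Hochberg adjusted p-values, which equal Storey's q-values with the null proportion fixed at 1. This is a stand-in technique, not necessarily the procedure used in the paper, and the name `bh_qvalues` is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 = 1), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

With p-values `[0.01, 0.04, 0.03, 0.20]`, only the first sequence has a q-value at or below 0.05 (0.04). Selecting sequences with q ≤ 0.05 then corresponds to a single cutoff on the underlying score, in the same way that the asterisk marks a single Bayes ratio threshold.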
Figure 3
(A) Histogram of the Bayes ratio, on a log10 scale, for the positive and negative validation sequence pairs in which a module was predicted: nine positive pairs and 22 negative pairs. (B) The distribution of Bayes ratios, on a log10 scale, for the positive and negative training sequences from cross-validation that contained a predicted module. The models had to be rebuilt only for the 24 negative pairs from the original negative training set that had predicted modules, because the other sequences contributed nothing to the model.
Figure 4
General parameters that the module sampler attempts to discover. A priori, motif binding models were given uniform Dirichlet priors. Nearest-neighbor interactions were modeled as the transition probabilities of a Markov chain. This allows us to calculate the posterior mean estimate of each transition probability from the number of times each specific type of binding site follows another, plus prior pseudocounts. A priori, we assume that all neighboring pairs also have uniform Dirichlet priors. The algorithm also allows us to draw inferences about the number of sites per sequence; we chose the prior distribution on this number based on the distribution of reported sites. The separation distance was modeled as a flat function truncated at 100 bp.
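The posterior mean described in the caption has a closed form. With a uniform Dirichlet prior (the same pseudocount α for every entry) on each row of a K×K transition matrix, the posterior mean of the probability that site type j follows type i is (n_ij + α) / (n_i + Kα). Here n_ij counts how often j directly follows i and n_i is the row total. The sketch below assumes site types are encoded as integers 0..K-1 and that α = 1; the flat separation prior assumes distances of 1 to 100 bp are equally likely, since the caption gives only the 100 bp truncation. All function names are hypothetical.

```python
def posterior_mean_transitions(site_orders, n_types, alpha=1.0):
    """Posterior mean Markov transition matrix under a uniform Dirichlet(alpha) prior per row.

    site_orders: one list per sequence giving the site types (ints) in positional order.
    """
    counts = [[0] * n_types for _ in range(n_types)]
    for order in site_orders:
        # Count each nearest-neighbor pair (previous type -> next type).
        for prev, nxt in zip(order, order[1:]):
            counts[prev][nxt] += 1
    probs = []
    for row in counts:
        total = sum(row) + n_types * alpha
        probs.append([(c + alpha) / total for c in row])
    return probs


def separation_prob(distance_bp, max_sep=100):
    """Flat prior on inter-site distance, truncated at max_sep (assumed range 1..max_sep bp)."""
    return 1.0 / max_sep if 1 <= distance_bp <= max_sep else 0.0
```

For two sequences with site orders `[0, 1, 1]` and `[0, 1]` and two motif types, type 0 is followed by type 1 twice and type 1 by itself once. The posterior mean rows are therefore [0.25, 0.75] and [1/3, 2/3]; the pseudocounts keep unobserved transitions away from zero.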

References

    1. Aerts, S., Van Loo, P., Thijs, G., Moreau, Y., and De Moor, B. 2003. Computational detection of cis-regulatory modules. Bioinformatics 19: 5ii-14ii.
    1. Aicher, W.K., Sakamoto, K.M., Hack, A., and Eibel, H. 1999. Analysis of functional elements in the human Egr-1 gene promoter. Rheumatol. Int. 18: 207-214.
    1. Bischoff, C., Kahns, S., Lund, A., Jorgensen, H.F., Praestegaard, M., Clark, B.F.C., and Leffers, H. 2000. The Human Elongation Factor 1 A-2 Gene (EEF1A2): Complete sequence and characterization of gene structure and promoter activity. Genomics 68: 63-70.
    1. Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
    1. Brenner, V. 1998. Von der Sequenz zur Funktion: Genomanalyse einer 102 KB-Region des humanen X-Chromosoms. Friedrich-Schiller University, Jena, Germany.

Web Site References

    1. http://zlab.bu.edu/~mfrith/comet/; COMET.
    1. http://jaspar.cgb.ki.se; JASPAR database.
    1. http://bayesweb.wadsworth.org/gibbs/module; Module sampler results and data.
    1. http://www.repeatmasker.org/; RepeatMasker.
    1. http://www.gene-regulation.com; TRANSFAC.
