Comparative Study
BMC Bioinformatics. 2002 Oct 24:3:30.
doi: 10.1186/1471-2105-3-30. Epub 2002 Oct 24.

Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo

Nikolaus Rajewsky et al. BMC Bioinformatics.

Abstract

Background: Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein coding sequence are used with remarkable success in the annotation of genomes, the development of computational methods to analyze noncoding regions and to delineate transcriptional control elements is still in its infancy.

Results: Here we present novel algorithms to detect cis-regulatory modules through genome-wide scans for clusters of transcription factor binding sites, using three levels of prior information. When binding sites for the factors are known, our statistical segmentation algorithm, Ahab, yields about 150 putative gap-gene-regulated modules, with no adjustable parameters other than a window size. If one or more related modules are known, but no binding sites, repeated motifs can be found by a customized Gibbs sampler and input to Ahab to predict genes with similar regulation. Finally, using only the genome, we developed a third algorithm, Argos, that counts and scores clusters of overrepresented motifs in a window of sequence. Argos recovers many of the known modules upstream of the segmentation genes, with no training data.
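
The window-based idea behind these scans can be illustrated with a deliberately simplified sketch. It is not Ahab or Argos: Ahab is described above as a statistical segmentation algorithm and Argos scores overrepresented motifs, whereas the code below only counts exact matches of given motif strings inside a sliding window on one strand. The function names, the toy sequence, and the motif strings `factorA` and `factorB` are hypothetical and chosen purely for illustration; they are not real binding sites.

```python
from typing import Dict, List, Tuple


def count_sites(seq: str, motifs: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return sorted (start, factor_name) pairs for every exact motif match."""
    hits: List[Tuple[int, str]] = []
    for name, motif in motifs.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((start, name))
            # Advance by one base so overlapping matches are also found.
            start = seq.find(motif, start + 1)
    return sorted(hits)


def window_scores(
    seq: str, motifs: Dict[str, str], window: int, step: int = 1
) -> List[Tuple[int, int]]:
    """Score each window by the number of sites lying entirely inside it."""
    hits = count_sites(seq, motifs)
    last_left = max(len(seq) - window, 0)
    scores: List[Tuple[int, int]] = []
    for left in range(0, last_left + 1, step):
        right = left + window  # window covers seq[left:right]
        n = sum(
            1
            for pos, name in hits
            if pos >= left and pos + len(motifs[name]) <= right
        )
        scores.append((left, n))
    return scores


def best_window(
    seq: str, motifs: Dict[str, str], window: int, step: int = 1
) -> Tuple[int, int]:
    """Return (left, count) for the first window with the highest count."""
    return max(window_scores(seq, motifs, window, step), key=lambda s: s[1])


# Toy data (hypothetical): two factorA sites flanking one factorB site.
TOY_MOTIFS = {"factorA": "ACGT", "factorB": "TTTTT"}
TOY_SEQ = "GGGG" + "ACGT" + "GG" + "TTTTT" + "GG" + "ACGT" + "GGGGGGGGGG"

if __name__ == "__main__":
    print(count_sites(TOY_SEQ, TOY_MOTIFS))
    print(best_window(TOY_SEQ, TOY_MOTIFS, window=20))
```

A raw count treats every match equally and ignores background sequence composition, which is one reason a statistical model with a single window-size parameter, as described for Ahab, is attractive in practice.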

Conclusions: We have demonstrated, in the case of body patterning in the Drosophila embryo, that our algorithms allow the genome-wide identification of regulatory modules. We believe that Ahab overcomes many problems of recent approaches and we estimated the false positive rate to be about 50%. Argos is the first successful attempt to predict regulatory modules using only the genome without training data. Complete results and module predictions across the Drosophila genome are available at http://uqbar.rockefeller.edu/~siggia/.

Figures

Figure 1. Summary of the input (dark blue) and output (white) of the three algorithms (light blue).

Figure 2. Ahab score for the hairy locus. Module score for the hairy locus (screenshot from our interactive web browser). Plotted is the Ahab score as a function of position in the genome. Known modules are marked as "module". Four of the known modules (stripes 1 and 5–7) have high enough scores to appear among the top 146 genome-wide predictions, and Ahab's predicted binding sites are mapped out in these cases. The stripe 3+4 module is not recovered.

Figure 3. The even-skipped stripe 3+7 module. Known binding sites (in blue) and sites predicted by Ahab (in red) for the even-skipped stripe 3+7 module. knirps sites are marked by circles, hunchback sites by boxes. The upper (lower) half depicts binding sites for the plus (minus) strand. The height of the red symbols corresponds to the score of the sites (Eq. 4).

Figure 4. Argos score for the upstream regions of giant, knirps and Kruppel. Argos score to observe a 500 bp module upstream of giant, knirps and Kruppel. The bars mark known modules, and the translation start is at the rightmost base.
