Comparative Study
Genome Res. 2004 Jan;14(1):170-8.
doi: 10.1101/gr.1642804. Epub 2003 Dec 12.

CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting

Eugene Berezikov et al. Genome Res. 2004 Jan.

Abstract

Prediction of transcription-factor target sites in promoters remains difficult because the target sequences are short and degenerate. Although orthologous sequences and phylogenetic footprinting approaches can help identify conserved and potentially functional sequences, established algorithms can have trouble correctly aligning the short transcription-factor binding sites, especially between more divergent species. Here, we report a novel phylogenetic footprinting approach, CONREAL, that uses biologically relevant information, namely potential transcription-factor binding sites represented by positional weight matrices, to establish anchors between orthologous sequences and to guide promoter sequence alignment. A comparison of CONREAL with the global alignment programs LAGAN and AVID on a reference data set shows that CONREAL performs as well as these programs for closely related species such as human and rodents, and adds clear value when aligning promoter elements of more divergent species such as human and fish, because it identifies conserved transcription-factor binding sites that are not found by the other methods. CONREAL is accessible via a Web interface at http://conreal.niob.knaw.nl/.
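CONREAL's first step, finding potential binding sites with positional weight matrices, can be illustrated with a short scanning sketch. It assumes that the "PWM threshold" quoted in the figure captions (for example, 75%) is a relative score, (score - min) / (max - min), computed from a log-odds matrix. That convention, the toy matrix, the pseudocount, and the function names are assumptions for illustration only, and only the forward strand is scanned.

```python
import math

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    # counts: one dict of base counts per motif position (hypothetical toy input).
    # Converts counts to log-odds scores against a uniform background.
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def relative_score(matrix, word):
    # Assumed relative score: 0.0 for the worst possible word, 1.0 for the best.
    score = sum(col[base] for col, base in zip(matrix, word))
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    return (score - lo) / (hi - lo)

def scan(matrix, sequence, threshold=0.75):
    # Slide the matrix along the forward strand and keep windows at or above threshold.
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width].upper()
        if set(word) <= set("ACGT"):  # skip windows containing N or other symbols
            s = relative_score(matrix, word)
            if s >= threshold:
                hits.append((start, round(s, 3)))
    return hits

# Toy 4-bp matrix with consensus TGAC (hypothetical, not a TRANSFAC entry).
toy = log_odds_matrix([{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}])
print(scan(toy, "AAATGACAAA"))
```

Running the same scan on both orthologous promoters yields the two hit lists from which anchors could be chosen.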

Figures

Figure 1
Outline of the CONREAL algorithm.
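The anchoring idea can be approximated as choosing the largest collinear set of matching PWM hits in two orthologous sequences. The sketch below is one plausible formulation, not the published CONREAL procedure: it pairs hits of the same matrix, assumes a single fixed site width, and uses an O(n^2) chaining dynamic program similar to a longest-increasing-subsequence search. The hit format and function name are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[int, str]  # (start position, matrix name); hypothetical format

def chain_anchors(hits1: List[Hit], hits2: List[Hit], width: int) -> List[Tuple[int, int, str]]:
    # Candidate anchors: the same matrix predicted in both sequences.
    pairs = sorted((p1, p2, f1) for p1, f1 in hits1 for p2, f2 in hits2 if f1 == f2)
    n = len(pairs)
    if n == 0:
        return []
    best = [1] * n   # length of the best chain ending at pair j
    prev = [-1] * n  # back-pointer to reconstruct that chain
    for j in range(n):
        for i in range(j):
            # Pair i may precede pair j only if it is upstream and
            # non-overlapping in BOTH sequences (collinearity).
            if pairs[i][0] + width <= pairs[j][0] and pairs[i][1] + width <= pairs[j][1]:
                if best[i] + 1 > best[j]:
                    best[j] = best[i] + 1
                    prev[j] = i
    # Trace back from the end of the longest chain.
    j = max(range(n), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(pairs[j])
        j = prev[j]
    return chain[::-1]

anchors = chain_anchors([(10, "FOXA"), (40, "HNF1"), (80, "GATA")],
                        [(5, "FOXA"), (30, "GATA"), (60, "HNF1"), (90, "GATA")], width=8)
print(anchors)  # [(10, 5, 'FOXA'), (40, 60, 'HNF1'), (80, 90, 'GATA')]
```

The GATA hit at position 30 in the second sequence is rejected because it would cross the HNF1 anchor; in an anchored scheme, only the regions between consecutive anchors would then remain to be aligned.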
Figure 2
Intersection of conserved reference sites found by AVID, LAGAN, and CONREAL approaches in (A) human–mouse–rat and (B) mammals–fish gene pairs. The analysis parameters are 75% PWM threshold, 50% homology threshold, and 5-bp flank length.
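The homology threshold and flank length in the caption can be read as a conservation test on an aligned site plus its flanks. This sketch assumes that reading: it maps a site in the first sequence (ungapped coordinates `start` to `end`, end exclusive) onto a gapped pairwise alignment, extends it by `flank` bases on each side, and requires the fraction of identical columns to reach the homology threshold. The exact criterion CONREAL applies may differ, and the function names are hypothetical.

```python
def column_index(aligned: str) -> list:
    # Alignment column for each ungapped position of the aligned sequence.
    return [col for col, ch in enumerate(aligned) if ch != "-"]

def flank_identity(aln1: str, aln2: str, start: int, end: int, flank: int = 5) -> float:
    # Window = site plus `flank` ungapped bases of sequence 1 on each side,
    # clipped to the sequence ends, then converted to alignment columns.
    cols = column_index(aln1)
    lo = cols[max(start - flank, 0)]
    hi = cols[min(end + flank, len(cols)) - 1]
    window = range(lo, hi + 1)
    # Gap columns inside the window count as non-identical.
    identical = sum(1 for c in window if aln1[c] == aln2[c] and aln1[c] != "-")
    return identical / len(window)

def is_conserved(aln1: str, aln2: str, start: int, end: int,
                 flank: int = 5, homology: float = 0.5) -> bool:
    return flank_identity(aln1, aln2, start, end, flank) >= homology

# Toy alignment; the site TGAC occupies ungapped positions 3..6 of sequence 1.
a1 = "AAATGAC--CCC"
a2 = "AATTGACGGCCA"
print(flank_identity(a1, a2, 3, 7, flank=2))  # 7 of 10 columns identical
print(is_conserved(a1, a2, 3, 7, flank=2))
```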
Figure 3
Intersection of all conserved sites found by the three approaches in intramammalian (top) and mammal–fish (bottom) gene pairs from the reference set. The percentage of sites found by all three methods is shown in black, sites confirmed by one additional method in dark gray, and the method-specific fraction of sites in light gray. Error bars, 95% confidence intervals. The analysis parameters are 75% PWM threshold, 50% homology threshold, and 5-bp flank length.
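The three categories in this figure (found by all three methods, confirmed by one additional method, method-specific) amount to simple set bookkeeping. A minimal sketch, assuming each predicted site is represented by a hashable key such as a (matrix name, position) tuple; the representation and names are hypothetical, and the 95% confidence intervals are not computed here.

```python
from typing import Dict, Hashable, Set

def overlap_fractions(sites_by_method: Dict[str, Set[Hashable]]) -> Dict[str, Dict[str, float]]:
    # For each method, split its sites by how many OTHER methods also found them.
    result = {}
    for method, sites in sites_by_method.items():
        others = [s for m, s in sites_by_method.items() if m != method]
        counts = {"all": 0, "one_more": 0, "specific": 0}
        for site in sites:
            support = sum(site in other for other in others)
            if support == len(others):
                counts["all"] += 1        # found by every method
            elif support == 0:
                counts["specific"] += 1   # found by this method only
            else:
                counts["one_more"] += 1   # with three methods: exactly one other
        total = len(sites) or 1           # avoid division by zero for empty sets
        result[method] = {k: v / total for k, v in counts.items()}
    return result

fractions = overlap_fractions({
    "AVID": {1, 2, 3},
    "LAGAN": {1, 2, 4},
    "CONREAL": {1, 3, 5, 6},
})
print(fractions["CONREAL"])  # {'all': 0.25, 'one_more': 0.25, 'specific': 0.5}
```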
Figure 4
Estimation of spurious prediction levels in intramammalian (top) and mammals–fish (bottom) pairwise comparisons. Dots represent the total number of aligned hits found in an orthologous pair, and crosses represent the number of aligned hits found in the same pair when the orthologous sequences are randomized. AVID results are shown in green, LAGAN in red, and CONREAL in blue. The data sets are sorted by the number of CONREAL predictions to make the graph easier to read. The analysis parameters are 75% PWM threshold, 50% homology threshold, and 5-bp flank length.
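The randomization control described in this caption can be sketched as comparing the hit count for the real pair with the mean count after shuffling. Assumptions: a simple mononucleotide shuffle of the second sequence only (the randomization procedure is not specified in this record), and `count_aligned_hits` is a hypothetical placeholder for whichever method (AVID, LAGAN, or CONREAL) plus site-calling step is being evaluated.

```python
import random
from typing import Callable, Tuple

def shuffled(seq: str, rng: random.Random) -> str:
    # Mononucleotide shuffle: keeps base composition, destroys order.
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def spurious_level(seq1: str, seq2: str,
                   count_aligned_hits: Callable[[str, str], int],
                   n: int = 10, seed: int = 0) -> Tuple[int, float]:
    # Returns (hits for the real pair, mean hits over n randomized pairs).
    rng = random.Random(seed)  # fixed seed for reproducibility
    real = count_aligned_hits(seq1, seq2)
    background = [count_aligned_hits(seq1, shuffled(seq2, rng)) for _ in range(n)]
    return real, sum(background) / n

# Toy stand-in for a real pipeline: count identical positions.
toy_counter = lambda a, b: sum(x == y for x, y in zip(a, b))
real, noise = spurious_level("ACGTACGT", "ACGTACGT", toy_counter, n=20, seed=1)
print(real, noise)
```

A method with a low background relative to its real count, as the crosses versus dots in the figure are meant to show, is less prone to spurious aligned hits.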
Figure 5
Output of the CONREAL Web interface. The example shows the results of analyzing the mouse and dwarf gourami Foxa2 promoter regions (Accession nos. AB050942 and AB050940, respectively) with CONREAL (top) and LAGAN (bottom). The graphs show the positions of aligned hits and the distribution/concentration of conserved TFBSs along the sequences. The graphs are followed by sequence-alignment data and tables of conserved TFBSs linked to TransFac entries (data not shown). Black circles above the black bar (mouse sequence) and below the gray bar (gourami sequence) mark the positions of the conserved regulatory elements CS1–CS3, which have been experimentally confirmed as functional in the mouse and gourami sequences. The analysis parameters are 80% PWM threshold, 50% homology threshold, and 15-bp flank length.

References

    1. Aparicio, S., Morrison, A., Gould, A., Gilthorpe, J., Chaudhuri, C., Rigby, P., Krumlauf, R., and Brenner, S. 1995. Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes. Proc. Natl. Acad. Sci. 92: 1684-1688.
    2. Blanchette, M. and Tompa, M. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12: 739-748.
    3. Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
    4. Bray, N., Dubchak, I., and Pachter, L. 2003. AVID: A global alignment program. Genome Res. 13: 97-102.
    5. Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., NISC Comparative Sequencing Program, Green, E.D., Sidow, A., and Batzoglou, S. 2003. LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.

WEB SITE REFERENCES

    1. http://conreal.niob.knaw.nl/; CONREAL Web server.
    2. http://www.emboss.org; EMBOSS package.
