BMC Bioinformatics. 2006 Aug 31;7:396.
doi: 10.1186/1471-2105-7-396.

Predicting transcription factor binding sites using local over-representation and comparative genomics

Matthieu Defrance et al. BMC Bioinformatics.

Abstract

Background: Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms.

Results: We have developed a method, named TFM-Explorer, that searches a set of coregulated genes for locally overrepresented TFBSs. The TFBSs are modeled by profiles drawn from a database of position weight matrices (PWMs). The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets.
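
To make the idea of PWM-based site prediction and local over-representation more concrete, here is a minimal Python sketch. It is not the TFM-Explorer algorithm: the count matrix, the uniform background, the score threshold, and the helper names (`log_odds_matrix`, `scan`, `densest_window`) are all assumptions chosen for illustration, and the window search simply counts hits rather than computing a statistical significance.

```python
import math

# Assumed uniform background; a real analysis would estimate this from the genome.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical count matrix for a 4-bp motif resembling "TATA" (one dict per position).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / BACKGROUND[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return start positions whose PWM score reaches the threshold.

    Assumes the sequence contains only uppercase A, C, G, T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append(start)
    return hits


def densest_window(hits_per_sequence, window, length):
    """Find the window start that collects the most hits across all sequences."""
    best_start, best_count = 0, -1
    for start in range(length - window + 1):
        count = sum(
            1
            for hits in hits_per_sequence
            for position in hits
            if start <= position < start + window
        )
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Toy aligned upstream regions (made up for this example).
sequences = ["GGGTATAGGG", "CCTATACCCC", "GGGGGGTATA"]
pwm = log_odds_matrix(COUNTS)
hits = [scan(seq, pwm, threshold=6.0) for seq in sequences]
print(hits)                          # [[3], [2], [6]]
print(densest_window(hits, 3, 10))   # (1, 2)
```

In this toy input, two of the three sequences carry a hit at nearly the same position, so the densest 3-bp window captures both. A method based on local over-representation would additionally ask whether such a concentration is unlikely under a background model.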

Conclusion: TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at http://bioinfo.lifl.fr/TFM-Explorer.

Figures

Figure 1
Score profile and window extraction. Example of the score used to predict windows with a significant overrepresentation of TFBSs. Panel (a) shows the predicted TFBSs (black boxes) along the upstream sequences of five genes that come from two species. Panel (b) shows the evolution of the cumulative score computed for a given PWM with those sequences. Local overrepresentations detected by the algorithm are represented by boxes.
Figure 2
Influence of noise on the positive predictive value. Starting from the Rel/NF-κB and muscle data sets, an increasing number of actual sequences were replaced by random sequences. The noise level is the proportion of sequences in a given set that were randomly selected from the genome. The positive predictive value is the proportion of valid predictions, where a prediction is valid if the most significant extracted TF is known to be involved in the regulation of the reference set.
Figure 3
Effect of P-value cutoff on the false positive error rate. Various set sizes (5, 50, and 100 sequences) were used to evaluate the false positive rate. The suggested P-value cutoffs for a fixed false positive rate of 10% are 10^-6 and 10^-8 for 5 and 100 sequences, respectively.
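
The two evaluation measures named in the captions of Figures 2 and 3 can be sketched in a few lines of Python. This is only an illustration of the definitions given there: the function names, the run labels, the placeholder TF names, and the P-values below are invented for the example and are not results from the paper.

```python
def positive_predictive_value(top_predictions, known_regulators):
    """Fraction of runs whose most significant TF is a known regulator of the set."""
    valid = sum(
        1
        for run, factor in top_predictions.items()
        if factor in known_regulators.get(run, set())
    )
    return valid / len(top_predictions)


def false_positive_rate(random_set_pvalues, cutoff):
    """Fraction of random sequence sets whose best P-value passes the cutoff."""
    flagged = sum(1 for p in random_set_pvalues if p <= cutoff)
    return flagged / len(random_set_pvalues)


# Made-up top predictions for four runs, with placeholder TF names.
top_predictions = {"run1": "TF_A", "run2": "TF_B", "run3": "TF_C", "run4": "TF_A"}
known_regulators = {
    "run1": {"TF_A"},
    "run2": {"TF_B", "TF_D"},
    "run3": {"TF_A"},
    "run4": {"TF_A"},
}
print(positive_predictive_value(top_predictions, known_regulators))  # 0.75

# Made-up best P-values obtained on ten random sequence sets.
random_pvalues = [1e-3, 1e-7, 1e-5, 1e-9, 1e-2, 1e-4, 1e-6, 0.5, 0.2, 1e-1]
print(false_positive_rate(random_pvalues, cutoff=1e-6))  # 0.3
```

Lowering the cutoff in `false_positive_rate` until the returned fraction drops to a target value (10% in Figure 3) is one simple way to pick a cutoff for a given set size.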

