Comparative analysis of methods for representing and searching for transcription factor binding sites
- PMID: 15297295
- DOI: 10.1093/bioinformatics/bth438
Abstract
Motivation: An important step in unravelling the transcriptional regulatory network of an organism is to identify, for each transcription factor, all of its DNA binding sites. Several approaches are commonly used in searching for a transcription factor's binding sites, including consensus sequences and position-specific scoring matrices. In addition, methods that compute the average number of nucleotide matches between a putative site and all known sites can be employed. Such basic approaches can all be naturally extended by incorporating pairwise nucleotide dependencies and per-position information content. In this paper, we evaluate the effectiveness of these basic approaches and their extensions in finding binding sites for a transcription factor of interest without erroneously identifying other genomic sequences.
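The three basic approaches named above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the uniform background, the pseudocount of 1, and the tie-breaking rule for the consensus are all assumptions made for illustration, and the known sites are assumed to be aligned and of equal length.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_counts(sites):
    """Count each nucleotide at each position of equal-length aligned sites."""
    length = len(sites[0])
    return [Counter(site[i] for site in sites) for i in range(length)]

def consensus(sites):
    """Most frequent nucleotide per position (ties broken by ALPHABET order)."""
    return "".join(
        max(ALPHABET, key=lambda b: counts[b]) for counts in position_counts(sites)
    )

def consensus_mismatches(sites, candidate):
    """Number of positions where the candidate differs from the consensus."""
    return sum(1 for a, b in zip(consensus(sites), candidate) if a != b)

def pssm(sites, background=None, pseudocount=1.0):
    """Log-odds matrix: log2(smoothed frequency / background frequency).

    The uniform background and the pseudocount value are illustrative choices.
    """
    background = background or {b: 0.25 for b in ALPHABET}
    total = len(sites) + pseudocount * len(ALPHABET)
    return [
        {
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in ALPHABET
        }
        for counts in position_counts(sites)
    ]

def pssm_score(matrix, candidate):
    """Sum of the per-position log-odds scores for the candidate."""
    return sum(column[b] for column, b in zip(matrix, candidate))

def average_matches(sites, candidate):
    """Mean number of positions at which the candidate agrees with a known site."""
    matches = [sum(a == b for a, b in zip(site, candidate)) for site in sites]
    return sum(matches) / len(sites)
```

In a genome scan, each function would be applied to every window of the site length, and windows scoring beyond a chosen threshold would be reported as putative sites. The threshold is what trades sensitivity against erroneous hits.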
Results: In cross-validation testing on a dataset of Escherichia coli transcription factors and their binding sites, we show that there are statistically significant differences in how well various methods identify transcription factor binding sites. The use of per-position information content improves the performance of all basic approaches. Furthermore, including local pairwise nucleotide dependencies within binding site models results in statistically significant performance improvements for approaches based on nucleotide matches. Based on our analysis, the best results when searching for DNA binding sites of a particular transcription factor are obtained by methods that incorporate both information content and local pairwise correlations.
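One way to picture the use of per-position information content is to weight each position's contribution to a match-based score by how conserved that position is among the known sites. The sketch below is an assumption about how such a weighting could look, not the scoring function evaluated in the paper: it computes information content in bits against a uniform background and multiplies it by the fraction of known sites that agree with the candidate at that position. The function names are hypothetical, and pairwise nucleotide dependencies are not modeled.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def information_content(sites):
    """Per-position information content in bits, assuming a uniform background."""
    n = len(sites)
    ic = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        # Shannon entropy of the observed nucleotide frequencies at position i.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(math.log2(len(ALPHABET)) - entropy)
    return ic

def ic_weighted_matches(sites, candidate):
    """Match score in which conserved positions count more (illustrative weighting)."""
    ic = information_content(sites)
    n = len(sites)
    score = 0.0
    for i, base in enumerate(candidate):
        fraction_matching = sum(site[i] == base for site in sites) / n
        score += ic[i] * fraction_matching
    return score
```

Under this weighting, a mismatch at a fully conserved position costs up to 2 bits, while a mismatch at a position where all four nucleotides are equally common costs nothing.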
Availability: The software is available at http://compbio.cs.princeton.edu/bindsites.