Informative priors based on transcription factor structural class improve de novo motif discovery
- PMID: 16873497
- DOI: 10.1093/bioinformatics/btl251
Abstract
Motivation: An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm.
Results: Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix-loop-helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs that the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites.
Availability: Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
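To illustrate the central idea of the Motivation paragraph, the following minimal sketch shows how a prior probability over sequence locations can be combined with a motif likelihood to rank candidate binding sites. It is not the priority algorithm or the authors' code. The function names, the position weight matrix, the toy sequence, the prior values, and the assumption of exactly one site per sequence are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def window_log_odds(window, pwm, background):
    """Log-likelihood ratio of a window under the motif model vs. the background."""
    return sum(math.log(pwm[i][base] / background[base])
               for i, base in enumerate(window))


def site_posterior(sequence, pwm, prior, background=None):
    """Posterior over motif start positions.

    Assumption (ours, for illustration): exactly one site per sequence.
    prior[i] is the prior probability that a site starts at position i.
    """
    if background is None:
        background = {base: 0.25 for base in BASES}
    width = len(pwm)
    n_starts = len(sequence) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior must have one entry per candidate start position")
    # Unnormalized posterior: prior times likelihood ratio at each start.
    weights = [
        prior[i] * math.exp(window_log_odds(sequence[i:i + width], pwm, background))
        for i in range(n_starts)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def consensus_pwm(consensus, match_prob=0.85):
    """Hypothetical PWM that favors one consensus base at each column."""
    other = (1.0 - match_prob) / 3.0
    return [{base: (match_prob if base == c else other) for base in BASES}
            for c in consensus]


# Toy sequence with two identical matches to the consensus, at starts 2 and 10.
sequence = "AAGACGAAAAGACGAA"
pwm = consensus_pwm("GACG")
n_starts = len(sequence) - len(pwm) + 1

# Uninformative prior: every start position is equally likely a priori.
uniform_prior = [1.0 / n_starts] * n_starts

# Informative prior (made-up values): position 10 is five times as likely a priori,
# standing in for a class-specific classifier score at that location.
raw = [1.0] * n_starts
raw[10] = 5.0
informed_prior = [r / sum(raw) for r in raw]

uniform_post = site_posterior(sequence, pwm, uniform_prior)
informed_post = site_posterior(sequence, pwm, informed_prior)

print("uniform prior:  P(start=2) = %.3f, P(start=10) = %.3f"
      % (uniform_post[2], uniform_post[10]))
print("informed prior: P(start=2) = %.3f, P(start=10) = %.3f"
      % (informed_post[2], informed_post[10]))
```

Under the uniform prior the two identical matches are indistinguishable, whereas the informative prior breaks the tie in favor of the location it considers more plausible. This is the sense in which a positional prior can help a motif finder choose among otherwise equally good candidates.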