Assessing transcription factor motif drift from noisy decoy sequences
- PMID: 16362907
Abstract
Genome-scale identification of transcription factor binding sites (TFBSs) is fundamental to understanding the complexities of mRNA expression at both the cellular and organismal levels. While high-throughput experimental methods provide associations between transcription factors and the genes they regulate under a specified experimental condition, computational methods are still required to pinpoint the exact location of binding. Moreover, since the binding site is an intrinsic property of the promoter region, computational methods are in principle more general than condition-dependent experimental methods. Computational identification of TFBSs is complicated in at least two ways. First, transcription factors bind a heterogeneous set of sites and therefore have a distribution of affinities. Second, not every sequence in the set for which a common site is to be determined contains a site for the transcription factor of interest. In this paper, we evaluate the robustness of TFBS identification with respect to both effects. We show that adding upstream regions that lack the TFBS destroys the specificity of the predicted binding site. We also propose a method to calculate the distance between position weight matrices that can be used to measure "drift" from the canonical binding site. The results presented here could be useful in developing future transcription factor binding site identification algorithms.
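The abstract does not state which distance the authors define, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a mean per-column Jensen-Shannon divergence between base-frequency matrices built from aligned sites. The function names, the pseudocount, and the example sequences are all hypothetical.

```python
import math

ALPHABET = "ACGT"


def frequency_matrix(sites, pseudocount=0.5):
    """Column-wise base frequencies for equal-length aligned sites.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency strictly positive.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append([counts[base] / total for base in ALPHABET])
    return matrix


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def matrix_distance(m1, m2):
    """Mean per-column JS divergence: 0 for identical matrices, at most 1."""
    if len(m1) != len(m2):
        raise ValueError("matrices must have the same width")
    return sum(js_divergence(p, q) for p, q in zip(m1, m2)) / len(m1)


# Hypothetical example: a motif built from true sites only, and the same
# motif after two unrelated (decoy) sequences enter the alignment.
canonical = frequency_matrix(["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"])
drifted = frequency_matrix(["TGACTCA", "TGAGTCA", "ACGTTGC", "GATCAAT"])

print(matrix_distance(canonical, canonical))
print(matrix_distance(canonical, drifted))
```

Under these assumptions, a matrix compared with itself gives zero, and the value grows as sequences without the site dilute the alignment, which is the kind of "drift" the abstract describes. The sketch compares frequency matrices of equal width with no shifting or reverse-complement alignment; a distance on log-odds weight matrices would need a different column score.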