Similarity of position frequency matrices for transcription factor binding sites
- PMID: 15319260
- DOI: 10.1093/bioinformatics/bth480
Abstract
Motivation: Transcription-factor binding sites (TFBSs) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFMs). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices.
Results: We describe a method for quantifying PFM similarity based on product multinomial distributions, demonstrate its ability to identify PFM similarity and show that it has a better false positive to false negative ratio than existing methods. We grouped TFBS frequency matrices from two libraries into matrix families and identified the matrices that are common to both libraries and those unique to each. We identified similarities and differences between the skeletal-muscle-specific and non-muscle-specific frequency matrices of Wasserman and Fickett for the binding sites of Mef-2, Myf, Sp-1, SRF and TEF. We further identified known frequency matrices and matrix families that were strongly similar to the matrices given by Wasserman and Fickett. We provide methodology and tools to compare and query libraries of frequency matrices for TFBSs.
Availability: Software is available to use over the Web at http://rulai.cshl.edu/MatCompare
Supplementary information: Database and clustering statistics, matrix families and representatives are available at http://rulai.cshl.edu/MatCompare/Supplementary.
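Illustration: the abstract does not state which test statistic is used, so the following is only a minimal sketch of one way to compare PFM columns under a product multinomial model, where each column is treated as an independent multinomial sample of base counts. It assumes a Pearson chi-square homogeneity statistic per column pair and two matrices of equal width that are already aligned. The names column_chi_square and pfm_distance are hypothetical and are not taken from the MatCompare software.

```python
from typing import List, Sequence

Column = Sequence[float]  # base counts (A, C, G, T) at one matrix position


def column_chi_square(col_a: Column, col_b: Column) -> float:
    """Pearson chi-square statistic for the 2 x 4 table formed by two count columns.

    Assumption: small values suggest both columns were drawn from the same
    multinomial distribution; this is an illustrative choice of statistic.
    """
    n_a, n_b = sum(col_a), sum(col_b)
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each column needs a positive total count")
    total = n_a + n_b
    stat = 0.0
    for x, y in zip(col_a, col_b):
        pooled = x + y
        if pooled == 0:
            continue  # a base absent from both columns contributes nothing
        exp_a = n_a * pooled / total  # expected count in column a under homogeneity
        exp_b = n_b * pooled / total  # expected count in column b under homogeneity
        stat += (x - exp_a) ** 2 / exp_a + (y - exp_b) ** 2 / exp_b
    return stat


def pfm_distance(pfm_a: List[Column], pfm_b: List[Column]) -> float:
    """Mean per-column statistic over two equal-width, pre-aligned PFMs."""
    if len(pfm_a) != len(pfm_b) or not pfm_a:
        raise ValueError("matrices must be non-empty and of equal width")
    return sum(column_chi_square(a, b) for a, b in zip(pfm_a, pfm_b)) / len(pfm_a)
```

In this sketch, columns with proportional counts score 0 and columns with no shared bases score the pooled count. A fuller comparison would also scan ungapped offsets and reverse complements and convert the statistics to p-values; those steps are omitted here.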