On counting position weight matrix matches in a sequence, with application to discriminative motif finding
- PMID: 16873507
- DOI: 10.1093/bioinformatics/btl227
Abstract
Motivation and results: The position weight matrix (PWM) is a popular method to model transcription factor binding sites. A fundamental problem in cis-regulatory analysis is to "count" the occurrences of a PWM in a DNA sequence. We propose a novel probabilistic score to solve this problem of counting PWM occurrences. The proposed score has two important properties: (1) it gives appropriate weights to both strong and weak occurrences of the PWM, without using thresholds; (2) for any given PWM, this score can be computed while allowing for occurrences of other, a priori known PWMs, in a statistically sound framework. Additionally, the score is efficiently differentiable with respect to the PWM parameters, which has important consequences for designing search algorithms. The second problem we address is to find, ab initio, PWMs that have high counts in one set of sequences and low counts in another. We develop a novel algorithm to solve this "discriminative motif-finding problem", using the proposed score for counting a PWM in the sequences. The algorithm is a local search technique that exploits derivative information on an objective function to enhance speed and performance. It is extensively tested on synthetic data and shown to perform better than other discriminative as well as non-discriminative PWM-finding algorithms. It is then applied to cis-regulatory modules involved in development of the fruit fly embryo, to elicit known and novel motifs. We finally use the algorithm on genes predictive of social behavior in the honey bee, and find interesting motifs.
Availability: The program is available upon request from the author.
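To make the idea of threshold-free counting concrete, the sketch below sums, over every window of a sequence, the posterior probability that the window is a PWM occurrence rather than background. This is not the score proposed in the paper and not the author's program: it is a deliberately simplified illustration that treats windows independently, ignores competing PWMs, and assumes sequences contain only A, C, G and T. The function names, the `prior` parameter, and the mean-difference objective in `discriminative_score` are all hypothetical choices made for this example.

```python
# Illustrative sketch only: a simplified, threshold-free "soft count" of PWM
# matches. It is NOT the probabilistic score from the paper.

def window_likelihood(pwm, window):
    """Probability of `window` under the PWM (product of per-column base probabilities)."""
    p = 1.0
    for column, base in zip(pwm, window):
        p *= column[base]
    return p


def background_likelihood(background, window):
    """Probability of `window` under an independent per-base background model."""
    p = 1.0
    for base in window:
        p *= background[base]
    return p


def expected_count(pwm, sequence, background, prior=0.01):
    """Sum of per-window posterior probabilities of being a motif occurrence.

    `prior` is an assumed probability that any given window is a site.
    Strong matches contribute close to 1, weak matches a small fraction,
    and no score threshold is applied.
    """
    width = len(pwm)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        motif = prior * window_likelihood(pwm, window)
        bg = (1.0 - prior) * background_likelihood(background, window)
        total += motif / (motif + bg)
    return total


def discriminative_score(pwm, positives, negatives, background, prior=0.01):
    """Mean soft count in `positives` minus mean soft count in `negatives`.

    A hypothetical objective: higher values mean the PWM is counted more
    often in the positive set than in the negative set.
    """
    pos = sum(expected_count(pwm, s, background, prior) for s in positives)
    neg = sum(expected_count(pwm, s, background, prior) for s in negatives)
    return pos / len(positives) - neg / len(negatives)


# Example data (made up for illustration): a PWM favouring "ACGT".
def make_column(consensus, high=0.97, low=0.01):
    return {b: (high if b == consensus else low) for b in "ACGT"}

example_pwm = [make_column(b) for b in "ACGT"]
uniform_background = {b: 0.25 for b in "ACGT"}
```

With these definitions, `expected_count(example_pwm, "TTTTACGTTTTT", uniform_background)` is dominated by the single strong window, while a sequence of only `T` contributes almost nothing. Because the count is a smooth function of the PWM entries, an objective such as `discriminative_score` could in principle be optimized with derivative-based local search, which is the general kind of approach the abstract describes.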
Similar articles
- MotifCut: regulatory motifs finding with maximum density subgraphs. Bioinformatics. 2006 Jul 15;22(14):e150-7. doi: 10.1093/bioinformatics/btl243. PMID: 16873465
- MUSA: a parameter free algorithm for the identification of biologically significant motifs. Bioinformatics. 2006 Dec 15;22(24):2996-3002. doi: 10.1093/bioinformatics/btl537. Epub 2006 Oct 26. PMID: 17068086
- Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone. Bioinformatics. 2006 Jul 15;22(14):e393-401. doi: 10.1093/bioinformatics/btl245. PMID: 16873498
- Finding regulatory elements and regulatory motifs: a general probabilistic framework. BMC Bioinformatics. 2007 Sep 27;8 Suppl 6(Suppl 6):S4. doi: 10.1186/1471-2105-8-S6-S4. PMID: 17903285 Free PMC article. Review.
- A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007 Nov 1;8 Suppl 7(Suppl 7):S21. doi: 10.1186/1471-2105-8-S7-S21. PMID: 18047721 Free PMC article. Review.
Cited by
- MochiView: versatile software for genome browsing and DNA motif analysis. BMC Biol. 2010 Apr 21;8:49. doi: 10.1186/1741-7007-8-49. PMID: 20409324 Free PMC article.
- Seeder: discriminative seeding DNA motif discovery. Bioinformatics. 2008 Oct 15;24(20):2303-7. doi: 10.1093/bioinformatics/btn444. Epub 2008 Aug 21. PMID: 18718942 Free PMC article.
- GNNMF: a multi-view graph neural network for ATAC-seq motif finding. BMC Genomics. 2024 Mar 21;25(1):300. doi: 10.1186/s12864-024-10218-0. PMID: 38515040 Free PMC article.
- A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. PLoS One. 2009 Dec 1;4(12):e8155. doi: 10.1371/journal.pone.0008155. PMID: 19956545 Free PMC article.
- PhyloGibbs-MP: module prediction and discriminative motif-finding by Gibbs sampling. PLoS Comput Biol. 2008 Aug 29;4(8):e1000156. doi: 10.1371/journal.pcbi.1000156. PMID: 18769735 Free PMC article.