Finding regulatory elements and regulatory motifs: a general probabilistic framework
- PMID: 17903285
- PMCID: PMC1995539
- DOI: 10.1186/1471-2105-8-S6-S4
Abstract
Over the last two decades a large number of algorithms have been developed for regulatory motif finding. Here we show how many of these algorithms, especially those that model the binding specificities of regulatory factors with position-specific weight matrices (WMs), arise naturally within a general Bayesian probabilistic framework. We discuss how WMs are constructed from sets of regulatory sites, how sites for a given WM can be discovered by scanning large sequences, how to cluster WMs, and, more generally, how to group large sets of sites from different WMs into clusters. We discuss how 'regulatory modules', clusters of sites for subsets of WMs, can be found in large intergenic sequences, and we discuss different methods for ab initio motif finding, including expectation maximization (EM) algorithms and motif sampling algorithms. Finally, we discuss extensively how module finding methods and ab initio motif finding methods can be extended to take phylogenetic relations between the input sequences into account, i.e. we show how motif finding and phylogenetic footprinting can be integrated in a rigorous probabilistic framework. The article is intended for readers with a solid background in applied mathematics, and preferably with some knowledge of general Bayesian probabilistic methods. The main purpose of the article is to show that these methods are not a disconnected set of individual algorithmic recipes, but different facets of a single integrated probabilistic theory.
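As a rough illustration of the first two tasks named in the abstract (constructing a WM from a set of sites and scanning a sequence for sites), the following is a minimal sketch and not the article's own formulation. It assumes equal-length, gap-free sites, a uniform pseudocount of 1 per base (one common choice of prior), and a uniform background model; the function names `build_wm` and `scan` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_wm(sites, pseudocount=1.0):
    """Estimate per-column base probabilities from aligned sites.

    Assumption: every site has the same length and contains only A, C, G, T.
    The pseudocount is an illustrative prior choice, not taken from the article.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    wm = []
    for i in range(length):
        # Start each count at the pseudocount, then add observed bases.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        wm.append({base: counts[base] / total for base in ALPHABET})
    return wm


def scan(sequence, wm, background=None):
    """Score every window by the log-likelihood ratio of WM versus background."""
    if background is None:
        # Assumption: uniform background base frequencies.
        background = {base: 0.25 for base in ALPHABET}
    width = len(wm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(
            math.log(wm[i][base] / background[base])
            for i, base in enumerate(window)
        )
        scores.append((start, score))
    return scores


sites = ["ACGT", "ACGT", "ACGA"]
wm = build_wm(sites)
scores = scan("TTACGTTT", wm)
best_start, best_score = max(scores, key=lambda item: item[1])
```

With these toy sites, the highest-scoring window in `"TTACGTTT"` starts at position 2, where the sequence matches the majority of the sites exactly. A real application would use an estimated background model and a principled score cutoff or posterior probability rather than simply taking the maximum.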