Searching for statistically significant regulatory modules
- PMID: 14534166
- DOI: 10.1093/bioinformatics/btg1054
Abstract
Motivation: The regulatory machinery controlling gene expression is complex, frequently requiring multiple, simultaneous DNA-protein interactions. The rate at which a gene is transcribed may depend upon the presence or absence of a collection of transcription factors bound to the DNA near the gene. Locating transcription factor binding sites in genomic DNA is difficult because the individual sites are small and tend to occur frequently by chance. True binding sites may be identified by their tendency to occur in clusters, sometimes known as regulatory modules.
Results: We describe an algorithm for detecting occurrences of regulatory modules in genomic DNA. The algorithm, called mcast, takes as input a DNA database and a collection of binding site motifs that are known to operate in concert. mcast uses a motif-based hidden Markov model with several novel features. The model incorporates motif-specific p-values, thereby allowing scores from motifs of different widths and specificities to be compared directly. The p-value scoring also allows mcast to accept only motif occurrences whose p-values fall below a user-specified threshold, while still assigning better scores to occurrences with lower p-values. mcast can search long DNA sequences, modeling length distributions between motifs within a regulatory module but ignoring length distributions between modules. The algorithm produces a list of predicted regulatory modules, ranked by E-value. We validate the algorithm using simulated data as well as real data sets from fruit fly and human.
Availability: http://meme.sdsc.edu/MCAST/paper
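The thresholded p-value scoring described in the Results can be illustrated with a small sketch. This is not the mcast implementation: the abstract describes a motif-based hidden Markov model, whereas the code below swaps in a simple greedy grouping of nearby hits. The names Hit, hit_score and find_clusters, the log-ratio score, and the default values for threshold and max_gap are all assumptions made for illustration. Ranking here is by raw summed score, not by E-value.

```python
import math
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One motif occurrence (hypothetical record, not an mcast data type)."""
    motif: str
    start: int
    width: int
    pvalue: float


def hit_score(pvalue: float, threshold: float) -> float:
    """Assumed scoring rule: positive when pvalue < threshold,
    and larger for smaller p-values."""
    return -math.log2(pvalue / threshold)


def find_clusters(
    hits: List[Hit], threshold: float = 1e-3, max_gap: int = 50
) -> List[Tuple[float, List[Hit]]]:
    """Keep hits below the p-value threshold, group hits separated by at
    most max_gap bases, and rank groups by summed score (best first)."""
    accepted = sorted(
        (h for h in hits if h.pvalue < threshold), key=lambda h: h.start
    )
    clusters: List[List[Hit]] = []
    current: List[Hit] = []
    for h in accepted:
        # Start a new cluster when the gap to the previous hit is too long.
        if current and h.start - (current[-1].start + current[-1].width) > max_gap:
            clusters.append(current)
            current = []
        current.append(h)
    if current:
        clusters.append(current)
    scored = [
        (sum(hit_score(h.pvalue, threshold) for h in c), c) for c in clusters
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

Because each score depends only on the ratio of a hit's p-value to the threshold, hits from motifs of different widths land on the same scale, which is the property the abstract attributes to p-value scoring. Calling find_clusters on a list of Hit records returns (score, hits) pairs, with the highest-scoring candidate cluster first.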