Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences
- PMID: 12136103
- PMCID: PMC135758
- DOI: 10.1093/nar/gkf438
Abstract
The human genome encodes the transcriptional control of its genes in clusters of cis-elements that constitute enhancers, silencers and promoter signals. The sequence motifs of individual cis-elements are usually too short and degenerate for confident detection. In most cases, the requirements for organization of cis-elements within these clusters are poorly understood. Therefore, we have developed a general method to detect local concentrations of cis-element motifs, using predetermined matrix representations of the cis-elements, and to calculate the statistical significance of these motif clusters. The statistical significance calculation is highly accurate not only for idealized, pseudorandom DNA, but also for real human DNA. We use our method, the 'cluster of motifs E-value tool' (COMET), to make novel predictions concerning the regulation of genes by transcription factors associated with muscle. COMET performs comparably with two alternative state-of-the-art techniques, which are more complex and lack E-value calculations. Our statistical method enables us to clarify the major bottleneck in the hard problem of detecting cis-regulatory regions: many known enhancers do not contain very significant clusters of the motif types that we search for. Thus, discovery of additional signals that belong to these regulatory regions will be the key to future progress.
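The general idea of scoring motif matches with a position specific scoring matrix and then looking for a local concentration of matches can be illustrated with a small sketch. This is not the COMET algorithm and it does not compute E-values; the toy count matrix, the match threshold, the linear per-base gap penalty and the function names (`log_odds_matrix`, `motif_hits`, `best_cluster`) are all assumptions made for illustration, and the input is assumed to be uppercase A, C, G and T only.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a toy 4-bp motif "ACGT".
# Rows are motif positions; columns are counts of A, C, G, T.
TOY_COUNTS = [
    [8, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 8],
]


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert counts to log2-odds scores against a uniform background (assumed)."""
    pssm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pssm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return pssm


def motif_hits(seq, pssm, threshold):
    """Return (start, score) for every window whose score is at least threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pssm[j][BASES.index(seq[i + j])] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


def best_cluster(hits, gap_penalty):
    """Find the best-scoring run of consecutive hits.

    A run's score is the sum of its hit scores minus gap_penalty for each
    base pair between successive hit starts (an illustrative scoring scheme).
    Returns (score, first_hit_start, last_hit_start).
    """
    best = (0.0, None, None)
    running, start, prev = 0.0, None, None
    for pos, score in hits:
        if start is None:
            running, start = score, pos
        else:
            extended = running - gap_penalty * (pos - prev) + score
            if extended > score:
                running = extended          # extending the current run pays off
            else:
                running, start = score, pos  # start a new run at this hit
        prev = pos
        if running > best[0]:
            best = (running, start, pos)
    return best
```

For example, `best_cluster(motif_hits(seq, log_odds_matrix(TOY_COUNTS), 6.0), 0.2)` returns the highest-scoring group of nearby matches in `seq`. Assessing whether such a score is statistically significant is the part the paper addresses and is deliberately left out here.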