NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence
- PMID: 15760844
- PMCID: PMC1064142
- DOI: 10.1093/nar/gki282
Abstract
NestedMICA is a new, scalable pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, because it uses a newly developed inference strategy called Nested Sampling, NestedMICA can find optimal solutions without a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, using synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than MEME could handle. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs that were good predictors for all five abundant classes of annotated binding sites.
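The abstract's central technical claim is that Nested Sampling removes the need for a seeding step. The sketch below illustrates generic Nested Sampling (after Skilling) on a one-dimensional toy problem; it is not NestedMICA's motif mixture model. All names (`nested_sampling`, `log_likelihood`, `sample_prior`, `n_live`, `n_iter`) are hypothetical, the remaining prior mass is estimated with the deterministic approximation X_i ≈ exp(-i/N), and replacement points are drawn by plain rejection sampling from the prior. That last choice is a simplification that only works for cheap, low-dimensional problems; practical samplers usually generate replacements with constrained MCMC moves instead.

```python
import math
import random


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(log_likelihood, sample_prior, n_live=100, n_iter=700, seed=0):
    """Estimate log Z, the log of the integral of L(theta) over the prior.

    Generic nested sampling sketch (hypothetical API). Replacement points come
    from rejection sampling of the prior, so this is only viable for toy problems.
    """
    rng = random.Random(seed)
    # Live points start as independent prior draws: no seed or initial guess.
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log of the remaining prior mass, X_0 = 1
    # With X_i = exp(-i / n_live), log(X_{i-1} - X_i) = log X_{i-1} + log_shrink.
    log_shrink = math.log1p(-math.exp(-1.0 / n_live))
    for i in range(1, n_iter + 1):
        # Discard the live point with the lowest likelihood.
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        # Its contribution to the evidence: L_min * (X_{i-1} - X_i).
        log_z = _logaddexp(log_z, logl_min + log_x_prev + log_shrink)
        log_x_prev = -i / n_live
        # Replace it with a prior draw satisfying the hard constraint L > L_min.
        while True:
            t = sample_prior(rng)
            lt = log_likelihood(t)
            if lt > logl_min:
                break
        live[worst], live_logl[worst] = t, lt
    # Credit the surviving live points with the remaining prior mass X_final.
    m = max(live_logl)
    log_mean_l = m + math.log(sum(math.exp(v - m) for v in live_logl) / n_live)
    return _logaddexp(log_z, log_x_prev + log_mean_l)


# Toy check: uniform prior on [0, 1) and a Gaussian-shaped likelihood, whose
# evidence is approximately sigma * sqrt(2 * pi) because the tails are negligible.
sigma = 0.1
estimate = nested_sampling(
    log_likelihood=lambda t: -((t - 0.5) ** 2) / (2 * sigma ** 2),
    sample_prior=lambda rng: rng.random(),
)
print(estimate, math.log(sigma * math.sqrt(2 * math.pi)))
```

Because every live point begins as an independent draw from the prior and the likelihood threshold only ever rises, the search does not depend on a user-supplied starting point; this is the property the abstract contrasts with methods that need an initialization or seeding step.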
Similar articles
- NestedMICA as an ab initio protein motif discovery tool. BMC Bioinformatics. 2008 Jan 14;9:19. doi: 10.1186/1471-2105-9-19. PMID: 18194537. Free PMC article.
- iMotifs: an integrated sequence motif visualization and analysis environment. Bioinformatics. 2010 Mar 15;26(6):843-4. doi: 10.1093/bioinformatics/btq026. Epub 2010 Jan 26. PMID: 20106815. Free PMC article.
- Metamotifs--a generative model for building families of nucleotide position weight matrices. BMC Bioinformatics. 2010 Jun 25;11:348. doi: 10.1186/1471-2105-11-348. PMID: 20579334. Free PMC article.
- Parsing regulatory DNA: general tasks, techniques, and the PhyloGibbs approach. J Biosci. 2007 Aug;32(5):863-70. doi: 10.1007/s12038-007-0086-0. PMID: 17914228. Review.
- Hidden Markov model and its applications in motif findings. Methods Mol Biol. 2010;620:405-16. doi: 10.1007/978-1-60761-580-4_13. PMID: 20652513. Review.
Cited by
- MOST+: A de novo motif finding approach combining genomic sequence and heterogeneous genome-wide signatures. BMC Genomics. 2015;16 Suppl 7(Suppl 7):S13. doi: 10.1186/1471-2164-16-S7-S13. Epub 2015 Jun 11. PMID: 26099518. Free PMC article.
- Specificity of Notch pathway activation: twist controls the transcriptional output in adult muscle progenitors. Development. 2010 Aug;137(16):2633-42. doi: 10.1242/dev.053181. Epub 2010 Jul 7. PMID: 20610485. Free PMC article.
- Discovery of regulatory elements is improved by a discriminatory approach. PLoS Comput Biol. 2009 Nov;5(11):e1000562. doi: 10.1371/journal.pcbi.1000562. Epub 2009 Nov 13. PMID: 19911049. Free PMC article.
- Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses. Mol Biol Evol. 2022 Jul 2;39(7):msac133. doi: 10.1093/molbev/msac133. PMID: 35700225. Free PMC article.
- Genome-wide analysis of the binding of the Hox protein Ultrabithorax and the Hox cofactor Homothorax in Drosophila. PLoS One. 2011 Apr 5;6(4):e14778. doi: 10.1371/journal.pone.0014778. PMID: 21483667. Free PMC article.