NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence
- PMID: 15760844
- PMCID: PMC1064142
- DOI: 10.1093/nar/gki282
Abstract
NestedMICA is a new, scalable pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, its use of a newly developed inference strategy called Nested Sampling means that NestedMICA can find optimal solutions without a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and on a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than the existing program could handle. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs that were good predictors for all five abundant classes of annotated binding sites.
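To illustrate the general idea behind Nested Sampling, the following is a minimal sketch on a toy problem, not NestedMICA's implementation. Everything in it is an assumption made for illustration: a one-dimensional parameter with a uniform prior on [0, 1], a Gaussian likelihood centred at 0.5, and the hypothetical names `likelihood` and `nested_sampling_evidence`. The sampler starts from points drawn at random from the prior, so it needs no hand-chosen seed solution, and it repeatedly replaces the worst point with a new prior draw constrained to have higher likelihood.

```python
import math
import random


def likelihood(theta, mu=0.5, sigma=0.1):
    """Toy Gaussian likelihood (an assumption; not a motif model)."""
    z = (theta - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def nested_sampling_evidence(n_live=200, n_iter=1600, seed=0):
    """Estimate the evidence Z = integral of L(theta) over a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    # Live points are drawn from the prior: no special initialization is needed.
    live = [rng.random() for _ in range(n_live)]
    evidence = 0.0
    x_prev = 1.0  # prior mass still enclosed by the likelihood contour
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=lambda k: likelihood(live[k]))
        l_worst = likelihood(live[worst])
        # Expected shrinkage of the enclosed prior mass after i removals.
        x_curr = math.exp(-i / n_live)
        evidence += l_worst * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point with a prior draw satisfying L > l_worst.
        # For this symmetric toy likelihood the constraint is just an interval
        # around 0.5, so the constrained draw can be made exactly.
        radius = abs(live[worst] - 0.5)
        live[worst] = 0.5 + rng.uniform(-radius, radius)
    # Add the contribution of the remaining live points.
    evidence += x_prev * sum(likelihood(t) for t in live) / n_live
    return evidence


if __name__ == "__main__":
    print(nested_sampling_evidence())
```

In this toy setting the true evidence is very close to 1, because the Gaussian lies almost entirely inside [0, 1], so the estimate can be checked against that value. A real motif finder would need a different way to draw new points under the likelihood constraint, since the exact interval trick used here only works for this simple likelihood.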
