Comparative Study

Nucleic Acids Res. 2005 Mar 10;33(5):1445-53.

doi: 10.1093/nar/gki282. Print 2005.

NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence

Thomas A Down¹, Tim J P Hubbard

PMID: 15760844
PMCID: PMC1064142
DOI: 10.1093/nar/gki282

Thomas A Down et al. Nucleic Acids Res. 2005.

Affiliation

¹ Wellcome Trust Sanger Institute, Hinxton Cambridge, CB10 1SA, UK. td2@sanger.ac.uk

Abstract

NestedMICA is a new, scalable pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means that NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and on a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs that were good predictors for all five abundant classes of annotated binding sites.
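The abstract credits Nested Sampling for removing the need for a seeding step. The following is a minimal, generic sketch of that idea on a toy one-dimensional problem; it is not NestedMICA's implementation. The Gaussian likelihood, the uniform prior on [0, 1], and all function names are assumptions chosen for illustration. The key point it shows is that the live points are drawn from the prior rather than from hand-picked starting values, and the worst point is repeatedly replaced by a prior draw with higher likelihood.

```python
import math
import random


def log_likelihood(theta, mu=0.5, sigma=0.05):
    """Toy normalized Gaussian likelihood (an assumption, not a motif model)."""
    return -0.5 * ((theta - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(n_live=200, n_iter=3000, seed=0):
    """Estimate the log evidence under a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    # Live points come straight from the prior: no seeding heuristic.
    live = [rng.random() for _ in range(n_live)]
    log_z = -math.inf
    log_x_prev = 0.0  # log of the prior volume still enclosed
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda k: log_likelihood(live[k]))
        log_l_min = log_likelihood(live[worst])
        # Standard deterministic approximation: volume shrinks by exp(-1/n_live).
        log_x = -i / n_live
        log_width = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = log_add_exp(log_z, log_l_min + log_width)
        log_x_prev = log_x
        # Replace the worst point with a prior draw of higher likelihood.
        # For this symmetric toy likelihood the constrained region is an
        # interval around 0.5, so it can be sampled exactly.
        radius = abs(live[worst] - 0.5)
        live[worst] = 0.5 + rng.uniform(-radius, radius)
    # Add the contribution of the remaining live points.
    for theta in live:
        log_z = log_add_exp(log_z, log_likelihood(theta) + log_x_prev - math.log(n_live))
    return log_z


if __name__ == "__main__":
    # The true evidence is close to 1, so the log evidence should be near 0.
    print(nested_sampling())
```

In a real motif finder the hard part is the replacement step, since drawing from the prior under a likelihood constraint has no closed form there; the exact interval sampling above only works because the toy problem is one-dimensional and symmetric.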

Figures

**Figure 1**
The zero-or-one occurrences per sequence (ZOOPS) sequence mixture model (SMM), represented as a hidden Markov model (HMM). The states labelled m1–m4 are responsible for modelling the interesting motif, while the other states model the non-interesting remainder of the sequence.

**Figure 2**
A multiple-uncounted SMM containing two motifs. The black dots are silent states, which are not responsible for modelling any part of the sequence.

**Figure 3**
Likelihoods of a set of test sequences, given mosaic background models of various orders and class numbers.

**Figure 4**
(a) The original HLF motif from JASPAR. (b) Results for searching for HLF in a set of 150 base sequences using MEME. (c) MEME with 200 base sequences. (d) NestedMICA with 600 base sequences. (e) NestedMICA with 700 base sequences.

**Figure 5**
A selection of mammalian JASPAR weight matrices that are used for synthetic data tests.

**Figure 6**
ROC curves for the best matches to the SRE sites in the NestedMICA and MEME results.

**Figure 7**
The MEF2 motif derived from curated sites, and the corresponding high-scoring motifs from NestedMICA and MEME.

**Figure 8**
The SRE motif derived from curated sites, and the corresponding high-scoring motifs from NestedMICA and MEME.
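The ZOOPS model of Figure 1 can also be read as a two-way mixture: either the whole sequence is background, or exactly one motif occurrence sits at some position and the rest is background. A minimal sketch of that likelihood follows, under assumptions that are not taken from the paper: a zeroth-order background, a single strand, a uniform prior over motif positions, and a hypothetical four-column weight matrix (matching the four motif states m1–m4 in the caption).

```python
import math


def zoops_log_likelihood(seq, pwm, background, p_motif=0.5):
    """Log-likelihood of seq under a zero-or-one-occurrence mixture.

    pwm is a list of dicts (one per motif column) giving base probabilities,
    background is a dict of base probabilities, and p_motif is the prior
    probability (strictly between 0 and 1) that the sequence holds a motif.
    """
    width = len(pwm)
    n_pos = len(seq) - width + 1
    if n_pos < 1:
        raise ValueError("sequence is shorter than the motif")
    log_bg = [math.log(background[base]) for base in seq]
    log_bg_total = sum(log_bg)
    # Component 1: no motif, the whole sequence is background.
    terms = [math.log(1.0 - p_motif) + log_bg_total]
    # Component 2: one motif at each possible start, background elsewhere.
    for start in range(n_pos):
        log_ratio = sum(
            math.log(pwm[j][seq[start + j]]) - log_bg[start + j]
            for j in range(width)
        )
        terms.append(math.log(p_motif) - math.log(n_pos) + log_bg_total + log_ratio)
    # Sum the mixture components in log space.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # Hypothetical motif that strongly prefers ACGT.
    motif = [
        {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
        {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
        {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
        {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    ]
    print(zoops_log_likelihood("TTTTACGTTTTT", motif, uniform))
    print(zoops_log_likelihood("TTTTTTTTTTTT", motif, uniform))
```

A useful sanity check on any such implementation is that a weight matrix identical to the background makes the motif component indistinguishable from background, so the mixture collapses to the plain background likelihood.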

Cited by

MOST+: A de novo motif finding approach combining genomic sequence and heterogeneous genome-wide signatures.
Zhang Y, He Y, Zheng G, Wei C. BMC Genomics. 2015;16 Suppl 7(Suppl 7):S13. doi: 10.1186/1471-2164-16-S7-S13. Epub 2015 Jun 11. PMID: 26099518.
Specificity of Notch pathway activation: twist controls the transcriptional output in adult muscle progenitors.
Bernard F, Krejci A, Housden B, Adryan B, Bray SJ. Development. 2010 Aug;137(16):2633-42. doi: 10.1242/dev.053181. Epub 2010 Jul 7. PMID: 20610485.
Discovery of regulatory elements is improved by a discriminatory approach.
Valen E, Sandelin A, Winther O, Krogh A. PLoS Comput Biol. 2009 Nov;5(11):e1000562. doi: 10.1371/journal.pcbi.1000562. Epub 2009 Nov 13. PMID: 19911049.
Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses.
Zhang C, Sashittal P, Xiang M, Zhang Y, Kazi A, El-Kebir M. Mol Biol Evol. 2022 Jul 2;39(7):msac133. doi: 10.1093/molbev/msac133. PMID: 35700225.
Genome-wide analysis of the binding of the Hox protein Ultrabithorax and the Hox cofactor Homothorax in Drosophila.
Choo SW, White R, Russell S. PLoS One. 2011 Apr 5;6(4):e14778. doi: 10.1371/journal.pone.0014778. PMID: 21483667.


Grants and funding

WT_/Wellcome Trust/United Kingdom
