MUSA: a parameter free algorithm for the identification of biologically significant motifs
- PMID: 17068086
- DOI: 10.1093/bioinformatics/btl537
Abstract
Motivation: The ability to identify complex motifs, i.e. non-contiguous nucleotide sequences, is a key feature of modern motif finders. Addressing this problem is extremely important, not only because these motifs can accurately model biological phenomena but also because their extraction is highly dependent upon the appropriate selection of numerous search parameters. Currently available combinatorial algorithms have proved to be highly efficient in exhaustively enumerating motifs (including complex motifs) that fulfill certain extraction criteria. However, one major problem with these methods is the large number of parameters that need to be specified.
Results: We propose a new algorithm, MUSA (Motif finding using an UnSupervised Approach), that can be used either to autonomously find over-represented complex motifs or to estimate search parameters for modern motif finders. This method relies on a biclustering algorithm that operates on a matrix of co-occurrences of small motifs. The performance of this method is independent of the composite structure of the motifs being sought, making few assumptions about their characteristics. The MUSA algorithm was applied to two datasets involving the bacterium Pseudomonas putida KT2440. The first one was composed of 70 σ54-dependent promoter sequences and the second dataset included 54 promoter sequences of up-regulated genes in response to phenol, as suggested by quantitative proteomics. The results obtained indicate that this approach is very effective at identifying complex motifs of biological significance.
Availability: The MUSA algorithm is available upon request from the authors, and will be made available via a Web-based interface.
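The Results paragraph mentions a matrix of co-occurrences of small motifs as the input to biclustering. The abstract does not define that matrix, so the following is only a minimal sketch under stated assumptions: "small motifs" are taken to be fixed-length k-mers over the DNA alphabet, and a cell counts the number of sequences in which both k-mers appear. The function names `kmers` and `cooccurrence_matrix` are hypothetical, this is not the authors' implementation, and the biclustering step itself is not shown.

```python
from itertools import product


def kmers(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cooccurrence_matrix(sequences, k=2, alphabet="ACGT"):
    """Count, for every pair of distinct k-mers, how many sequences contain both.

    Assumption: this per-sequence presence count is an illustrative
    definition of co-occurrence, not the one used by MUSA.
    Returns (words, matrix), where matrix[i][j] refers to words[i], words[j].
    """
    # Enumerate all k-mers over the alphabet in a fixed order.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    matrix = [[0] * n for _ in range(n)]

    for seq in sequences:
        # Ignore k-mers containing symbols outside the alphabet (e.g. N).
        present = [index[w] for w in kmers(seq.upper(), k) if w in index]
        for i in present:
            for j in present:
                if i != j:  # the diagonal is left at zero
                    matrix[i][j] += 1
    return words, matrix
```

Calling `cooccurrence_matrix(promoters, k=3)` on a list of promoter strings returns the k-mer labels and a symmetric count matrix. A biclustering method could then look for groups of k-mers that co-occur in many of the same sequences.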