A Monte Carlo EM algorithm for de novo motif discovery in biomolecular sequences
- PMID: 19644166
- DOI: 10.1109/TCBB.2008.103
A Monte Carlo EM algorithm for de novo motif discovery in biomolecular sequences
Abstract
Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods used in de novo motif discovery. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that carries out stochastic sampling in local alignment space to overcome the conventional EM's main drawback of being trapped in a local optimum. The newly implemented algorithm is named as Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model, and then it iteratively performs Monte Carlo simulation and parameter update until convergence. A log-likelihood profiling technique together with the top-k strategy is introduced to cope with the phase shifts and multiple modal issues in motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments and successfully applied to subtle motif discovery. MCEMDA compares favorably to other popular PWM-based and word enumerative motif algorithms tested using simulated (l, d)-motif cases, documented prokaryotic, and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains using protein benchmarks and exhibits its excellent capacity while compared with other multiple sequence alignment methods.
Similar articles
-
Memetic algorithms for de novo motif-finding in biomedical sequences.Artif Intell Med. 2012 Sep;56(1):1-17. doi: 10.1016/j.artmed.2012.04.002. Epub 2012 May 20. Artif Intell Med. 2012. PMID: 22613029
-
A profile-based deterministic sequential Monte Carlo algorithm for motif discovery.Bioinformatics. 2008 Jan 1;24(1):46-55. doi: 10.1093/bioinformatics/btm543. Epub 2007 Nov 17. Bioinformatics. 2008. PMID: 18024972
-
On the Monte-Carlo expectation maximization for finding motifs in DNA sequences.IEEE J Biomed Health Inform. 2015 Mar;19(2):677-86. doi: 10.1109/JBHI.2014.2322694. Epub 2014 May 8. IEEE J Biomed Health Inform. 2015. PMID: 24833606
-
Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics.J Comput Biol. 2005 Sep;12(7):952-70. doi: 10.1089/cmb.2005.12.952. J Comput Biol. 2005. PMID: 16201915 Review.
-
Multiple sequence alignments.Curr Opin Struct Biol. 2005 Jun;15(3):261-6. doi: 10.1016/j.sbi.2005.04.002. Curr Opin Struct Biol. 2005. PMID: 15963889 Review.
Cited by
-
PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search.PLoS One. 2012;7(10):e48442. doi: 10.1371/journal.pone.0048442. Epub 2012 Oct 31. PLoS One. 2012. PMID: 23119020 Free PMC article.
-
An Affinity Propagation-Based DNA Motif Discovery Algorithm.Biomed Res Int. 2015;2015:853461. doi: 10.1155/2015/853461. Epub 2015 Aug 10. Biomed Res Int. 2015. PMID: 26347887 Free PMC article.
-
Review of Different Sequence Motif Finding Algorithms.Avicenna J Med Biotechnol. 2019 Apr-Jun;11(2):130-148. Avicenna J Med Biotechnol. 2019. PMID: 31057715 Free PMC article. Review.
-
MCOIN: a novel heuristic for determining transcription factor binding site motif width.Algorithms Mol Biol. 2013 Jun 27;8(1):16. doi: 10.1186/1748-7188-8-16. Algorithms Mol Biol. 2013. PMID: 23806098 Free PMC article.
-
A Review on Planted (l, d) Motif Discovery Algorithms for Medical Diagnose.Sensors (Basel). 2022 Feb 5;22(3):1204. doi: 10.3390/s22031204. Sensors (Basel). 2022. PMID: 35161949 Free PMC article. Review.
Publication types
MeSH terms
Substances
LinkOut - more resources
Full Text Sources