DNA motif alignment by evolving a population of Markov chains
- PMID: 19208112
- PMCID: PMC2648755
- DOI: 10.1186/1471-2105-10-S1-S13
DNA motif alignment by evolving a population of Markov chains
Abstract
Background: Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange.
Results: This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart.
Conclusion: Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
Figures


Similar articles
-
A Monte Carlo EM algorithm for de novo motif discovery in biomolecular sequences.IEEE/ACM Trans Comput Biol Bioinform. 2009 Jul-Sep;6(3):370-86. doi: 10.1109/TCBB.2008.103. IEEE/ACM Trans Comput Biol Bioinform. 2009. PMID: 19644166
-
SEAM: a Stochastic EM-type Algorithm for Motif-finding in biopolymer sequences.J Bioinform Comput Biol. 2007 Feb;5(1):47-77. doi: 10.1142/s0219720007002527. J Bioinform Comput Biol. 2007. PMID: 17477491
-
Bayesian restoration of a hidden Markov chain with applications to DNA sequencing.J Comput Biol. 1999 Summer;6(2):261-77. doi: 10.1089/cmb.1999.6.261. J Comput Biol. 1999. PMID: 10421527
-
Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics.J Comput Biol. 2005 Sep;12(7):952-70. doi: 10.1089/cmb.2005.12.952. J Comput Biol. 2005. PMID: 16201915 Review.
-
A comparative study on computational two-block motif detection: algorithms and applications.Mol Pharm. 2008 Jan-Feb;5(1):3-16. doi: 10.1021/mp7001126. Epub 2007 Dec 13. Mol Pharm. 2008. PMID: 18076137 Review.
References
-
- Wang L, Jiang T. On the complexity of multiple sequence alignment. J Comput Biol. 1994;1:337–348. - PubMed
Publication types
MeSH terms
Substances
LinkOut - more resources
Full Text Sources