Statistical alignment with a sequence evolution model allowing rate heterogeneity along the sequence
- PMID: 19407352
- DOI: 10.1109/TCBB.2007.70246
Abstract
We present a stochastic sequence evolution model for aligning two homologous sequences and estimating the mutation rates between them. To account for heterogeneity along a DNA sequence and to identify conserved regions, the model allows two evolutionary behaviors: the sequence is divided into slow-evolving and fast-evolving regions whose boundaries are unknown and must be detected. The evolution model combines a fragment insertion and deletion process, which acts on fast regions only, with a substitution process that acts on both fast and slow regions at different rates. This model induces a pair hidden Markov structure at the level of alignments, which makes efficient statistical alignment algorithms possible. We propose two complementary estimation methods: a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve sampling alignments. We propose a partial alignment sampler that is computationally less expensive than the typical whole alignment sampler, and we show that both estimation algorithms converge when used with this partial sampler. Our algorithms provide consistent estimates of the mutation rates, as well as plausible alignments and sequence segmentations, on both simulated and real data.
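To make the pair hidden Markov structure mentioned in the abstract more concrete, the following is a minimal sketch of a forward recursion that sums over all alignments of two sequences. It is not the authors' model. It assumes a Jukes-Cantor substitution process, replaces the fragment insertion and deletion process with single-letter gaps, omits an explicit end state, and uses hypothetical names and parameter values (`slow_rate`, `fast_rate`, `stay`, `indel`) chosen only for illustration. What it keeps from the abstract is the division into a slow region with substitutions only and a fast region with substitutions and indels.

```python
import math

# Toy pair HMM: one slow (conserved) state and three fast states.
STATES = ("slow_match", "fast_match", "fast_insert", "fast_delete")


def jc69_pair_prob(a, b, rate, t=1.0):
    """Joint probability of an aligned pair under Jukes-Cantor (an assumption)."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    cond = 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay
    return 0.25 * cond  # uniform base frequency times conditional probability


def fast_shares(indel):
    """How probability mass is split among the fast states (hypothetical)."""
    return {"fast_match": 1.0 - 2.0 * indel, "fast_insert": indel, "fast_delete": indel}


def make_transitions(stay=0.9, indel=0.1):
    """Hypothetical transition matrix; 'stay' is the chance of keeping the region type."""
    trans = {}
    for q in STATES:
        p_slow = stay if q == "slow_match" else 1.0 - stay
        row = {"slow_match": p_slow}
        for k, share in fast_shares(indel).items():
            row[k] = (1.0 - p_slow) * share
        trans[q] = row
    return trans


def forward_likelihood(x, y, slow_rate=0.05, fast_rate=0.5, stay=0.9, indel=0.1):
    """Sum the probability of x and y over all alignments and segmentations."""
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 1.0
    trans = make_transitions(stay, indel)
    init = {"slow_match": 0.5}
    init.update({k: 0.5 * s for k, s in fast_shares(indel).items()})
    # f[i][j][k]: probability of emitting x[:i], y[:j] and ending in state k.
    f = [[dict.fromkeys(STATES, 0.0) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for k in STATES:
                # Number of letters of x and y consumed by state k.
                if k == "fast_insert":      # letter present in y only
                    di, dj = 0, 1
                elif k == "fast_delete":    # letter present in x only
                    di, dj = 1, 0
                else:                       # aligned pair
                    di, dj = 1, 1
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if k == "slow_match":
                    emit = jc69_pair_prob(x[i - 1], y[j - 1], slow_rate)
                elif k == "fast_match":
                    emit = jc69_pair_prob(x[i - 1], y[j - 1], fast_rate)
                else:
                    emit = 0.25             # single unaligned letter, uniform
                if pi == 0 and pj == 0:
                    incoming = init[k]
                else:
                    incoming = sum(f[pi][pj][q] * trans[q][k] for q in STATES)
                f[i][j][k] = emit * incoming
    return sum(f[n][m].values())
```

With these illustrative settings, calling `forward_likelihood("ACGT", "ACGT")` gives a larger value than `forward_likelihood("ACGT", "TGCA")`, because identical sequences are well explained by the slow state. A sampler of the kind described in the abstract would draw alignments from such a recursion rather than only summing over them; that step is not shown here.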