Bayesian analysis of amino acid substitution models
- PMID: 18852098
- PMCID: PMC2607418
- DOI: 10.1098/rstb.2008.0175
Abstract
Models of amino acid substitution present challenges beyond those often faced in the analysis of DNA sequences. Amino acid alignments are often small, whereas the number of parameters to be estimated is potentially large compared with the number of free parameters in nucleotide substitution models. Most approaches to the analysis of amino acid alignments have used fixed amino acid models, in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, whose parameters are specific to mammalian mitochondrial gene sequences). Although fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, they reduce it from approximately 200 to 0), it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time-reversible model of amino acid substitution with a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by a fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we consider constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all the possible partitioning schemes.
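Several of the ingredients named in the abstract can be illustrated with a minimal Python/NumPy sketch. This is our own illustration under stated assumptions, not the authors' implementation: `gtr_rate_matrix`, `centred_dirichlet`, `crp_log_prior`, `set_partitions` and `bell` are hypothetical helper names. The 'centred' prior is assumed here to be a Dirichlet whose mean equals a fixed model's normalised exchangeabilities (one plausible reading), the fixed-model values are random placeholders rather than real Mtmam values, and the Dirichlet process prior is written in its standard Chinese-restaurant-process form over partitions of the exchangeability parameters. The mixture of fixed models (third approach) is not shown.

```python
import math

import numpy as np

N_STATES = 20  # amino acids
N_EXCH = N_STATES * (N_STATES - 1) // 2  # 190 exchangeability parameters


def gtr_rate_matrix(exch, freqs):
    """Build a 20x20 GTR rate matrix from 190 upper-triangle exchangeabilities and
    20 stationary frequencies, scaled to one expected substitution per unit time."""
    r = np.zeros((N_STATES, N_STATES))
    r[np.triu_indices(N_STATES, k=1)] = exch
    r = r + r.T  # symmetric exchangeabilities give a time-reversible model
    q = r * freqs[np.newaxis, :]  # q_ij = r_ij * pi_j for i != j
    np.fill_diagonal(q, -q.sum(axis=1))  # each row sums to zero
    return q / -np.dot(freqs, np.diag(q))  # normalise the mean rate to 1


def centred_dirichlet(rng, r_fixed, concentration):
    """ASSUMED form of a 'centred' prior: a Dirichlet whose mean is the normalised
    fixed-model exchangeabilities; larger `concentration` gives a tighter prior."""
    r_fixed = np.asarray(r_fixed, dtype=float)
    return rng.dirichlet(concentration * r_fixed / r_fixed.sum())


def crp_log_prior(block_sizes, alpha):
    """Log probability of one partition (given by its block sizes) under a Dirichlet
    process prior with concentration `alpha` (Chinese restaurant process form)."""
    n, k = sum(block_sizes), len(block_sizes)
    return (k * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)
            + sum(math.lgamma(s) for s in block_sizes))


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def bell(n):
    """Number of partitions of n labelled items (Bell number), via the Bell triangle."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    freqs = rng.dirichlet(np.ones(N_STATES))
    # First approach: flat Dirichlet(1, ..., 1) prior on the 190 exchangeabilities.
    q = gtr_rate_matrix(rng.dirichlet(np.ones(N_EXCH)), freqs)
    print(q.shape, np.allclose(q.sum(axis=1), 0.0))  # (20, 20) True
    # Second approach: prior centred on a HYPOTHETICAL fixed model (placeholder values).
    r_fixed = rng.gamma(1.0, size=N_EXCH)
    draw = centred_dirichlet(rng, r_fixed, concentration=1e4)
    print(np.corrcoef(draw, r_fixed)[0, 1])  # approaches 1 as concentration grows
    # Fourth approach: size of the partition space and a proper DP prior over it.
    parts = list(set_partitions(list(range(6))))
    print(len(parts), bell(6))  # 203 203
    total = sum(math.exp(crp_log_prior([len(b) for b in p], alpha=1.0)) for p in parts)
    print(round(total, 6))  # 1.0
    print(len(str(bell(N_EXCH))), "decimal digits in the number of partitions of 190 rates")
```

The Bell-number count indicates why partitioning schemes cannot simply be enumerated for amino acids: the six nucleotide exchangeabilities already admit 203 schemes, and the count for 190 exchangeabilities is astronomically larger, so a prior over these partitions has to be explored by sampling rather than exhaustively.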
Similar articles
- CodonTest: modeling amino acid substitution preferences in coding sequences. PLoS Comput Biol. 2010 Aug 19;6(8):e1000885. doi: 10.1371/journal.pcbi.1000885. PMID: 20808876. Free PMC article.
- A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004 Jun;21(6):1095-109. doi: 10.1093/molbev/msh112. Epub 2004 Mar 10. PMID: 15014145.
- Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences. Mol Biol Evol. 2015 Mar;32(3):806-19. doi: 10.1093/molbev/msu340. Epub 2014 Dec 21. PMID: 25534034. Free PMC article.
- Compositional adjustment of Dirichlet mixture priors. J Comput Biol. 2010 Dec;17(12):1607-20. doi: 10.1089/cmb.2010.0117. PMID: 21128852. Free PMC article.
- An efficient deep learning method for amino acid substitution model selection. J Evol Biol. 2025 Jan 3;38(1):129-139. doi: 10.1093/jeb/voae141. PMID: 39548851.
Cited by
- Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments. J Mol Microbiol Biotechnol. 2011;21(3-4):83-96. doi: 10.1159/000334611. Epub 2012 Jan 31. PMID: 22286036. Free PMC article. Review.
- Advantages of a mechanistic codon substitution model for evolutionary analysis of protein-coding sequences. PLoS One. 2011;6(12):e28892. doi: 10.1371/journal.pone.0028892. Epub 2011 Dec 29. PMID: 22220197. Free PMC article.
- Maximum Likelihood Estimation of Species Trees from Gene Trees in the Presence of Ancestral Population Structure. Genome Biol Evol. 2020 Feb 1;12(2):3977-3995. doi: 10.1093/gbe/evaa022. PMID: 32022857. Free PMC article.
- A duplicate gene rooting of seed plants and the phylogenetic position of flowering plants. Philos Trans R Soc Lond B Biol Sci. 2010 Feb 12;365(1539):383-95. doi: 10.1098/rstb.2009.0233. PMID: 20047866. Free PMC article.
- Benchmarking multi-rate codon models. PLoS One. 2010 Jul 21;5(7):e11587. doi: 10.1371/journal.pone.0011587. PMID: 20657773. Free PMC article.