Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3941-53.

doi: 10.1098/rstb.2008.0175.

Bayesian analysis of amino acid substitution models

John P Huelsenbeck¹, Paul Joyce, Clemens Lakner, Fredrik Ronquist

PMID: 18852098
PMCID: PMC2607418
DOI: 10.1098/rstb.2008.0175

John P Huelsenbeck et al. Philos Trans R Soc Lond B Biol Sci. 2008.

Affiliation

¹ Department of Integrative Biology, University of California, Berkeley, 3060 VLSB #3140, Berkeley, CA 94720-3140, USA. johnh@berkeley.edu

Abstract

Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on the use of fixed amino acid models in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, which has parameters that are specific to mammalian mitochondrial gene sequences). Although the fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, they reduce the number of free parameters from approximately 200 to 0), it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution using a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by the fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we consider constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all the possible partitioning schemes.
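
The first two approaches can be illustrated with a small sketch. A reversible model on 20 amino acids has 20 × 19 / 2 = 190 exchangeability parameters, and a Dirichlet prior treats them as a point on a simplex. The code below is a minimal sketch and not the authors' implementation: the function names, the flat prior with every parameter set to 1, and the 'centred' parameterisation (fixed-model rates rescaled to sum to a concentration value) are assumptions made for illustration.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard amino acids


def exchangeability_pairs(states=AMINO_ACIDS):
    """Return all unordered pairs of states: one exchangeability per pair."""
    return list(combinations(states, 2))


def sample_dirichlet(alphas, rng):
    """Draw one point from Dirichlet(alphas) by normalising gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def centred_alphas(fixed_rates, concentration):
    """Hypothetical centred prior: Dirichlet parameters proportional to the
    fixed-model rates, so the prior mean equals the normalised fixed rates.
    Larger concentration values pull the prior more tightly around them."""
    total = sum(fixed_rates)
    return [concentration * r / total for r in fixed_rates]


pairs = exchangeability_pairs()
rng = random.Random(1)  # seeded so the sketch is reproducible
flat_draw = sample_dirichlet([1.0] * len(pairs), rng)  # one draw from a flat prior
```

In this sketch, `len(pairs)` is 190 and `flat_draw` sums to 1. Passing the output of `centred_alphas` to `sample_dirichlet` gives draws whose spread around the fixed-model rates shrinks as the concentration grows.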


Figures

**Figure 1**
A frequency histogram of the potential scale reduction statistics $(\sqrt{\hat{R}})$ for the parameters examined in this study.
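
For readers unfamiliar with the statistic, the following is a minimal sketch of the classic Gelman-Rubin potential scale reduction computation for one parameter sampled by several independent chains. It is an illustration under assumptions: the function name is hypothetical, and the caption does not say which variant of the estimator was used in the study.

```python
import math
from statistics import mean, variance


def sqrt_psrf(chains):
    """Square root of the potential scale reduction factor for one parameter.

    `chains` is a list of m >= 2 equal-length lists of sampled values.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return math.sqrt(pooled / within)
```

Values near 1 suggest that the chains are sampling the same distribution; values well above 1 suggest that they have not converged.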

**Figure 2**
The marginal prior and posterior probability distribution for four of the exchangeability parameters for the HIV *pol* alignment. The Kullback–Leibler divergence, I, is shown for each parameter and measures the dissimilarity between the prior and posterior probability distributions. Note that the Kullback–Leibler divergence is small for the M↔Y parameter where the prior and posterior distributions are similar. The data are more informative for the other parameters shown, and the Kullback–Leibler divergence is correspondingly larger.
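
As a rough illustration of the quantity in this caption, the sketch below computes a discrete Kullback-Leibler divergence between two binned distributions. The binning, the function name and the direction (posterior relative to prior) are assumptions; the caption does not specify how the divergence was evaluated in the study.

```python
import math


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence sum_i p_i * log(p_i / q_i), in nats.

    Bins with p_i == 0 contribute nothing; q_i must be positive wherever
    p_i is positive.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                raise ValueError("q must be positive wherever p is positive")
            total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical binned probabilities for one exchangeability parameter.
prior = [0.9, 0.1]
posterior = [0.5, 0.5]
divergence = kl_divergence(posterior, prior)
```

The divergence is zero when the two distributions are identical and grows as the data move the posterior away from the prior.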

**Figure 3**
The mean of the marginal posterior probability distribution, $E(\theta \mid X)$, and the Kullback–Leibler divergence (I) for the 190 exchangeability parameters for each of the eight alignments examined in this study. The 190 exchangeability parameters are ordered along the x-axes.

**Figure 4**
The prior (dashed line) and posterior (solid line) marginal probability density for two of the 190 exchangeability parameters for analyses of the HIV alignment. The y-axis is the marginal posterior probability density of the rate.

**Figure 5**
The Kullback–Leibler divergences (y-axes) of the 190 exchangeability parameters (arranged along the x-axes) under four different centred prior distributions for the HIV alignment.

**Figure 6**
The marginal posterior probability distribution of the I↔V exchangeability parameter for the HIV alignment when Χ varies. The y-axis is the marginal posterior probability density of the rate. Red line, 0.1; blue line, 1; light green line, 100; orange line, 190; light blue line, 500; green line, 1000; yellow line, 10 000.

**Figure 7**
The prior and posterior probability distributions (upside down and right side up, respectively) for the number of exchangeability parameter groups for the analyses where the concentration parameter is fixed such that the prior mean of the number of substitution categories is 2 (i.e. E(K)=2). (a) *Drosophila* *adh*; (b) vertebrate β-globin; (c) Leviviridae coat; (d) Japanese encephalitis *env*; (e) flavivirus; (f) influenza; (g) HIV *pol* and (h) Leviviridae replicase.
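
Fixing the concentration parameter from a target prior mean can be sketched as follows. Under a Dirichlet process prior on partitions of n elements with concentration parameter α, the expected number of groups is E(K) = Σ_{i=0}^{n-1} α / (α + i). The code is a minimal sketch that assumes the partitioned elements are the n = 190 exchangeability parameters and solves for α by bisection; the function names and the search bounds are hypothetical.

```python
def expected_groups(alpha, n):
    """Prior mean number of groups, E(K), for n elements and concentration alpha."""
    return sum(alpha / (alpha + i) for i in range(n))


def alpha_for_prior_mean(target, n, lo=1e-9, hi=1e6, tol=1e-10):
    """Find alpha such that expected_groups(alpha, n) equals target.

    E(K) increases monotonically with alpha, so bisection is sufficient.
    """
    if not 1.0 < target < n:
        raise ValueError("target must lie strictly between 1 and n")
    if expected_groups(hi, n) < target:
        raise ValueError("target is not reachable within the search bounds")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_groups(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha_2 = alpha_for_prior_mean(2.0, 190)    # concentration giving E(K) = 2
alpha_10 = alpha_for_prior_mean(10.0, 190)  # concentration giving E(K) = 10
```

A larger target prior mean requires a larger concentration parameter, so `alpha_10` exceeds `alpha_2`.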

**Figure 8**
The mean partitions of the exchangeability parameters for each of the analyses. For each alignment, three mean partitions are shown, with the topmost having a prior mean of 2 and the bottommost having a prior mean of 10.
