Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3941-53.
doi: 10.1098/rstb.2008.0175.

Bayesian analysis of amino acid substitution models

John P Huelsenbeck et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Models of amino acid substitution present challenges beyond those often faced in the analysis of DNA sequences. Amino acid alignments are often small, whereas the number of parameters to be estimated is potentially large compared with the number of free parameters in nucleotide substitution models. Most approaches to the analysis of amino acid alignments have relied on fixed amino acid models, in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed models are specific to a gene or taxonomic group (e.g. the Mtmam model, whose parameters are specific to mammalian mitochondrial gene sequences). Although fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, from approximately 200 to 0), it is possible that none of the currently available fixed models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time-reversible model of amino acid substitution with a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by a fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we treat constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all possible partitioning schemes.
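The fourth approach places a prior over every way of grouping the 190 exchangeability parameters. The number of such partitioning schemes is the Bell number B_190. The short Python sketch below computes Bell numbers with the Bell triangle to show how large that space is; it is an illustration, not code from the study, and the function name `bell_numbers` is our own.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max] computed with the Bell triangle."""
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]    # row 0 of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row...
        new_row = [row[-1]]
        # ...and each further entry is its left neighbour plus the entry above-left of it.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row n is B_n
    return bells


if __name__ == "__main__":
    print(bell_numbers(5))  # [1, 1, 2, 5, 15, 52]
    b190 = bell_numbers(190)[190]
    print(f"B_190 has {len(str(b190))} decimal digits")
```

Python integers are arbitrary precision, so B_190 is computed exactly; its size is why the partition space has to be explored by MCMC rather than enumerated.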


Figures

Figure 1
A frequency histogram of the potential scale reduction statistics (Rˆ) for the parameters examined in this study.
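Figure 1 summarizes convergence with the potential scale reduction statistic R̂. The sketch below uses the standard Gelman–Rubin form, which compares between-chain and within-chain variance for one parameter sampled by several independent MCMC chains. Whether the study used exactly this variant is an assumption, and the function name is our own. Values near 1 suggest that the chains are sampling the same distribution.

```python
import math
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for one parameter from m chains of equal length n."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    # W: average within-chain sample variance
    within = mean(variance(c) for c in chains)
    # B: n times the sample variance of the chain means
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


if __name__ == "__main__":
    mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
    print(potential_scale_reduction(mixed))  # below 1: the chains agree
    print(potential_scale_reduction(stuck))  # far above 1: the chains disagree
```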
Figure 2
The marginal prior and posterior probability distribution for four of the exchangeability parameters for the HIV pol alignment. The Kullback–Leibler divergence, I, is shown for each parameter and measures the dissimilarity between the prior and posterior probability distributions. Note that the Kullback–Leibler divergence is small for the MY parameter where the prior and posterior distributions are similar. The data are more informative for the other parameters shown, and the Kullback–Leibler divergence is correspondingly larger.
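When neither density has a closed form, the Kullback–Leibler divergence I in Figure 2 can be estimated from MCMC samples. The sketch below bins prior and posterior samples on a shared grid and computes I = Σ p_k log(p_k / q_k), with p the posterior histogram and q the prior histogram. The direction (posterior relative to prior), the bin count and the pseudocount are our assumptions, not details taken from the study. The prior used here is Beta(1, 189), which is the exact marginal of one component of a flat Dirichlet on 190 exchangeabilities. The "posterior" samples are synthetic.

```python
import math
import random


def kl_from_samples(posterior, prior, n_bins=40, pseudocount=0.5):
    """Histogram estimate of KL(posterior || prior) in nats."""
    lo = min(min(posterior), min(prior))
    hi = max(max(posterior), max(prior))
    width = (hi - lo) / n_bins or 1.0  # guard against identical samples

    def binned(samples):
        # Pseudocounts keep every bin non-empty so each log ratio is finite.
        counts = [pseudocount] * n_bins
        for x in samples:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = binned(posterior), binned(prior)
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(1)
    # Marginal of one component of a flat Dirichlet over 190 categories.
    prior = [rng.betavariate(1, 189) for _ in range(20000)]
    weak = [rng.betavariate(1, 189) for _ in range(20000)]      # uninformative data
    strong = [rng.betavariate(60, 2000) for _ in range(20000)]  # informative data
    print(kl_from_samples(weak, prior))    # close to 0
    print(kl_from_samples(strong, prior))  # much larger
```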
Figure 3
The mean of the marginal posterior probability distribution, E(θ|X), and the Kullback–Leibler divergence (I) for the 190 exchangeability parameters for each of the eight alignments examined in this study. The 190 exchangeability parameters are ordered along the x-axes.
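The 190 exchangeability parameters correspond to the unordered pairs of the 20 amino acids (20 × 19 / 2 = 190). The Python sketch below shows how, under a general time-reversible model, the exchangeabilities and a set of stationary frequencies combine into a 20 × 20 rate matrix with q_ij = r_ij π_j. Several choices here are assumptions made for illustration: the amino acid order (the conventional ARNDCQEGHILKMFPSTWYV), the pair order, the scaling to one expected substitution per unit time, and the flat Dirichlet draws. The figure's own ordering may differ.

```python
import itertools
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
# The 190 unordered pairs (i, j) with i < j; this ordering is an assumption.
PAIRS = list(itertools.combinations(range(len(AMINO_ACIDS)), 2))


def flat_dirichlet(k, rng):
    """Draw from Dirichlet(1, ..., 1) by normalizing Gamma(1, 1) variates."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def gtr_rate_matrix(exchangeabilities, freqs):
    """Build a time-reversible rate matrix Q with q_ij = r_ij * pi_j."""
    if len(exchangeabilities) != len(PAIRS) or len(freqs) != len(AMINO_ACIDS):
        raise ValueError("expected 190 exchangeabilities and 20 frequencies")
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for (i, j), r in zip(PAIRS, exchangeabilities):
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)  # rows sum to zero
    # Rescale so the expected substitution rate at stationarity is 1.
    mu = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mu for x in row] for row in q]


if __name__ == "__main__":
    rng = random.Random(0)
    r = flat_dirichlet(len(PAIRS), rng)  # one draw from the flat prior
    pi = flat_dirichlet(len(AMINO_ACIDS), rng)
    Q = gtr_rate_matrix(r, pi)
    m, y = AMINO_ACIDS.index("M"), AMINO_ACIDS.index("Y")
    print(len(PAIRS), Q[m][y])  # number of pairs and the M->Y rate
```

Because q_ij π_i = r_ij π_i π_j = q_ji π_j, the matrix satisfies detailed balance for any choice of exchangeabilities, which is what makes the model time-reversible.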
Figure 4
The prior (dashed line) and posterior (solid line) marginal probability density for two of the 190 exchangeability parameters for analyses of the HIV alignment. The y-axis is the marginal posterior probability density of the rate.
Figure 5
The Kullback–Leibler divergences (y-axes) of the 190 exchangeability parameters (arranged along the x-axes) under four different centred prior distributions for the HIV alignment.
Figure 6
The marginal posterior probability distribution of the IV exchangeability parameter for the HIV alignment when Χ varies. The y-axis is the marginal posterior probability density of the rate. Red line, 0.1; blue line, 1; light green line, 100; orange line, 190; light blue line, 500; green line, 1000; yellow line, 10 000.
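Figure 6 varies the quantity Χ, which controls how tightly the centred prior clusters around the fixed model. The caption does not give the parameterization. One natural choice, which we assume here, is a Dirichlet with parameters Χ·θ_i, where θ_i are the fixed model's exchangeabilities rescaled to sum to one. Each marginal is then Beta(Χθ_i, Χ(1 − θ_i)), with mean θ_i and variance θ_i(1 − θ_i)/(Χ + 1). A larger Χ therefore tightens the prior around the fixed model without moving its centre. The value θ_i = 1/190 below is hypothetical.

```python
import math


def centred_marginal(theta_i, concentration):
    """Mean and s.d. of one component of Dirichlet(concentration * theta)."""
    a = concentration * theta_i
    b = concentration * (1.0 - theta_i)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    theta_i = 1.0 / 190  # hypothetical normalized fixed-model exchangeability
    for concentration in [0.1, 1, 100, 190, 500, 1000, 10000]:
        m, sd = centred_marginal(theta_i, concentration)
        print(f"{concentration:>8}: mean={m:.5f} sd={sd:.5f}")
```

Under this assumed parameterization, if every θ_i equalled 1/190, then Χ = 190 would set all Dirichlet parameters to 1 and recover the flat prior.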
Figure 7
The prior and posterior probability distributions (upside down and right side up, respectively) for the number of exchangeability parameter groups for the analyses where the concentration parameter is fixed such that the prior mean of the number of substitution categories is 2 (i.e. E(K)=2). (a) Drosophila adh; (b) vertebrate β-globin; (c) Leviviridae coat; (d) Japanese encephalitis env; (e) flavivirus; (f) influenza; (g) HIV pol and (h) Leviviridae replicase.
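Figure 7 fixes the Dirichlet process concentration parameter so that the prior mean number of exchangeability groups is 2. For n = 190 items, the standard Dirichlet process result is E(K) = Σ_{i=1}^{n} α/(α + i − 1). This increases monotonically in α, so the required α can be found by bisection. The Python sketch below does this; the function names are our own and the search bounds are arbitrary choices.

```python
import math


def expected_num_groups(alpha, n):
    """Prior mean number of groups under a Dirichlet process with n items."""
    return sum(alpha / (alpha + i) for i in range(n))


def concentration_for(target, n, lo=1e-10, hi=1e6, iters=200):
    """Find alpha with expected_num_groups(alpha, n) == target by bisection."""
    if not 1.0 <= target <= n:
        raise ValueError("target must lie between 1 and n")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if expected_num_groups(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    for target in (2, 10):
        alpha = concentration_for(target, 190)
        print(target, alpha, expected_num_groups(alpha, 190))
```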
Figure 8
The mean partitions of the exchangeability parameters for each of the analyses. Three mean partitions are shown for each alignment. The topmost corresponds to a prior mean of 2 groups and the bottommost to a prior mean of 10.
