Proc Natl Acad Sci U S A. 2004 Sep 28;101(39):13994-4001.
doi: 10.1073/pnas.0404142101. Epub 2004 Aug 3.

Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution

Dick G Hwang et al. Proc Natl Acad Sci U S A.

Abstract

We describe a model of neutral DNA evolution that allows substitution rates at a site to depend on the two flanking nucleotides ("context"), the branch of the phylogenetic tree, and position within the sequence and implement it by using a flexible and computationally efficient Bayesian Markov chain Monte Carlo approach. We then apply this approach to characterize phylogenetic variation in context-dependent substitution patterns in a 1.7-megabase genomic region in 19 mammalian species. In contrast to other substitution types, CpG transition substitutions have accumulated in a relatively clock-like fashion. More broadly, our results support the notion that context-dependent DNA replication errors, cytosine deamination, and biased gene conversion are major sources of naturally occurring mutations whose relative contributions have varied in mammalian evolution as a result of changes in generation times, effective population sizes, and recombination rates.

Figures

Fig. 1.
Phylogenetic relationships (following ref. 31) among the 19 mammalian species analyzed. (Depicted branch lengths are arbitrary.) The branch shading patterns indicate a partitioning of the tree into five clades plus a group of three ancestral branches that was assumed in initial analyses of rate variation across the tree (see Results and Discussion). For reference in later figures, internal branches are labeled by number and external branches are referred to by species name.
Fig. 2.
Error distribution for 1,800 estimated substitution rates and branch lengths for the analysis with 14 substitution types (see Results and Discussion). The MCMC approach was used to estimate parameters from a simulated dataset, and the normalized errors were computed by dividing the difference between the estimate and the value used for the simulation by the estimated standard deviation. The error distribution is approximately standard normal (shown by curve), indicating that the MCMC approach is able to reliably estimate values and confidence intervals for a large number of parameters.
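The normalized-error check described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy numbers are hypothetical, and the sample variance (n - 1 denominator) is an assumption made here.

```python
def normalized_errors(estimates, true_values, est_sds):
    """Return (estimate - simulated value) / estimated SD for each parameter."""
    return [(e - t) / s for e, t, s in zip(estimates, true_values, est_sds)]


def mean_and_variance(xs):
    """Sample mean and sample variance (n - 1 denominator, an assumption here)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


# Hypothetical toy values: two parameters estimated from a simulated dataset.
errors = normalized_errors([1.2, 0.8], [1.0, 1.0], [0.1, 0.4])
summary = mean_and_variance(errors)
```

If the estimates and their standard deviations are well calibrated, the normalized errors over many parameters should have a mean near 0 and a variance near 1, which is what an approximately standard normal error distribution implies.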
Fig. 3.
Comparison of context-dependent substitution rates in untranscribed regions in the rodent + rabbit and carnivore + artiodactyl + horse clades. Each point represents the rates in the two clades for a particular substitution wxyz. Rates were normalized such that within a clade the average rate, weighted by the observed frequencies of the trinucleotides wxy, is 1. Horizontal and vertical bars indicate 95% confidence intervals. The rates are broadly consistent between the clades, but groups of rates are shifted approximately parallel to the diagonal in log–log scale, suggesting that a multiplicative factor relates the rates within each group across clades. (If y = mx, then log y = log x + log m.) Similar trends were seen for other comparisons (see Fig. 10, which is published as supporting information on the PNAS web site). The color scheme reflects a grouping of substitutions into 14 types that explain much of the difference among clades (see Table 1).
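One way to read the normalization and the log-log argument in this caption is sketched below. The weighting convention (each rate weighted by the frequency of its source trinucleotide), the dictionary layout, and all numbers are assumptions for illustration only; the paper may define the weighted average differently.

```python
import math


def weighted_mean_rate(rates, tri_freqs):
    """Mean of rates, each weighted by the frequency of its source trinucleotide.

    rates: dict mapping (trinucleotide, new_middle_base) -> rate
    tri_freqs: dict mapping trinucleotide -> observed frequency
    """
    total_weight = sum(tri_freqs[tri] for tri, _ in rates)
    weighted_sum = sum(tri_freqs[tri] * r for (tri, _), r in rates.items())
    return weighted_sum / total_weight


def normalize_rates(rates, tri_freqs):
    """Scale all rates so that their weighted mean within a clade is 1."""
    m = weighted_mean_rate(rates, tri_freqs)
    return {key: r / m for key, r in rates.items()}


def log_shift(rate_a, rate_b):
    """Offset from the diagonal on a log-log plot: log(rate_b) - log(rate_a)."""
    return math.log(rate_b) - math.log(rate_a)


# Hypothetical toy rates for one clade.
toy_rates = {("ACG", "T"): 6.0, ("ACG", "A"): 1.0, ("AAA", "G"): 1.0}
toy_freqs = {"ACG": 0.2, "AAA": 0.8}
normalized = normalize_rates(toy_rates, toy_freqs)
```

If every rate in a group satisfies rate_b = m * rate_a for a shared factor m, then `log_shift` returns log m for each of them, so the group appears displaced parallel to the diagonal.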
Fig. 4.
Context-dependent substitution rates for untranscribed regions. Each point corresponds to the rate of a particular substitution wxyz, with the flanking context w·y indicated on the horizontal axis and xz indicated by color. Because the rates may vary across the tree, each rate shown is the average across the entire tree, scaled such that the average of all rates (weighted according to the frequency of each trinucleotide in all sequences) is 1. Vertical bars indicate 95% confidence intervals.
Fig. 5.
Deviation from clock-like behavior by substitution type. For each type (indicated by number as in Table 1), we computed the total branch length (in expected substitutions per site) from the root to each leaf and measured deviation from molecular clock behavior as the variance of the set of root-to-leaf distances (normalized so that the mean distance is 1 in each case). Vertical bars indicate 95% confidence intervals.
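The clock-deviation measure in this caption can be sketched on a small tree. The tree encoding (parent and branch-length dictionaries), the node names, and the use of the population variance are assumptions made for this illustration, not details taken from the paper.

```python
from statistics import mean, pvariance


def root_to_leaf_distances(parent, length):
    """Sum branch lengths from each leaf up to the root.

    parent: dict mapping node -> parent node (the root maps to None)
    length: dict mapping node -> length of the branch above that node
    """
    internal = {p for p in parent.values() if p is not None}
    leaves = [node for node in parent if node not in internal]
    distances = {}
    for leaf in leaves:
        dist, node = 0.0, leaf
        while parent[node] is not None:
            dist += length[node]
            node = parent[node]
        distances[leaf] = dist
    return distances


def clock_deviation(parent, length):
    """Variance of root-to-leaf distances after scaling their mean to 1."""
    dists = list(root_to_leaf_distances(parent, length).values())
    m = mean(dists)
    return pvariance([d / m for d in dists])


# Hypothetical three-leaf tree: root -> A, root -> X, X -> B, X -> C.
toy_parent = {"root": None, "A": "root", "X": "root", "B": "X", "C": "X"}
toy_length = {"A": 2.0, "X": 1.0, "B": 1.0, "C": 1.0}
deviation = clock_deviation(toy_parent, toy_length)
```

In this toy tree all three root-to-leaf distances are equal, so the measure is zero; lengthening one terminal branch makes it positive.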
Fig. 6.
Tree shape varies according to substitution type, with NCG→T the most clock-like. Branch lengths indicate the expected number of substitutions of the indicated types per pertinent site. Branch-length values with 95% confidence intervals for each substitution type and for all types combined are given in Figs. 11–25, which are published as supporting information on the PNAS web site. The trees are scaled so that the average root-to-leaf distance (indicated by the vertical line) is the same for all trees. See Fig. 1 for species labels.
Fig. 7.
Variation in relative substitution rates by branch. Branch labels are as indicated in Fig. 1; data for internal branches 1, 13, and 15 are omitted because of relatively large variance. Vertical bars indicate 95% confidence intervals. (A) NCG→T/overall rate ratio. Because NCG→T substitutions are relatively clock-like, this ratio provides a measure of deviation of the overall rate from clock-like behavior on each branch. (B) W→S/S→W rate ratio. This ratio is hypothesized to reflect biased gene conversion. Results were qualitatively similar when NCG→T substitutions were excluded from the calculation of the S→W rate (see Fig. 26, which is published as supporting information on the PNAS web site).
Fig. 8.
Transcriptional asymmetry by substitution type. For each type (indicated by number as in Table 1), we computed the fractional difference between its rate and that of its complement in transcribed regions. Rates were computed with respect to the transcribed strand and averaged over the entire tree. Vertical bars indicate 95% confidence intervals.
Fig. 9.
Relationship between transcriptional asymmetry and untranscribed substitution rate. Asymmetry is measured as in Fig. 8. Each point represents a particular substitution wxyz, with color indicating its type. NCG→T substitutions, omitted here because of scale, do not display clear correlation. Horizontal and vertical bars indicate 95% confidence intervals.

References

    1. Gojobori, T., Li, W.-H. & Graur, D. (1982) J. Mol. Evol. 18, 360–369.
    2. Li, W.-H., Wu, C.-I. & Luo, C.-C. (1984) J. Mol. Evol. 21, 58–71.
    3. Blake, R. D., Hess, S. T. & Nicholson-Tuell, J. (1992) J. Mol. Evol. 34, 189–200.
    4. Hess, S. T., Blake, J. D. & Blake, R. D. (1994) J. Mol. Biol. 236, 1022–1033.
    5. Ehrlich, M. & Wang, R. Y. H. (1981) Science 212, 1350–1357.
