Mol Biol Evol. 2013 Oct;30(10):2249-62.
doi: 10.1093/molbev/mst131. Epub 2013 Aug 1.

Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models

Nicola De Maio et al. Mol Biol Evol. 2013 Oct.

Abstract

The genomes of related species contain valuable information on the history of the considered taxa. Great apes in particular exhibit variation of evolutionary patterns along their genomes. However, the great ape data also bring new challenges, such as the presence of incomplete lineage sorting and ancestral shared polymorphisms. Previous methods for genome-scale analysis are restricted to very few individuals or cannot disentangle the contribution of mutation rates and fixation biases. This represents a limitation both for the understanding of these forces as well as for the detection of regions affected by selection. Here, we present a new model designed to estimate mutation rates and fixation biases from genetic variation within and between species. We relax the assumption of instantaneous substitutions, modeling substitutions as mutational events followed by a gradual fixation. Hence, we straightforwardly account for shared ancestral polymorphisms and incomplete lineage sorting. We analyze genome-wide synonymous site alignments of human, chimpanzee, and two orangutan species. From each taxon, we include data from several individuals. We estimate mutation rates and GC-biased gene conversion intensity. We find that both mutation rates and biased gene conversion vary with GC content. We also find lineage-specific differences, with weaker fixation biases in orangutan species, suggesting a reduced historical effective population size. Finally, our results are consistent with directional selection acting on coding sequences in relation to exonic splicing enhancers.

Keywords: biased gene conversion; coding sequence evolution; mutation rates; phylogenetics-population genetics model; primates evolution; rate heterogeneity.

Figures

Fig. 1.
Parameter estimation with PoMo. (A) Data from synonymous sites of each of the four species considered are collected. For each species, 10 alleles are sampled (the figure depicts data from a single site). (B) Each site of each species is associated with a state in PoMo10 according to its allele counts. (C–D) Given a set of parameter values, the likelihood of each site is calculated. (C) Transition probabilities between nodes are calculated according to the PoMo10 rate matrix. For simplification, the figure shows only two alleles, while the full model has four alleles (supplementary table S9, Supplementary Material online). (D) Following the Felsenstein pruning algorithm (Felsenstein 1981), we sum probabilities over all combinations of states at inner nodes. (E) The likelihood of all sites is combined, and the process is iterated with different parameter values until we find those that maximize the likelihood. These values (mutation rates, fixation biases, and root nucleotide frequencies) are our final estimates.
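Step (D) of the caption can be illustrated with a minimal sketch of the Felsenstein pruning recursion. This is not the PoMo10 implementation: to stay short it assumes a two-state symmetric model with a closed-form transition probability in place of the PoMo10 rate matrix, and the tree shape, branch lengths, and all function names are hypothetical.

```python
import math


def transition_prob(i, j, t, rate=1.0):
    """P(state j after time t | state i) under a two-state symmetric model.

    Assumption: this toy model stands in for the PoMo10 rate matrix.
    """
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return same if i == j else 1.0 - same


def partial_likelihoods(node, n_states=2):
    """Return L[s] = P(tip data below node | node is in state s)."""
    if "state" in node:  # leaf: the observed state has probability 1
        return [1.0 if s == node["state"] else 0.0 for s in range(n_states)]
    result = [1.0] * n_states
    for child, branch_length in node["children"]:
        child_l = partial_likelihoods(child, n_states)
        for s in range(n_states):
            # Sum over all states the child could be in.
            result[s] *= sum(
                transition_prob(s, c, branch_length) * child_l[c]
                for c in range(n_states)
            )
    return result


def site_likelihood(root, root_freqs):
    """Weight the root partial likelihoods by the root state frequencies."""
    partial = partial_likelihoods(root, len(root_freqs))
    return sum(f * l for f, l in zip(root_freqs, partial))


# Hypothetical four-taxon tree with made-up branch lengths.
apes = {"children": [({"state": 0}, 0.1), ({"state": 0}, 0.1)]}
orangutans = {"children": [({"state": 1}, 0.05), ({"state": 1}, 0.05)]}
root = {"children": [(apes, 0.2), (orangutans, 0.25)]}
print(site_likelihood(root, [0.5, 0.5]))
```

In a full analysis the per-site likelihoods would be combined across sites and maximized over the model parameters, as step (E) describes.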
Fig. 2.
Performance for simulated data. The simulated mutational and frequency parameters were those estimated in the highest GC-content bin (see Materials and Methods). Intensity of selection for GC versus AT was set to formula image. The X axis shows the number of sites in the data set used. Each box plot represents 10 simulations. The estimation errors, on the Y axis, were calculated as the Euclidean distance between the vector of estimated parameters and the vector of true values, normalized by the Euclidean norm of the true vector. (A) Error in the estimation of fixation biases (6-entry vector, one entry for each substitution type, blue box plot), non-CpG mutation rates (6 entries, red), and CpG hypermutability (single-entry vector, orange). (B) Error in the estimation of branch lengths (green), ancestral nucleotide frequencies (pink), and equilibrium nucleotide frequencies (yellow). (C) Estimates of GC versus AT fixation bias (blue), GC* in sites not preceded by C and not followed by G (yellow), GC* in sites not preceded by C and followed by G (green), and GC* in sites preceded by C and followed by G (red). The horizontal dashed lines represent the respective true values used for the simulations.
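The error measure described in this caption, the Euclidean distance between estimated and true parameter vectors divided by the Euclidean norm of the true vector, can be written as a short function. The function name and the example values are hypothetical and are not taken from the paper.

```python
import math


def relative_error(estimated, true):
    """Euclidean distance between the vectors, normalized by the norm of `true`."""
    if len(estimated) != len(true):
        raise ValueError("vectors must have the same length")
    true_norm = math.sqrt(sum(t * t for t in true))
    if true_norm == 0.0:
        raise ValueError("true vector must be non-zero")
    distance = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)))
    return distance / true_norm


# Made-up example: an estimate that doubles every true value has relative error 1.
print(relative_error([6.0, 8.0], [3.0, 4.0]))
```

Normalizing by the norm of the true vector makes errors comparable across parameter groups of different size and scale, such as a 6-entry vector of mutation rates and a single-entry CpG hypermutability.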
Fig. 3.
Estimates of mutation rates in great apes. (A) Estimates of relative mutation rates by Lynch (2010) in humans (blue) and PoMo10 on great ape data (red). μAC stands for mutation rate from A to C, etc. Values on the Y axis represent mutation rates normalized by formula image. (B) Estimates of relative mutation rates by Duret and Arndt (2008) in human-chimp (blue), CpG-PoMo10 in great apes without first exons (red), and CpG-PoMo10 on first exons only (green). hμCT represents the hypermutability from C to T and from G to A in CpG context. Error bars in (A) and (B) show the profile likelihood 95% confidence intervals. (C) Mutation rates from A and C nucleotides (red) compared with mutation rates from G and T (blue). In both cases, we refer to the nucleotide on the sense strand. We paired reverse-complement mutation types to highlight strand asymmetries. All rates are estimated with asy-CpG-PoMo10b on the whole data (see Materials and Methods).
Fig. 4.
Variation in fixation biases and mutation rates with base composition. Exon alignments were binned in 6 classes according to GC content. The Y axis shows parameter estimates for each bin; the X axis shows bins ordered by increasing GC content. Error bars show the profile likelihood 95% confidence intervals. Where error bars are not visible, the confidence intervals are too small to be drawn. (A) Estimation of mutation rates with CpG-PoMo10. Values on the Y axis represent mutation rates normalized by formula image. μAC stands for mutation rate from A to C, etc. hμCT stands for CpG hypermutability. (B) Estimation of fixation biases with the strand-specific asy-CpG-PoMo10b. sGC–sAT represents the apparent selective advantage of GC versus AT, sC–sA that of C versus A, sG–sA that of G versus A, and sT–sA that of T versus A.
Fig. 5.
Variation in equilibrium and ancestral GC content with base composition. Exon alignments were binned in 6 classes according to GC content. The Y axis shows the great apes root (yellow), observed (purple), and equilibrium (blue) GC content in each bin; the X axis shows bins ordered by increasing GC content. Estimates were obtained with CpG-PoMo10.
Fig. 6.
Variation within exons. Synonymous sites were binned according to their position within exons. The first and the last exon of each gene were excluded. The first 5 synonymous sites in each exon were assigned to the 5′-bin, the last 5 to the 3′-bin, the remaining to the central bin. On the X axis is the bin considered, on the Y axis are shown, respectively: (A) root, present, and equilibrium GC content, estimated with PoMo10; (B) fixation biases estimated with asy-CpG-PoMo10b. In 5′- and 3′-bins the number of sites is formula image, in the other two formula image. Error bars show the profile likelihood 95% confidence intervals.
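The positional binning described in this caption can be sketched as follows. The caption does not say how exons with fewer than 10 synonymous sites were handled, so this sketch assumes the 5′-bin is filled first and the 3′-bin takes what remains. The function name and the input representation (an ordered list of synonymous-site positions for one exon) are hypothetical.

```python
def bin_synonymous_sites(sites, edge=5):
    """Split one exon's ordered synonymous sites into 5', central, and 3' bins.

    Assumption: in short exons the 5'-bin is filled before the 3'-bin.
    """
    if edge < 1:
        raise ValueError("edge must be at least 1")
    five_prime = sites[:edge]        # first `edge` synonymous sites
    rest = sites[edge:]
    three_prime = rest[-edge:]       # last `edge` of the remaining sites
    central = rest[:-edge]           # everything in between (may be empty)
    return {"5'": five_prime, "central": central, "3'": three_prime}


# Made-up exon with 12 synonymous sites, labeled 0 to 11.
print(bin_synonymous_sites(list(range(12))))
```

First and last exons of each gene would be dropped before calling this function, since the caption states that they were excluded.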
