Proc Natl Acad Sci U S A. 2011 Jun 21;108(25):10231-6.
doi: 10.1073/pnas.1016719108. Epub 2011 Jun 6.

Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift

Premal Shah et al. Proc Natl Acad Sci U S A.

Abstract

The genetic code is redundant, with most amino acids using multiple codons. In many organisms, codon usage is biased toward particular codons. Understanding the adaptive and nonadaptive forces driving the evolution of codon usage bias (CUB) has been an area of intense focus and debate in the fields of molecular and evolutionary biology. However, their relative importance in shaping genomic patterns of CUB remains unresolved. Using a nested model of protein translation and population genetics, we show that observed gene-level variation of CUB in Saccharomyces cerevisiae can be explained almost entirely by selection for efficient ribosomal usage, genetic drift, and biased mutation. The correlation between observed codon counts within individual genes and our model predictions is 0.96. Although a variety of factors shape patterns of CUB at the level of individual sites within genes, our results suggest that selection for efficient ribosome usage is a central force in shaping codon usage at the genomic scale. In addition, our model allows direct estimation of codon-specific mutation rates and elongation times and can be readily applied to any organism with high-throughput expression datasets. More generally, we have developed a natural framework for integrating models of molecular processes with population genetics models to quantitatively estimate parameters underlying fundamental biological processes, such as protein translation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Effect of varying relative mutation rate (μ_i/μ_j), elongation time difference (Δt_ij), and protein production rate (φ) on the expected codon frequencies (E[f]) in a hypothetical two-codon amino acid. (A) Effect of changing μ_i/μ_j on E[f] with respect to φ. Solid lines represent the codon with the longer elongation time, t_1, and dotted lines represent the codon with the shorter elongation time, t_2. Mutation bias has a greater effect on E[f] at low φ, whereas at very high φ the E[f] of the codons converge to the same values irrespective of μ_i/μ_j. (B) Effect of changing t_i − t_j on E[f] with respect to φ. Solid lines represent the codon with the lower relative mutation rate, μ_1, and dotted lines represent the codon with the higher mutation rate, μ_2. The difference in elongation times between the two codons, t_i − t_j, has little effect on E[f] at low φ. However, at high φ, the difference in expected frequencies E[f] changes as t_i − t_j changes.
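The qualitative behavior described in this caption can be reproduced with a small sketch. It assumes a simplified form in which the expected frequency of codon i is proportional to a mutation weight times exp(−c·φ·t_i); this form, the function name, the `selection_scale` constant c, and all numeric values are illustrative assumptions, not the model equations from the paper.

```python
import math

def expected_codon_frequencies(mutation_weights, elongation_times, phi, selection_scale=1.0):
    """Expected frequencies of the synonymous codons of one amino acid.

    Assumed form (hypothetical, not taken from the source):
    f_i is proportional to mutation_weights[i] * exp(-selection_scale * phi * t_i).
    """
    fastest = min(elongation_times)
    # Shifting by the shortest elongation time cancels on normalization
    # and avoids numerical underflow at high phi.
    unnormalized = [
        w * math.exp(-selection_scale * phi * (t - fastest))
        for w, t in zip(mutation_weights, elongation_times)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Made-up values: codon 1 is slower (t = 1.2) but favored by mutation,
# codon 2 is faster (t = 1.0) but disfavored by mutation.
for phi in (0.0, 1.0, 10.0, 100.0):
    slow, fast = expected_codon_frequencies([0.7, 0.3], [1.2, 1.0], phi)
    print(f"phi={phi:6.1f}  slow codon={slow:.3f}  fast codon={fast:.3f}")
```

Under these assumptions, the frequencies at φ = 0 equal the normalized mutation weights, and at large φ the faster codon dominates whatever the mutation weights are, which matches the trends the caption reports for panels A and B.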
Fig. 2.
Observed and predicted changes in codon frequencies with gene expression, specifically protein production rate φ. (A–S) correspond to specific amino acids, where codons ending in A or T are shown in shades of blue and codons ending in G or C are shown in shades of red. Solid dots and vertical bars represent the mean ±1 SD of observed codon frequencies of genes in a given bin. The expected codon frequencies under our model are represented by solid lines. (T) Histogram of genes in each bin. We used k − 1 codons of an amino acid with k codons in estimating correlation coefficients. ρ_M represents the Pearson correlation between the mean of observed codon frequencies within a bin and predicted codon frequencies at the mean φ value. ρ_c represents the Pearson correlation between observed codon counts and predicted codon counts of all genes at their individual φ values.
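As a rough illustration of how a binned correlation such as ρ_M could be computed, the sketch below sorts genes by φ, averages observed frequencies within bins, and correlates those means with predictions at each bin's mean φ. The equal-size binning, the helper names (`pearson`, `binned_means`, `rho_m`), the `predict` callable, and the toy data are all assumptions made for illustration; the caption does not specify the binning scheme.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (requires nonzero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def binned_means(phi, freq, n_bins):
    """Sort genes by phi, split into bins of near-equal size, return (mean phi, mean freq) per bin."""
    order = sorted(range(len(phi)), key=lambda i: phi[i])
    size = math.ceil(len(order) / n_bins)
    bins = [order[k:k + size] for k in range(0, len(order), size)]
    return [
        (sum(phi[i] for i in b) / len(b), sum(freq[i] for i in b) / len(b))
        for b in bins
    ]

def rho_m(phi, observed_freq, predict, n_bins):
    """Correlate per-bin mean observed frequencies with predictions at each bin's mean phi."""
    bins = binned_means(phi, observed_freq, n_bins)
    observed = [mean_f for _, mean_f in bins]
    predicted = [predict(mean_phi) for mean_phi, _ in bins]
    return pearson(observed, predicted)

# Toy data (made up): one codon's frequency in six genes, plus a hypothetical predictor.
phi = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
observed = [0.30, 0.34, 0.33, 0.41, 0.52, 0.60]
print(rho_m(phi, observed, predict=lambda p: p / (1.0 + p), n_bins=3))
```

A count-based correlation such as ρ_c would skip the binning step and apply `pearson` directly to per-gene observed and predicted codon counts.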
Fig. 3.
Correlation between observed codon counts and predicted codon counts of individual genes. We used codon counts of k − 1 codons of an amino acid with k codons. Ignoring Met and Trp (one-codon amino acids) and splitting Ser into two blocks of four and two codons, there are 19 unique amino acid sets. Hence, the number of data points used is 4,674 × (59 − 19) = 186,960. We find a very high correlation (ρ = 0.959, P < 10^−15) between our model predictions and observed counts. (Inset) Distribution of correlation coefficients at the level of individual amino acids, indicating that our high correlation is not biased by specific amino acids and that we have a high correlation across all amino acids.
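The data-point count in this caption can be checked from the degeneracy of the standard genetic code. The sketch below is only an arithmetic check; the dictionary layout and the labels `Ser4` and `Ser2` are naming choices made here, while the gene count of 4,674 is taken from the caption.

```python
# Number of synonymous codons per amino acid set in the standard genetic code,
# excluding Met and Trp (one codon each) and splitting Ser into blocks of 4 and 2.
CODONS_PER_SET = {
    "Ala": 4, "Arg": 6, "Asn": 2, "Asp": 2, "Cys": 2, "Gln": 2, "Glu": 2,
    "Gly": 4, "His": 2, "Ile": 3, "Leu": 6, "Lys": 2, "Phe": 2, "Pro": 4,
    "Ser4": 4, "Ser2": 2, "Thr": 4, "Tyr": 2, "Val": 4,
}
N_GENES = 4674  # number of genes, from the caption

n_sets = len(CODONS_PER_SET)            # 19 amino acid sets
n_codons = sum(CODONS_PER_SET.values()) # 59 codons
# Within a set, the codon frequencies sum to 1, so only k - 1 of k are free.
free_codons_per_gene = sum(k - 1 for k in CODONS_PER_SET.values())
n_data_points = N_GENES * free_codons_per_gene

print(n_sets, n_codons, free_codons_per_gene, n_data_points)
```

Dropping one codon per set removes the value that is fully determined by the others, which is why the count uses 59 − 19 = 40 codons per gene rather than all 59.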
Fig. 4.
Correlation between our model-based estimates of Δt_ij and Δt_ij estimated using tRNA gene copy numbers. We find a strong correlation (ρ = 0.801, P < 10^−9) between our model estimates and estimates of Δt_ij based on tRNA gene copy numbers, indicating that our estimates can be related directly to other biological estimates, such as tRNA abundances.
