Genome Biol Evol. 2022 Aug 3;14(8):evac115.
doi: 10.1093/gbe/evac115.

Stop Codon Usage as a Window into Genome Evolution: Mutation, Selection, Biased Gene Conversion and the TAG Paradox

Alexander T Ho et al. Genome Biol Evol.

Abstract

Protein coding genes terminate with one of three stop codons (TAA, TGA, or TAG) that, like synonymous codons, are not employed equally. With TGA and TAG having identical nucleotide content, analysis of their differential usage provides an unusual window into the forces operating on what are ostensibly functionally identical residues. Across genomes and between isochores within the human genome, TGA usage increases with G + C content but, with a common G + C → A + T mutation bias, this cannot be explained by mutation bias-drift equilibrium. Increased usage of TGA in G + C-rich genomes or genomic regions is also unlikely to reflect selection for the optimal stop codon, as TAA appears to be universally optimal, probably because it has the lowest read-through rate. Despite TAA being favored by selection and mutation bias, as with codon usage bias G + C pressure is the prime determinant of between-species TGA usage trends. In species with strong G + C-biased gene conversion (gBGC), such as mammals and birds, the high usage and conservation of TGA is best explained by an A + T → G + C repair bias. How to explain TGA enrichment in other G + C-rich genomes is less clear. Enigmatically, across bacterial and archaeal species and between human isochores TAG usage is mostly unresponsive to G + C pressure. This unresponsiveness we dub the TAG paradox as currently no mutational, selective, or gBGC model provides a well-supported explanation. That TAG does increase with G + C usage across eukaryotes makes the usage elsewhere yet more enigmatic. We suggest resolution of the TAG paradox may provide insights into either an unknown but common selective preference (probably at the DNA/RNA level) or an unrecognized complexity to the action of gBGC.
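The quantities discussed in the abstract, stop codon usage and G + C content at third codon positions (G + C3), can be tallied from a set of coding sequences. The following is a minimal sketch rather than the authors' pipeline: the function names and toy sequences are hypothetical, and excluding the stop codon from the G + C3 calculation is an assumption.

```python
from collections import Counter

STOPS = ("TAA", "TGA", "TAG")


def stop_codon_usage(cds_list):
    """Fraction of in-frame coding sequences ending in each stop codon."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        stop = seq[-3:]
        # Skip sequences that are out of frame or lack a canonical stop.
        if len(seq) % 3 == 0 and stop in STOPS:
            counts[stop] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences with a canonical stop codon")
    return {stop: counts[stop] / total for stop in STOPS}


def gc3_content(cds_list):
    """G + C fraction at third codon positions (stop codon excluded; an assumption)."""
    gc = n = 0
    for cds in cds_list:
        seq = cds.upper()
        if len(seq) % 3 != 0:
            continue
        # Third positions are indices 2, 5, 8, ...; stop before the final codon.
        for i in range(2, len(seq) - 3, 3):
            base = seq[i]
            if base in "ACGT":
                n += 1
                gc += base in "GC"
    return gc / n if n else float("nan")


# Hypothetical toy sequences, for illustration only.
toy_cds = ["ATGGCCTAA", "ATGGCGTGA", "ATGAAATGA", "ATGTTTTAG"]
usage = stop_codon_usage(toy_cds)
gc3 = gc3_content(toy_cds)
```

Computing these two values per genome (or per isochore) yields the pairs that are then compared across species.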

Keywords: genome evolution; molecular evolution; stop codon read-through; stop codon usage; translation termination; translational read-through.

Figures

Fig. 1.
The basic mechanism of stop codon recognition by class I release factors. (A) The translating ribosome decodes coding sequence and recruits cognate amino-acylated tRNAs (brown) to build the growing polypeptide amino acid chain (small, colored circles). (B) The stop codon (UGA in this example, but typically UAA, UGA, or UAG) is recognized by, and becomes bound to, a class I release factor: RF1 or RF2 in bacterial, eRF1 in eukaryotic, or aRF1 in archaeal genomes (orange). (C) The binding between the release factor and stop codon begins a cascade leading to polypeptide release via the action of a class II release factor (not shown). Note that stop codons function in mRNA and hence genomic T (thymine) is replaced by U (uracil).
Fig. 2.
Stop codon usage (A) among 644 bacterial genomes, (B) 106 archaeal genomes, (C) 50 eukaryote genomes, and (D) among human isochores. TAA usage is negatively correlated with G + C3 content in all four analyses (Spearman’s rank; P < 2.2 × 10^−16, rho = −0.92 for bacteria; P < 2.2 × 10^−16, rho = −0.89 for archaea; P = 4.2 × 10^−7, rho = −0.66 for eukaryotes; P < 2.2 × 10^−16, rho = −0.92 within the human genome). TGA usage is positively correlated with G + C3 content in all four analyses (Spearman’s rank; P < 2.2 × 10^−16, rho = 0.88 for bacteria; P < 2.2 × 10^−16, rho = 0.76 for archaea; P = 0.0035, rho = 0.41 for eukaryotes; P < 2.2 × 10^−16, rho = 0.98 within the human genome). TAG usage is uncorrelated with G + C3 content in bacteria (Spearman’s rank; P = 0.48, rho = −0.03). TAG usage is positively correlated with G + C3 content, but with lower absolute usage than TGA, in archaea (Spearman’s rank; P = 1.1 × 10^−7, rho = 0.49), eukaryotes (Spearman’s rank; P = 1.1 × 10^−6, rho = 0.64), and within the human genome (Spearman’s rank; P = 0.0020, rho = 0.88). Figure adapted from Ho and Hurst (2021). Species lists and underlying data can be found in the Supplementary material.
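The rho values in this caption are Spearman's rank correlations, that is, Pearson correlations computed on ranks. A minimal pure-Python sketch is given here for orientation; the function names and the six data points are hypothetical and are not taken from the underlying data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-genome values, for illustration only.
gc3 = [0.30, 0.42, 0.55, 0.63, 0.71, 0.80]
taa = [0.70, 0.61, 0.48, 0.40, 0.35, 0.22]
tga = [0.15, 0.22, 0.35, 0.33, 0.50, 0.62]

rho_taa = spearman_rho(gc3, taa)
rho_tga = spearman_rho(gc3, tga)
```

The sketch returns only rho; the P values quoted in the caption would need an additional significance test, which is not shown here.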
Fig. 3.
Stop codon usage normalized to the mean (A) between 644 bacterial genomes, (B) 106 archaeal genomes, (C) 50 eukaryote genomes, and (D) between human isochores. Normalization to the mean has no effect on the correlation statistics presented in Fig. 2. Normalized TAA usage is negatively correlated with G + C3 content in all four analyses (Spearman’s rank; P < 2.2 × 10^−16, rho = −0.92 for bacteria; P < 2.2 × 10^−16, rho = −0.89 for archaea; P = 4.2 × 10^−7, rho = −0.66 for eukaryotes; P < 2.2 × 10^−16, rho = −0.92 within the human genome). Normalized TGA usage is positively correlated with G + C3 content in all four analyses (Spearman’s rank; P < 2.2 × 10^−16, rho = 0.88 for bacteria; P < 2.2 × 10^−16, rho = 0.76 for archaea; P = 0.0035, rho = 0.41 for eukaryotes; P < 2.2 × 10^−16, rho = 0.98 within the human genome). Normalized TAG usage is uncorrelated with G + C3 content in bacteria (Spearman’s rank; P = 0.48, rho = −0.03). TAG usage is positively correlated with G + C3 content, but with lower absolute usage than TGA, in archaea (Spearman’s rank; P = 1.1 × 10^−7, rho = 0.49), eukaryotes (Spearman’s rank; P = 1.1 × 10^−6, rho = 0.64), and within the human genome (Spearman’s rank; P = 0.0020, rho = 0.88). Figure adapted from Ho and Hurst (2021). Species lists and underlying data can be found in the Supplementary material.
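The statement that normalization leaves the correlation statistics unchanged follows from Spearman's rho depending only on ranks. The short sketch below assumes that "normalized to the mean" means dividing each value by the mean of its series, which is a guess at the procedure; the values and function names are hypothetical, and ties are not handled.

```python
def ranks(values):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def normalize_to_mean(values):
    """Divide each value by the series mean (assumed normalization)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


# Hypothetical usage values, for illustration only.
tga_usage = [0.15, 0.22, 0.35, 0.33, 0.50, 0.62]
tga_normalized = normalize_to_mean(tga_usage)

# Dividing by a positive constant preserves order, hence ranks, hence rho.
same_ranks = ranks(tga_usage) == ranks(tga_normalized)
```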
Fig. 4.
Stop codon frequencies (relative to the usage of all stops) normalized to the mean at the canonical stop site, in the 5′ UTR, and in the 3′ UTR at 10 equal-sized bins of various intronic G + C contents in the genome. Normalized TAA frequency is negatively correlated with intronic G + C content in all 3 sequences (Spearman’s rank; all P < 2.2 × 10^−16, rho = −0.99 at the canonical stop site and in 5′ UTR sequences, rho = −1 in 3′ UTR sequences). TGA is positively correlated with intronic G + C content in all 3 sequences (Spearman’s rank; all P < 2.2 × 10^−16, rho = 0.99 at the canonical stop site and in 5′ UTR sequences, rho = 1 in 3′ UTR sequences). TAG usage is positively correlated with intronic G + C content at the canonical stop site (Spearman’s rank; P = 0.0014, rho = 0.89) but is uncorrelated with intronic G + C content in both 5′ (Spearman’s rank; P = 0.61, rho = 0.19) and 3′ UTR sequences (Spearman’s rank; P = 0.10, rho = 0.55). Figure adapted from Ho and Hurst (2022). Underlying data can be found in the Supplementary material.
Fig. 5.
Simulated equilibrium (A) G + C content and (B) TGA usage plotted against the current G + C content of the windows from which the mutation spectrum was estimated. Panel A is reproduced with permission from Smith et al. (2018) (the original figure is available open access at: https://doi.org/10.1371/journal.pgen.1007254.g004) and shows equilibrium G + C estimates from three sources of human de novo mutations. Panel B is reproduced from Ho and Hurst (2022) and illustrates equilibrium TGA usage (relative to TAA and TAG usage) estimated from the Jonsson et al. (2017) dataset of human de novo mutations. This is done by employing either a 4 × 4 mononucleotide mutational matrix or a 16 × 16 dinucleotide matrix with a Markov process to define k-mer equilibrium content.
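To illustrate how an equilibrium nucleotide content can follow from a 4 × 4 mutational matrix, the sketch below finds the stationary distribution of a Markov chain by power iteration. It is not the procedure used in the cited studies: the rates are hypothetical, scaled far above real per-generation mutation rates so that the iteration converges quickly, and chosen to mimic a G + C → A + T mutation bias.

```python
BASES = "ACGT"


def transition_matrix(rates):
    """Row-stochastic 4 x 4 matrix from rates[(x, y)] = probability of x -> y."""
    matrix = []
    for x in BASES:
        row = [rates.get((x, y), 0.0) if x != y else 0.0 for y in BASES]
        # The diagonal holds the probability of no change.
        row[BASES.index(x)] = 1.0 - sum(row)
        matrix.append(row)
    return matrix


def stationary_distribution(matrix, tol=1e-12, max_iter=100_000):
    """Equilibrium base frequencies by repeated application of the matrix."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical, strand-symmetric relative rates with a G + C -> A + T bias.
toy_rates = {
    ("A", "G"): 0.01, ("A", "C"): 0.0025, ("A", "T"): 0.0025,
    ("T", "C"): 0.01, ("T", "G"): 0.0025, ("T", "A"): 0.0025,
    ("G", "A"): 0.02, ("G", "T"): 0.005, ("G", "C"): 0.0025,
    ("C", "T"): 0.02, ("C", "A"): 0.005, ("C", "G"): 0.0025,
}

pi = stationary_distribution(transition_matrix(toy_rates))
equilibrium_gc = pi[BASES.index("G")] + pi[BASES.index("C")]
```

With these toy rates each A or T base moves to G or C with probability 0.0125 and each G or C base moves to A or T with probability 0.025, so the equilibrium G + C content is 0.0125 / (0.0125 + 0.025) = 1/3. The dinucleotide case would use a 16 × 16 matrix in the same way.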
Fig. 6.
Rates of point mutations leading to (A) TAA ↔ TGA and (B) TAA ↔ TAG trinucleotide changes derived from the Jonsson et al. (2017) dataset (n = 108,778). De novo mutations were partitioned according to their surrounding (10 kb) G + C content into 10 equal bins. Mutations causing TAA → TGA (Spearman’s rank; P = 0.51, rho = −0.24, n = 10) and TAA → TAG (Spearman’s rank; P = 0.51, rho = 0.24, n = 10) changes are invariant to G + C pressure. Net TGA and net TAG refer to the TAA → TGA rate minus the TGA → TAA rate and the TAA → TAG rate minus the TAG → TAA rate, respectively. Net TAG gain is invariant to G + C pressure (Spearman’s rank; P = 0.97, rho = 0.018, n = 10). Net TGA gain is negatively correlated with G + C pressure (Spearman’s rank; P = 0.028, rho = −0.71, n = 10). Underlying data can be found in the Supplementary material.
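The binning and net-gain calculation described in this caption can be sketched as follows. Several details are assumptions rather than the authors' method: bins are taken to hold equal numbers of mutations, the "rate" is crudely approximated as events per mutation in the bin, and the records, labels, and function names are hypothetical.

```python
def equal_count_bins(records, n_bins, key):
    """Split records, sorted by key, into n_bins groups of near-equal size."""
    if n_bins < 1 or len(records) < n_bins:
        raise ValueError("need at least one record per bin")
    ordered = sorted(records, key=key)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # The first `extra` bins take one additional record each.
        end = start + size + (1 if b < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def net_tga_per_bin(records, n_bins):
    """Per-bin (TAA>TGA events minus TGA>TAA events) / mutations in the bin."""
    nets = []
    for group in equal_count_bins(records, n_bins, key=lambda r: r[0]):
        gains = sum(1 for _, change in group if change == "TAA>TGA")
        losses = sum(1 for _, change in group if change == "TGA>TAA")
        nets.append((gains - losses) / len(group))
    return nets


# Hypothetical (surrounding G + C content, trinucleotide change) records.
toy_mutations = [
    (0.31, "TAA>TGA"), (0.35, "TGA>TAA"), (0.38, "TGA>TAA"), (0.41, "TAA>TGA"),
    (0.47, "TGA>TAA"), (0.52, "TAA>TGA"), (0.58, "TGA>TAA"), (0.66, "TGA>TAA"),
]

net_tga = net_tga_per_bin(toy_mutations, n_bins=2)
```

In this toy example the net TGA gain falls from the lower to the higher G + C bin. The same calculation with TAA>TAG and TAG>TAA labels would give net TAG gain.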
Fig. 7.
The mechanistic basis of translational read-through. (A) Canonical termination occurs when the stop codon is recognized by its cognate release factor. Only coding sequence is translated to build the polypeptide amino acid chain. (B) Translational read-through occurs when the stop codon is missed by the termination machinery, often due to the erroneous binding of a near-cognate tRNA to the stop codon (Roy et al. 2015; Beznoskova et al. 2016). This results in the translation of 3′ UTR sequence until the next in-frame stop codon or until the ribosome reaches the polyA+ tail, triggering nonstop decay.
