Review. Nucleic Acids Res. 2016 Sep 30;44(17):8020-40.
doi: 10.1093/nar/gkw608. Epub 2016 Jul 22.

An integrated, structure- and energy-based view of the genetic code

Henri Grosjean et al. Nucleic Acids Res.

Abstract

The principles of mRNA decoding are conserved among all extant life forms. We present an integrative view of all the interaction networks between mRNA, tRNA and rRNA: the intrinsic stability of codon-anticodon duplex, the conformation of the anticodon hairpin, the presence of modified nucleotides, the occurrence of non-Watson-Crick pairs in the codon-anticodon helix and the interactions with bases of rRNA at the A-site decoding site. We derive a more information-rich, alternative representation of the genetic code, that is circular with an unsymmetrical distribution of codons leading to a clear segregation between GC-rich 4-codon boxes and AU-rich 2:2-codon and 3:1-codon boxes. All tRNA sequence variations can be visualized, within an internal structural and energy framework, for each organism, and each anticodon of the sense codons. The multiplicity and complexity of nucleotide modifications at positions 34 and 37 of the anticodon loop segregate meaningfully, and correlate well with the necessity to stabilize AU-rich codon-anticodon pairs and to avoid miscoding in split codon boxes. The evolution and expansion of the genetic code is viewed as being originally based on GC content with progressive introduction of A/U together with tRNA modifications. The representation we present should help the engineering of the genetic code to include non-natural amino acids.

Figures

Figure 1.
Circular representation of the genetic code emphasizing the inherent regularities of the decoding recognition process. The codons containing solely G = C pairs at the first two positions are at the top, those containing solely A–U pairs at the bottom, and those with mixed pairs of G = C and A–U either at the first or second pair of the codon/anticodon helix in the middle at the right and left. Thick red lines separate the three main regions. The red arrow indicates the direction of rotation for C1, G1, U1, A1 and the blue arrows the direction of rotation for C2, G2, U2, A2 on the right and left parts of the wheel. The amino acids coded by unsplit 4-codon boxes are indicated in red and those by split 2:2- and 3:1-codon boxes, together with the usual stop codons, are indicated in black. Throughout, the codon positions are numbered B1-B2-B3 and the anticodon nucleotides B34-B35-B36, both from 5′ to 3′.
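The layout rule in this caption can be sketched in a few lines of Python. This is only an illustration of the stated rule (G = C or A–U at the first two codon positions), not code from the article; the names `wheel_region` and `group_codons` are hypothetical.

```python
from itertools import product

BASES = "CGUA"  # same order as C1, G1, U1, A1 in the caption


def wheel_region(codon: str) -> str:
    """Place a codon on the wheel using only its first two bases (B1, B2)."""
    codon = codon.upper().replace("T", "U")
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError(f"not an RNA codon: {codon!r}")
    # Count how many of B1 and B2 would form a G=C pair with the anticodon.
    gc_pairs = sum(base in "GC" for base in codon[:2])
    return {2: "top", 1: "middle", 0: "bottom"}[gc_pairs]


def group_codons() -> dict:
    """Sort all 64 codons into the three regions of the wheel."""
    groups = {"top": [], "middle": [], "bottom": []}
    for triplet in product(BASES, repeat=3):
        codon = "".join(triplet)
        groups[wheel_region(codon)].append(codon)
    return groups


if __name__ == "__main__":
    for region, codons in group_codons().items():
        # Each codon box holds the four codons sharing B1 and B2.
        print(region, len(codons), "codons in", len(codons) // 4, "boxes")
```

Under this rule the top and bottom regions each hold 4 codon boxes and the middle region holds 8, which matches the three-part division of the wheel described above.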
Figure 2.
On the circular representation for the code are given the Turner energies calculated for the first two base pairs of the codon/anticodon helix, B1-B2 paired with B36-B35. Most generally, 5′-B1-B2-B3-3′ pairs to 5′-B34-B35-B36-3′, forming pairs B1-B36, B2-B35 and B3-B34. The free energy values for dimers are in kcal/mole and taken from (35). In the scheme (A) at the top left, the black square symbolizes the H-bonds between A–U or G = C and the grey vertical rectangles the stacking interactions between the base pairs. The average values are given in the red rectangles within the codon wheel: −3.1 kcal/mole for the top codon boxes, −2.2 kcal/mole for the middle codon boxes and −1.0 kcal/mole for the bottom ones, that is, a difference of about 1.0 kcal/mole between successive groups. The reported errors on the free energies are less than 0.1 kcal/mole. The top four groups of unsplit family codon boxes will be called ‘STRONG’ codon boxes, the middle ones, encompassing split and unsplit codon boxes, are designated ‘INTERMEDIATE’ codon boxes, and the bottom ones, all split family codon boxes, ‘WEAK’ codon boxes. The values in parentheses represent the same calculations with the third base pair taken into account, assumed to be Watson–Crick and without nucleotide modifications (schematized in (B) at the upper right corner). Although such calculations are not biologically meaningful, they may be relevant when considering the evolution of the decoding system. The differences between the averaged values are of the order of −1.5 kcal/mole and around 3.0 kcal/mole between the STRONG and WEAK codon boxes. The same calculations were done with the third base pair being either a U3oG34 or a G3oU34 (schemes (C) and (D) at bottom left and right corners respectively). Note that with B2-B35 being either a G = C or C = G pair, the energies are 2.0 kcal/mole (depending on GoU34 or UoG34, respectively) more stable than with B2-B35 being an A–U or U–A pair. Also, whatever the nature of the B2-B35 pair, G3oU34 is always less stable than U3oG34 by about half a kcal/mole. Correct decoding depends on formation of a short double helix-like structure between the three bases of the codon and the anticodon (symbolized by the dashed red cylinder).
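The pairing geometry and the three energy classes described in this caption can be illustrated with a short Python sketch. It assumes unmodified A, C, G and U only (no inosine or other modified nucleotides), and it uses only the three average values quoted in the caption, not the per-dimer Turner energies. The function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

# Average free energies (kcal/mole) quoted in the caption for the first two
# base pairs, keyed by the number of G=C pairs among B1-B36 and B2-B35.
AVERAGE_ENERGY = {2: ("STRONG", -3.1), 1: ("INTERMEDIATE", -2.2), 0: ("WEAK", -1.0)}


def codon_anticodon_pairs(codon: str, anticodon: str) -> list:
    """Return the pairs (B1, B36), (B2, B35), (B3, B34); both inputs are 5'->3'."""
    b1, b2, b3 = codon.upper()
    b34, b35, b36 = anticodon.upper()
    return [(b1, b36), (b2, b35), (b3, b34)]


def pair_type(pair: tuple) -> str:
    """Classify one codon/anticodon pair (unmodified bases only)."""
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "GoU wobble"
    return "mismatch"


def box_energy(codon: str, anticodon: str) -> tuple:
    """Look up the caption's class and average energy from the first two pairs."""
    first_two = codon_anticodon_pairs(codon, anticodon)[:2]
    if any(pair_type(pair) != "Watson-Crick" for pair in first_two):
        raise ValueError("the first two pairs are expected to be Watson-Crick")
    gc_pairs = sum(set(pair) == {"G", "C"} for pair in first_two)
    return AVERAGE_ENERGY[gc_pairs]


if __name__ == "__main__":
    pairs = codon_anticodon_pairs("GCU", "GGC")
    print([pair_type(pair) for pair in pairs])  # third pair is U3oG34
    print(box_energy("GCU", "GGC"))
```

With the caption's averages, the gaps between successive classes come out as 0.9 and 1.2 kcal/mole, which is the "about 1.0 kcal/mole" spacing mentioned above.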
Figure 3.
The experimentally identified interactions that are variable between the codon/anticodon base pairs, the ribosomal grip and the anticodon loop nucleotides are shown on the circular representation of the codons. The constant contacts are not shown. The weak contacts involving C–H bonds are also not shown. The rRNA nucleotides are in red and the anticodon loop nucleotides in black. Plain black boxes show the tRNA intra-anticodon loop interactions, plain red boxes the interactions of the ribosomal grip, and dashed red boxes the contact occurring only in the top half of the wheel. Purine bases at position 2 of codons are circled in green to emphasize the fact that they do not contact the corresponding tRNAs.
Figure 4.
Structural characteristics of the anticodon hairpin of tRNA. Identity and relative frequency of a nucleotide (including modified ones) are obtained from a compilation of 382 elongator tRNAs belonging to the three domains of life (123 from Eubacteria, 55 from Archaea, 204 from Eukaryota). Initiator tRNAMet, tRNAs coding for Pyl and Sec, and all tRNAs from mitochondria/plastids and bacteriophages/viruses were excluded from the analysis. The data set comprises tRNA sequences present in the Modomics database (14), which currently contains all sequences available by 2009 in the tRNA database as in ref. (76). Analysis was performed using the software tool tRNAmodviz (http://genesilico.pl/trnamodviz). Distributions of nucleotide residues at each position of the hairpin (positions 27 to 43) are visualized as pie charts, of which the color code is indicated at the top right corner of the figure. The universal numbering system is used (144). Acronyms of modified nucleotides present in certain isoacceptor tRNAs are indicated outside the anticodon hairpin (‘b’ means bacteria, ‘e’, eukaryote and ‘a’, archaea). Those present in the proximal ‘extended anticodon’ (in green square box, see also text) are on a grey background. Modified nucleotides present at positions 37 and 34 are listed in Figure 5A and B, respectively. The red arrow pointing from B34 of the anticodon to B3 of the codon and the plateau of B35/B2, and the red arrow from B37 to the plateau of B36/B1, symbolize the stabilizing effects of B34/B37 on codon–anticodon pairings.
Figure 5.
Phylogenetic distribution of modified and hypermodified nucleosides at (A) B37 and (B) B34 of the anticodon hairpin of tRNA from the 3 domains of life. All acronyms are those conventionally used; the corresponding chemical structures, full scientific names and chemical characteristics of most of them can be found in (145) (see also Modomics, http://modomics.genesilico.pl/). Only a few tRNAs from Archaea have been sequenced so far, therefore information concerning this domain (especially for U34 modifications) is incomplete. The meanings of ‘b’, ‘a’ and ‘e’ are the same as in Figure 4, with ‘o’ corresponding to organelles. The data set comprises tRNA sequences present in the Modomics database (14).
Figure 6.
Architecture of the proximal extended anticodon loop of Escherichia coli tRNAs. On the circular representation for the code are given (A) the base pairs B31-B39, (B) the base opposition B32/B38 and (C) the identity of purine-37 found in the anticodon loop of the various tRNA species corresponding to each of the decoding boxes for the 20 amino acids. Modified nucleotides are indicated in red. Non-random usage of base pairings B31-B39 is apparent, with a frequent use of A31-Ψ39 in tRNAs in the ‘weak’ decoding boxes. Likewise, the B32/B38 positions are more variable in the top of the wheel. Modification of B37 is clearly dependent on B1, and thus on B36 of the anticodon, which has to base pair with B1 of the codon. Only in a limited number of isoacceptor species is another modified B37 (m2A or m6t6A) found. The same analyses for tRNAs from H. volcanii, S. cerevisiae, M. capricolum and human mitochondria are shown in Supplementary Figure S3. (D) Chemical structures of the modifications found at position B37. Notice the presence of an amino acid as part of the ct6A modification.
Figure 7.
Identity of nucleotides at the first anticodon position (B34) of E. coli tRNAs. The global codon usage (after (146)) is inserted between the circle for the third base and that for the amino acid type. The conventional one-letter code for amino acids is used. For the sake of clarity, the distributions of G34, I34 and C34 derivatives are shown in (A), and those of U34 derivatives in (B). Modified G/C/I-34 are indicated in red. All Q-containing tRNAs and those containing modified C* are found in tRNAs belonging to split 2:2-codon boxes. For the modified U34 (shown in (C)), two chemically distinguishable types of modified residues are found. The first type, harboring an oxyacetic acid group (sometimes methylated) at the C5 atom of uracil (cmo5U or mcmo5U), is indicated in green. These U34 derivatives are found only in isoacceptor tRNAs belonging to the unsplit 4-codon family boxes. The second type of modified U34 derivatives harbors a methylaminomethyl (sometimes carboxymethylated) group at the C5 atom of uracil (mnm5U or cmnm5U). They are indicated in blue, and some of them are also hypermodified into 2-thiolated derivatives (s2U*) or methylated on the 2′-hydroxyl of the ribose (U*m). They are found in all split 2:2-codon boxes, and in the Arg/Gly 4-codon boxes (after (88)). The same analyses for tRNAs from H. volcanii, S. cerevisiae, M. capricolum and human mitochondria are shown in Supplementary Figure S4. (D) Chemical structures of the modifications found at position B34. Notice the presence of an amino acid as part of a few modifications.
Figure 8.
Modulation of codon–anticodon binding according to G+C (STRONG) or A+U (WEAK) binding capability of selected E. coli tRNA species. In both cases, the anticodon hairpin is schematically represented with all the nucleotides of the 5′ branch in continuous stacking up to B34 and the complementary nucleotides of the 3′ branch in continuous stacking up to B32. The U33 turn is indicated with its links to R35 and/or Y36 (underlined). At positions B32 and B38, various combinations of base opposition are found (boxed with dashed lines), the most frequent ones being indicated in bold letters. In red are the modified nucleotides; underlining emphasizes that the chemical adduct reinforces the stacking power of the base with the neighboring nucleotides. Modulation of the strength of codon–anticodon binding occurs through the anticodon loop constraints, which mainly depend on the choice of the B32-B38 base opposition, additional interactions with the conserved U33 and the identity of the chemical adducts on B37 and B34 that stabilize the B36-B1 and the B35-B2 interactions, respectively (schematized by red arrows). The number 2 with an asterisk for Pro means that the information came only from the tDNA sequence; the corresponding mature transcripts have not been sequenced yet. On the right of each diagram, an approximate energy scheme is displayed (in a blue rectangle) with S standing for STRONG, I for INTERMEDIATE, and W for WEAK. The same analyses for tRNAs from H. volcanii, S. cerevisiae, M. capricolum and human mitochondria are shown in Supplementary Figure S6.
Figure 9.
Deviations from the standard, almost universal genetic code. The most frequent deviations concern terminator codons unexpectedly efficiently translated as sense codons for amino acids Trp, Gly, Cys, Gln, Tyr, Leu or Ala. The conventional one-letter code for amino acids is used. In red are shown amino acid reassignments observed in nuclear genomes while in green are those related to mitochondrial genomes. The less frequently encountered reassignments of sense codons to a non-standard amino acid have been observed essentially in mitochondrial genomes (indicated in green), except in one case (a Leu-codon coding for Ser in Candida and Debaryomyces species (83)). For more details and references to original papers (in addition to those cited in the text), see (1,82,147). The codons remaining unassigned and possibly playing the role of occasional alternative stop codons in certain organisms are indicated by a red (nuclear) or green (mitochondria) circle around B3; dotted lines mean avoided codons, plain lines mean unassigned, possibly stop, codons. The red arrows between 2 boxes indicate a switch between amino acids within 2:2 decoding boxes. Outside the circle are indicated the reassignments of stop codons UGA and UAG to Sec (21st proteinogenic amino acid) and Pyl (22nd amino acid), respectively (2). The most frequent codon reassignments mostly occur within the blue dotted boxes.
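A codon reassignment of the kind listed in this caption can be modeled as a small set of overrides applied to the standard code table. The sketch below is illustrative only: the `translate` helper and its `overrides` parameter are hypothetical, the Candida example assumes the reassigned Leu codon is CUG, and real Sec or Pyl insertion depends on sequence context that a plain lookup table does not capture.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, third position varying fastest; * = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str, overrides: dict = None) -> str:
    """Translate an mRNA with the standard code plus optional reassignments."""
    table = {**STANDARD_CODE, **(overrides or {})}
    mrna = mrna.upper().replace("T", "U")
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    return "".join(table[mrna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    message = "AUGCUGUGA"
    print(translate(message))                # standard code
    print(translate(message, {"CUG": "S"}))  # Leu codon read as Ser
    print(translate(message, {"UGA": "U"}))  # UGA read as Sec (one-letter U)
```

Keeping the deviations as overrides on a single base table mirrors the caption's point that variant codes differ from the standard one at only a few codons.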
Figure 10.
Hypothetical stepwise evolution of the genetic code and translation machinery in Bacteria, Eukarya, Archaea and mitochondria. (A) Starting from highly G+C rich small RNA pieces and a few abiotic amino acids, the primordial translation system evolved by extending its coding capacity by progressive introduction of A and U in both the template and decoder RNAs (see also (122)). The final decoding machinery as we know it to date (and illustrated on the left panel of the figure) results from a long, complex and intricate stepwise coevolution with the emergence of metabolically generated amino acids (up to 20), duplication and speciation of proto-tRNAs with distinct anticodons (up to 40–45 to date), post-transcriptional modification enzymes (more than 100 known to date), new aminoacyl-tRNA synthetases (up to 20), complexification of the ribosomal architecture (rRNAs and r-proteins) and the introduction of additional protein factors allowing the extension and ultimate tuning of the efficacy and accuracy of the decoding capability of 61/62 sense (cognate and near cognate) codons, with in addition 2 to 3 terminators, for 20 natural proteinogenic amino acids. Pyrrolysine and selenocysteine have been excluded from our analysis, as well as the situation of the special tRNAMet involved in the initiation of protein synthesis, which obviously arose later during cell evolution. Emphasis is given to the importance of B34 modifications (indicated in red) that allow the segregation of the A/U-rich 4-codon boxes into split decoding boxes 2:2 and 3:1, with subsequent additional amino acids entering the code. (B) Following genomic selection conditions such as directional constraints or mutational pressure on codons (strong G/C as in M. luteus (148) or strong A/T combined with drastic genome size reduction as in mammalian mitochondria (57,60) or the minimalist bacterium M. capricolum (58,59,149)), simplifications of the usual translational decoding system are evident, while preserving the split decoding boxes because of the need to encode 20 amino acids. Fewer tRNA species with distinct anticodons (isodecoders) are required (22 for human mitochondria, 28 in M. capricolum and 29 in M. luteus, again not including the tRNAMet initiator). However, remarkably, the only remaining B34 modifications found in natural tRNAs are those related to the split 2:2 and/or 3:1 decoding boxes. More information in relation to the corresponding tRNA modification enzymes of a given tRNA repertoire in the subgroup of Mollicutes (mainly mycoplasmas) can be found in (150). The situation of M. luteus is remarkable for its total absence of U34-containing tRNAs, while in mycoplasma and mitochondria U34-containing tRNAs are critical. Codon usage correlates with tRNA repertoire and amino acid type (,–142). Symbols for each acronym of the most important modified nucleotides are indicated within the figure. In parentheses, b, e, a and m mean bacteria, eukaryotes, archaea and mitochondria, respectively. More details about their chemical structures are shown in Figure 7 and Supplementary Figure S4. Within each decoding box, blue arrows correspond to the most dominant codon:anticodon pairs. Only B34 are indicated. When symbols of B34 modification are in parentheses, it means that the unmodified version of B34 is used only in certain decoding boxes, while the modified version(s) is (are) used in other decoding boxes (for details see Figure 7 for E. coli and Supplementary Figure S4 for H. volcanii, S. cerevisiae, M. capricolum and human mitochondria, respectively). When B3 is in bold, it means that the codon usage corresponding to the particular codon is dominant over the other near cognate codons of the same 4- or 2-decoding boxes, while B34 indicated in regular italics correspond to rare codons (for details see as above, Figure 7 for E. coli and Supplementary Figure S4 for the other organisms analyzed).
Figure 11.
Summary scheme illustrating the mapping of the energetics and evolutionary history on the wheel organization of the genetic code. Right vertical arrows: from bottom to top, there is an increase in the strengths of the networking interactions that is coupled with an increase of base modifications at B34 and B37 from top to bottom. Left vertical arrows at the right and left sides: the evolution from the primordial G/C-rich to A/U-rich codon/anticodons triplets required base modifications and the related metabolic enzymatic activities.
Figure 12.
Summary figure: an approximate and simplified energy scheme at the A site decoding center illustrates how the favorable and costly contributions to the free energies compensate to maintain a smooth and regular translation process with minor final variations in free energy. On the wheel representation, the energies at the left can be mapped. However, not all energies can be mapped; for example, the interactions between the ribosome and parts other than the anticodon hairpin, the conformational distortions or alternative states of tRNAs, and the energies associated with ribosomal movements.

References

    1. Watanabe K., Yokobori S. tRNA modification and genetic code variations in animal mitochondria. J. Nucleic Acids. 2011;2011:623095.
    2. Ling J., O'Donoghue P., Söll D. Genetic code flexibility in microorganisms: novel mechanisms and impact on physiology. Nat. Rev. Microbiol. 2015;13:707–721.
    3. Bezerra A.R., Guimaraes A.R., Santos M.A. Non-standard genetic codes define new concepts for protein engineering. Life (Basel) 2015;5:1610–1628.
    4. Nirenberg M., Leder P., Bernfield M., Brimacombe R., Trupin J., Rottman F., O'Neal C. RNA codewords and protein synthesis, VII. On the general nature of the RNA code. Proc. Natl. Acad. Sci. U.S.A. 1965;53:1161–1168.
    5. Crick F.H. The origin of the genetic code. J. Mol. Biol. 1968;38:367–379.
