Review
Chem Rev. 2024 Sep 25;124(18):10281-10362.
doi: 10.1021/acs.chemrev.3c00878. Epub 2024 Aug 9.

Cracking the Code: Reprogramming the Genetic Script in Prokaryotes and Eukaryotes to Harness the Power of Noncanonical Amino Acids

Cosimo Jann et al. Chem Rev.

Abstract

Over 500 natural and synthetic amino acids have been genetically encoded in the last two decades. Incorporating these noncanonical amino acids into proteins enables many powerful applications, ranging from basic research to biotechnology, materials science, and medicine. However, major challenges remain before the full potential of genetic code expansion can be unleashed across disciplines. Here, we provide an overview of diverse genetic code expansion methodologies and systems and their final applications in prokaryotes and eukaryotes, represented by Escherichia coli and mammalian cells as the main workhorse model systems. We highlight how new technologies can first be established in simple systems and then transferred to more complex ones. For example, whole-genome engineering provides an excellent platform in bacteria for enabling transcript-specific genetic code expansion without off-target effects in the transcriptome. In contrast, the complexity of a eukaryotic cell poses challenges that require entirely new approaches, such as establishing novel base pairs or generating orthogonally translating organelles within living cells. We connect the milestones in expanding the genetic code of living cells for encoding novel chemical functionalities to the most recent scientific discoveries, from optimizing the physicochemical properties of noncanonical amino acids to the technological advancements for their in vivo incorporation. This journey offers a glimpse into the promising developments in the years to come.


Conflict of interest statement

The authors declare the following competing financial interest(s): E.A.L. holds several patents related to genetic code expansion and is a cofounder and consultant of Veraxa Biotech GmbH, a company specializing in the generation of antibody–drug conjugates via GCE.

Figures

Figure 1
Genetic code expansion (GCE) for amino acid recoding. The conserved genetic code of life on Earth and its expansion via stop codon suppression. Recoding a stop codon as a sense codon allows the co-translational incorporation into proteins of noncanonical amino acids (ncAAs) with diverse chemical structures and functionalities, as illustrated with selected structures from Table S1.
Figure 2
Principal mechanisms of in vivo GCE. While natural translation terminates at stop codons, GCE systems enable stop codon suppression by introducing a heterologous aminoacyl-tRNA synthetase/tRNA pair and by providing ncAAs. The anticodon of the heterologous tRNA thus decodes the stop codon to mediate co-translational ncAA incorporation.
Figure 3
Directed evolution of orthogonal aaRS/tRNA pairs.
Figure 4
Engineering aaRS properties. Schematic of the aaRS–tRNA complex, indicating regions of special interest for mutations and engineering. These include residues in the aaRS that interact with ncAAs, tRNAs (identity elements, especially those that bind anticodons), the deletion of editing domains, the deletion of N- or C-terminal parts, and replacement with parts from an aaRS of different origin to yield a chimeric aaRS.
Figure 5
Ribosome engineering. A mutated anti-Shine–Dalgarno sequence in the ribosomal RNA of engineered ribosomes and a matching mutated Shine–Dalgarno motif within orthogonal mRNA molecules form the basis for engineered orthogonal ribosomes that work with specific orthogonal mRNAs. These can further be combined with subunit tethering/stapling and quadruplet tRNAs to minimize or eliminate release factor competition with tRNAs.
Figure 6
Artificial base pairs for de novo codon formation. Examples of artificial base pairs are illustrated below the natural base pairs dA–dT and dC–dG. The chemical structures of the deoxy-(d-)nucleotides are shown in a double strand of DNA with base pairing indicated by dashed blue lines. The iso-dC–iso-dG base pair was the first artificial base pair that enabled GCE in vitro. The d5SICS–dNAM and dTPT3–dNAM base pairs form by hydrophobic interactions and have been successfully transferred into E. coli and also used for GCE. The dZ–dP and dS–dB pairs form the basis of Hachimoji DNA, which is yet to be applied for GCE.
Figure 7
Genome editing via MAGE and CAGE. The native E. coli genome is first divided into arbitrary regions. In each MAGE cycle, one of these regions is targeted for multiple edits with a pool of DNA oligos, represented as colored dashes. Each oligo pool is shown in the same color as the genomic region it edits; for example, the dark red oligos are used in the first MAGE cycle to produce the edited dark red genomic region. MAGE cycles are performed for different genomic regions in parallel. Subsequently, CAGE cycles combine the edited genomic regions, finally merging the outcome of the MAGE cycles into a completely edited E. coli chromosome.
Figure 8
Assembly of synthetic genomes with REXER. Replacement of native DNA with large, chemically synthesized DNA fragments by iterative rounds of REXER. In each round, a new piece of synthetic DNA is inserted from a BAC together with a dual antibiotic resistance cassette, replacing the previous one. The colored boxes represent antibiotic resistance cassettes. Prior to using REXER, a dual antibiotic resistance cassette is introduced into the native genome. The remaining antibiotic resistance cassette can be removed after the final REXER iteration to yield a scarless synthetic chromosome. Alternatively, it can be retained in the genome to facilitate selection in later experiments.
Figure 9
Transfer of GCE-enabling technologies from the prokaryotic to the eukaryotic world. Diverse GCE-enabling methods are denoted as either successfully established (green tick), partially established (light blue tick in parentheses), or not established (dark blue cross) across the model host systems of bacteria (E. coli), fungi (S. cerevisiae) and higher eukaryotes/mammalian cells (H. sapiens).
Figure 10
Mutual orthogonality between GCE-specific and endogenous translational machinery. In order to facilitate the incorporation of multiple distinct ncAAs, it is essential that the GCE-specific tRNAs (tRNA1 and tRNA2) only interact with their cognate synthetases and that the GCE-specific synthetases (RS1 and RS2) selectively accept different ncAAs.
Figure 11
A comparison of site-specific stop codon and residue-specific sense codon reassignment. In site-specific GCE, an amber codon introduced at a predetermined site in the mRNA of a protein of interest (POI) is repurposed to introduce an ncAA at the corresponding site in the POI. In contrast, in residue-specific sense codon reassignment, a sense codon (denoted as XXX) is repurposed to incorporate an ncAA, and reassignment is attempted at every instance of the chosen codon in the host transcriptome.
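The difference between the two reassignment modes can be sketched with a toy translation routine. This is only an illustration, not a model from the review: the mRNA sequences, the `translate` helper, and the placeholder symbols `X` and `Z` for ncAAs are all hypothetical, and the codon table lists only the few standard codons used here.

```python
# Toy codon table: a handful of standard codons; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "AAA": "K", "GGC": "G", "UUC": "F",
    "UAG": "*", "UAA": "*", "UGA": "*",
}

def translate(mrna, reassigned=None):
    """Translate an mRNA codon by codon (hypothetical helper).

    `reassigned` maps a codon to a placeholder ncAA symbol and takes
    priority over the standard table.
    """
    reassigned = reassigned or {}
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        codon = mrna[i:i + 3]
        residue = reassigned.get(codon, CODON_TABLE[codon])
        if residue == "*":  # an unsuppressed stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)

# Site-specific: one amber codon (UAG) placed at a chosen position.
amber_mrna = "AUGAAAUAGGGCAAAUUCUAA"
wild_type = translate(amber_mrna)                    # stops at UAG
site_specific = translate(amber_mrna, {"UAG": "X"})  # UAG now encodes X

# Residue-specific: every AAA codon in the message is reassigned.
sense_mrna = "AUGAAAGGCAAAUUCUAA"
residue_specific = translate(sense_mrna, {"AAA": "Z"})

print(wild_type, site_specific, residue_specific)
```

In the site-specific case only the engineered position changes, whereas the residue-specific case alters every occurrence of the chosen codon, which is why the latter affects the whole transcriptome.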
Figure 12
A synthetic membrane-less organelle renders GCE mRNA-specific.
Figure 13
The expanding repertoire of noncanonical amino acids. Genetic encoding unleashes a multiverse of functional groups on the amino acid, paving the way for cutting-edge chemical biology innovations and various applications.
Figure 14
Distribution of predicted chemical properties (log S and cLogP) of the set of more than 500 ncAAs genetically encoded in vivo. The distribution illustrates the overall trend and variability of these two chemical properties across the genetically encoded ncAAs. S-methylferrocenyl-l-cysteine was treated as an outlier and excluded from the plots. (a) Scatter plot of the predicted cLogP and log S values, highlighting with individual data points (blue dots) the ncAAs given in Table 1 and providing a detailed view of the spread and correlation between these properties (Pearson r = −0.688). (b) Gaussian distribution of predicted log S values, with mean value and standard deviation indicated (mean = −0.3737, SD = 1.0981). (c) Gaussian distribution of predicted cLogP values, with mean value and standard deviation indicated (mean = −1.4493, SD = 1.2297).
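The summary statistics quoted in this caption (Pearson r, mean, and standard deviation) can be computed as in the sketch below. The five value pairs are hypothetical placeholders, not data from the review, and the caption does not say whether the reported SD is the sample or the population standard deviation; the sample version is assumed here.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predicted values for five ncAAs (illustration only).
clogp = [-2.5, -1.8, -1.2, -0.4, 0.6]
log_s = [0.9, 0.3, -0.2, -0.9, -1.6]

r = pearson_r(clogp, log_s)
mean_log_s = statistics.fmean(log_s)
sd_log_s = statistics.stdev(log_s)  # sample SD (assumption, see above)

print(f"r = {r:.3f}, mean log S = {mean_log_s:.4f}, SD = {sd_log_s:.4f}")
```

A negative r, as reported for panel (a), means that ncAAs with higher predicted lipophilicity tend to have lower predicted aqueous solubility.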
Figure 15
Genetically encoded ncAAs bearing PTMs in their naturally occurring form, in masked form, or as mimics. (a) Ser/Thr phosphorylation. (b) Tyr phosphorylation. (c) Tyr sulfation.
Figure 16
Genetically encoded lysine derivatives for studying Lys acetylation. (a) ncAAs that have been encoded in vivo and (inside the box) on which PTM studies have been performed in living cells. (b) Deacetylase-resistant ncAAs that have been encoded in vivo.
Figure 17
Developments in genetically encoded amino acids for studying protein ubiquitination. (a) ncAAs that have been encoded in vivo, but for which deprotection and PTM studies have been performed only in vitro. (b) ncAAs that have been encoded in vivo, and for which deprotection and PTM studies have been performed in living cells. (c) ncAAs not requiring deprotection that have been encoded in vivo, for which PTM studies were performed in living cells.
Figure 18
Mimicking N-methylated lysine. Strategies for deprotecting genetically encoded N-methyl lysine derivatives.
Figure 19
Oxidative-PTM ncAAs, genetically encoded in both prokaryotic and eukaryotic systems.
Figure 20
Genetically encoded caged ncAAs. (a) Caged ncAAs genetically encoded. (b) Representative application of decaging in living cells. (c) Decaging efficiency of protecting groups, in order of increasing efficiency. (d) Depending on the benzylic substituents, deprotection of nitrobenzyl groups by light irradiation generates either an aldehyde or a ketone byproduct. (e) Genetically encoded polar and small ncAAs through post-translational photolysis.
Figure 21
Photoswitchable ncAAs. (a) Genetically encoded photoswitchable ncAAs. (b) Isomerization wavelengths of various azobenzene ncAAs, including heterocycle- and fluorine-containing derivatives. (c) Protein stapling and control over protein conformation by photoisomerization. (d) Control over protein translation by photoisomerization.
Figure 22
Photo-crosslinking ncAAs. (a) Genetically encoded ncAA photo-crosslinkers. (b) Light-dependent formation of active species and targeted residue sites. (c) Relative photo-crosslinking efficiency at >345 nm in terms of photoactivation efficiency and half-life of the active species.
Figure 23
Bioorthogonal reactions applied to study proteins in vivo.
Figure 24
ncAAs for Staudinger reactions. (a) Genetically encoded ncAAs for Staudinger reactions. (b) Mechanism of Staudinger ligation to an azido-ncAA. (c) Mechanism of Staudinger ligation to a cyclopropenone ncAA. (d) Mechanism of Staudinger reduction of aromatic azides.
Figure 25
Genetically encoded ncAA for CuAAC labeling in living cells.
Figure 26
(a) Genetically encoded azide ncAAs for SPAAC reactions and corresponding reactivity with strained alkynes. (b) Genetically encoded strained alkyne ncAAs for SPAAC reactions and the use of azides as external labeling handles.
Figure 27
The nitrile–aminothiol (NAT) condensation. (A) Genetically encoded ncAAs for NAT condensation. (B) Genetic encoding of aminothiols for in vitro condensation. (C) Genetic encoding of a nitrile for inter- or intramolecular condensation in living cells.
Figure 28
ncAAs for photoclick cycloadditions. (A) Genetically encoded alkene-bearing ncAAs for photoclick cycloadditions. (B) Mechanism of photo-activation of the tetrazole ring. (C) Reactivity of alkenes as a function of substituents and strain, in order of increasing reactivity.
Figure 29
ncAAs for IEDDA reactions. (a) Genetically encoded alkene ncAAs for IEDDA. (b) Frontier molecular orbitals of classical and inverse-electron-demand Diels–Alder reactions. (c) Reactivity of alkene and alkyne ncAAs with tetrazines. (d) Fast TCO decaging of a protein using a specially designed tetrazine. (e) Genetically encoded tetrazine-functionalized ncAAs. (f) Reactivity of tetrazine-functionalized ncAAs with TCO.
Figure 30
ncAAs for Pd-catalyzed reactions. (a) Genetically encoded ncAAs for Pd-catalyzed cross-coupling reactions. (b) Genetically encoded ncAAs for Pd-catalyzed decaging. (c) Mechanism for activating protein function (left) and catalytic cycle of decaging (right).
Figure 31
ncAAs for crosslinking reactions. (a) Genetically encoded crosslinker ncAAs. (b) Proximity-induced crosslinking with a nucleophilic residue (inter- or intramolecular). (c) Dha/Dhb species generated in live cells by crosslinking.
Figure 32
ncAAs used as probes for spectroscopy. (a) Genetically encoded IR probes for mainly in vitro applications. (b) Genetically encoded NMR probes for in vivo NMR. (c) Genetically encoded EPR probe for in vivo measurements.
Figure 33
Genetically encoded fluorescent ncAAs with their corresponding fluorescence emission wavelengths.
Figure 34
Stimulus-responsive ncAAs. (a) Genetically encoded stimulus-responsive ncAAs. (b) Metal-chelating ncAAs, studied in vitro. (c) pH-sensitive fluorescence of the ncAA HOCouA, studied in vitro. (d) GFP-fluorescence quenching and activation in the presence of internal stimuli, studied in vivo.
Figure 35
(a) Genetically encoded α-hydroxy acids as ncAAs. (b) H-bonding perturbation by replacing the amide bond with an ester. (c) Genetically encoded β-amino acids. (d) Genetically encoded α,α-disubstituted amino acids. (e) Genetically encoded oxazole-containing ncAA for cell imaging.
Figure 36
(a) Bifunctional genetically encoded ncAAs for in vivo studies. (b, c) Mechanism of bait–prey protein photo-crosslinking and MS identification for DiZSeK (b) and DiZHSeC and PrDiZASec (c).
