Review
Chem Rev. 2020 Mar 25;120(6):3051-3126.
doi: 10.1021/acs.chemrev.9b00450. Epub 2019 Nov 27.

Chemoenzymatic Semisynthesis of Proteins

Robert E Thompson et al. Chem Rev. 2020.

Abstract

Protein semisynthesis, defined herein as the assembly of a protein from a combination of synthetic and recombinant fragments, is a burgeoning field of chemical biology that has impacted many areas of the life sciences. In this review, we provide a comprehensive survey of this area. We begin by discussing the various chemical and enzymatic methods now available for manufacturing custom proteins containing noncoded elements, starting with methods that are more chemical in origin and ending with those that employ biocatalysts. We also illustrate the commonalities between these seemingly disparate methods and show how these commonalities are enabling the development of integrated chemoenzymatic methods. This methodology discussion provides the technical foundation for the second part of the review, where we cover the many biological problems that have now been addressed using these tools. Finally, we end with a short discussion of the frontiers of the field and the opportunities available for the future.

Figures

Figure 1.
Pioneering examples of peptide bond construction through a) condensation between activated acyl donors and amines, b) Solid-Phase Peptide Synthesis (SPPS), and c) Native Chemical Ligation (NCL – see Figure 3 for additional mechanistic detail).
Figure 2.
Timeline of major developments in contemporary protein semisynthesis.
Figure 3.
The mechanism of native chemical ligation (NCL).
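At the level of sequence and mass, the NCL mechanism has a simple outcome: the N-terminal Cys thiol of one fragment captures the C-terminal thioester of the other, and an S→N acyl shift converts the thioester-linked intermediate into a native amide bond. The product therefore equals the two fragments concatenated, and its mass equals the sum of both fragments (as free acids) minus one water. The Python sketch below is a hypothetical helper that checks only the N-terminal Cys requirement and tracks sequences and monoisotopic masses; it ignores the thioester leaving group, side reactions, and protecting groups, and the example sequences are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 canonical amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O (Da)


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of an unmodified linear peptide with a free C-terminal acid."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def native_chemical_ligation(thioester_seq: str, cys_seq: str) -> str:
    """Sequence of the NCL product (hypothetical helper).

    The C-terminal thioester of `thioester_seq` is captured by the N-terminal
    Cys thiol of `cys_seq`; the intramolecular S->N acyl shift then forms a
    native peptide bond, so the product is the concatenated sequence.
    """
    if not cys_seq or cys_seq[0] != "C":
        raise ValueError("acceptor fragment must begin with cysteine")
    return thioester_seq + cys_seq


product = native_chemical_ligation("LYRAG", "CRGK")
print(product)  # LYRAGCRGK
# Mass check: fragment A (free acid) + fragment B - H2O for the new amide bond.
assert abs(peptide_mass(product) - (peptide_mass("LYRAG") + peptide_mass("CRGK") - WATER)) < 1e-6
```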
Figure 4.
Synthesis of N-terminal cysteinyl peptides using either Boc or Fmoc-strategy solid-phase peptide synthesis. (PG = protecting group).
Figure 5.
Common strategies for the synthesis of peptide thioesters by a) Boc and b-d) Fmoc strategy SPPS.
Figure 6.
Production of recombinant N-terminal cysteinyl proteins through in vivo proteolysis.
Figure 7.
Mechanism for the chemical cleavage of Met-Cys bonds using cyanogen bromide.
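At the sequence level, cyanogen bromide cleavage amounts to cutting C-terminal to every Met, which is why the approach is clean only when the target segment contains no internal Met. A minimal Python sketch, assuming a Met-free target and a hypothetical fusion sequence. The constants are the standard monoisotopic shifts for Met converted to homoserine lactone (loss of CH3SH, about −48.003 Da) or to homoserine after lactone hydrolysis (about −29.993 Da).

```python
HSL_SHIFT = -48.00337  # Met -> homoserine lactone (loss of CH3SH), Da
HSE_SHIFT = -29.99281  # Met -> homoserine (lactone hydrolysed, +H2O), Da


def cnbr_cleave(seq: str) -> list[str]:
    """Split a sequence C-terminal to every Met (a C-terminal Met is left alone).

    In the real reaction each cleaved Met becomes homoserine lactone or
    homoserine; it is kept as 'M' here, with its mass change given above.
    """
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "M" and i < len(seq) - 1:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


frags = cnbr_cleave("GSHMCRKLE")
print(frags)  # ['GSHM', 'CRKLE']
# Fragments usable as NCL acceptors are those released with an N-terminal Cys.
print([f for f in frags[1:] if f.startswith("C")])  # ['CRKLE']
```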
Figure 8.
Production of recombinant N-terminal cysteinyl proteins through proteolysis of fusion proteins in vitro. a) Commonly used proteases for the generation of N-terminal cysteinyl proteins for NCL. b) Preparation of recombinant cFos leucine zipper domain with an N-terminal cysteine for NCL (Erlanson et al.). c) Preparation of recombinant N-terminal cysteinyl truncated H3 for the semisynthesis of H3K9me3 (Nguyen et al.). Inset: Structure of the Smt3/Ulp1 complex (PDB ID code 1EUV).
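As an illustration of the in vitro proteolysis route (not necessarily one of the enzymes shown in panel a), TEV protease recognizes the sequence ENLYFQ and cleaves after Gln; its P1' position tolerates residues other than the canonical Gly or Ser, including Cys, so placing Cys immediately after the site releases a Cys-initiated protein. The Python sketch below works at the sequence level only; the construct and helper are hypothetical, and real cleavage efficiency depends on the P1' residue and the surrounding context.

```python
TEV_SITE = "ENLYFQ"  # P6-P1 of the canonical TEV protease site; cleavage is after Gln (P1)


def tev_cleave(fusion: str) -> tuple[str, str]:
    """Cut a fusion protein after the first TEV site; return (tag side, released protein)."""
    idx = fusion.find(TEV_SITE)
    if idx == -1:
        raise ValueError("no TEV recognition site found")
    cut = idx + len(TEV_SITE)
    return fusion[:cut], fusion[cut:]


# Hypothetical construct: His6 tag - TEV site - Cys-initiated protein of interest.
tag, released = tev_cleave("MHHHHHHENLYFQCAGK")
print(released)  # CAGK
assert released.startswith("C"), "released protein should carry an N-terminal Cys for NCL"
```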
Figure 9.
Post-translational modification of proteins through intein-mediated protein splicing.
Figure 10.
Biochemical mechanism of protein splicing and associated N- and C-terminal cleavage pathways.
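The net outcome of the splicing mechanism can be captured at the sequence level. The sketch below assumes a canonical (class 1) intein: Cys or Ser at intein position 1, Asn at its final position, and Cys, Ser, or Thr at the first C-extein position (+1). The helper name and the example sequences (including the "CXXXXHN" stand-in for an intein) are hypothetical placeholders, not a real intein.

```python
def splice(n_extein: str, intein: str, c_extein: str) -> tuple[str, str]:
    """Return (ligated exteins, excised intein) for canonical protein splicing.

    Steps in the real reaction: N->S/O acyl shift at intein residue 1,
    transesterification to the +1 residue (branched intermediate), Asn
    cyclization excising the intein, and a final S/O->N acyl shift that
    forms the native peptide bond between the exteins.
    """
    if not intein or intein[0] not in "CS":
        raise ValueError("intein residue 1 is expected to be Cys or Ser")
    if intein[-1] != "N":
        raise ValueError("intein is expected to end in Asn (succinimide formation)")
    if not c_extein or c_extein[0] not in "CST":
        raise ValueError("C-extein +1 residue is expected to be Cys, Ser or Thr")
    return n_extein + c_extein, intein


mature, excised = splice("MKAG", "CXXXXHN", "CFNKE")
print(mature)  # MKAGCFNKE
```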
Figure 11.
Expressed protein ligation (EPL) for the semisynthesis of modified Csk.
Figure 12.
Extending ligation sites beyond Cys through a) Ser/Thr ligation, b) NCL followed by Cys alkylation, c) NCL followed by desulfurization (not limited to the β-thiol amino acids shown; we direct the reader to the reviews cited in the main text), d) traceless NCL using removable ligation auxiliaries, and e) NCL at selenocysteine followed by selective deselenization (not limited to the examples shown; we direct the reader to the reviews cited in the main text).
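Panel c rests on simple bookkeeping. Converting the ligation-site Cys to Ala removes one sulfur atom, a shift of about −31.972 Da (the difference between the Cys and Ala residue masses). Because radical desulfurization does not distinguish between Cys residues, native Cys that must survive are typically protected during this step (for example as Cys(Acm)). A minimal Python sketch; the `protected` argument is a hypothetical stand-in for such protection, and the example sequence is illustrative.

```python
CYS_RESIDUE = 103.00919  # monoisotopic residue mass, Da
ALA_RESIDUE = 71.03711
DESULFURIZATION_SHIFT = ALA_RESIDUE - CYS_RESIDUE  # about -31.97208 Da per Cys -> Ala


def desulfurize(seq: str, protected: frozenset = frozenset()) -> str:
    """Convert every Cys to Ala except those at protected 0-based indices."""
    return "".join(
        "A" if aa == "C" and i not in protected else aa
        for i, aa in enumerate(seq)
    )


ligated = "LYRAGCRGKC"  # index 5 = ligation-site Cys, index 9 = native Cys to keep
print(desulfurize(ligated, protected=frozenset({9})))  # LYRAGARGKC
```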
Figure 13.
Protein trans-splicing (PTS) mediated by split inteins.
Figure 14.
Intein domain architecture. a) Conserved sequence motifs in inteins relative to intervening homing endonuclease domain. b) Structure of the Sce Vma intein (PDB ID: 1VDE).
Figure 15.
Strategies for converting contiguous inteins into trans-splicing split inteins through a) refolding, b) induced-proximity, and c) reconstitution under native conditions.
Figure 16.
Common sites for splitting inteins. a) Commonly explored sites for splitting the intein domain in the IntN (S1, S2, S3) and IntC (S10, S11) domains, as well as the location of the endonuclease domain (EN/S0). β-strands (β1-β12) are indicated by grey arrows. Split sites are numbered according to Sun et al. b) Split sites mapped onto the structure of the Ssp DnaB mini-intein (PDB ID: 1MI8). The canonical IntN and IntC subdomains (i.e., the S0 split site) are colored light brown and orange, respectively.
Figure 17.
A naturally split intein from Synechocystis sp. (Ssp DnaE). a) Schematic of splicing by the naturally split Ssp DnaE intein. b) Structure of the reconstituted Ssp DnaE split intein domain. The SspN and SspC subdomains are colored red and blue, respectively (PDB ID: 1ZDE).
Figure 18.
Consensus designed DnaE split intein (Cfa DnaE). a) Sequence alignment of Npu DnaE and Cfa DnaE split intein. Differences are shown in grey. b) Sequence differences (blue sticks) mapped onto the structure of Npu (PDB ID: 4KL5).
Figure 19.
Transpeptidation by sortase A. a) Catalytic cycle of SrtA-mediated transpeptidation. b) Close up of the SrtA-substrate complex (PDB ID: 2KID). Substrate is depicted in magenta.
Figure 20.
Sortase-mediated ligation (sortagging) for the a) C-terminal or b) N-terminal modification of proteins of interest (POI).
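C-terminal sortagging can be sketched at the sequence level. SrtA recognizes an LPXTG motif, cleaves between Thr and Gly to form an acyl-enzyme intermediate, and transfers the acyl group to the N-terminal amine of an oligoglycine nucleophile. The Python sketch below (hypothetical helper and sequences) also makes the reversibility problem visible: the product regenerates an LPXTG motif, and the released Gly-containing fragment is itself a nucleophile.

```python
import re

SORTING_MOTIF = re.compile(r"LP.TG")  # LPXTG; X = any residue


def sortag(donor: str, acceptor: str) -> tuple[str, str]:
    """C-terminal sortagging: return (ligation product, released leaving group)."""
    match = SORTING_MOTIF.search(donor)
    if match is None:
        raise ValueError("donor lacks an LPXTG sorting motif")
    if not acceptor.startswith("G"):
        raise ValueError("acceptor must present an N-terminal (oligo)glycine")
    cut = match.end() - 1  # index of the motif Gly; SrtA cleaves the Thr-Gly bond
    return donor[:cut] + acceptor, donor[cut:]


product, leaving = sortag("MAEKLPETGHHHHHH", "GGGK")
print(product)  # MAEKLPETGGGK
print(leaving)  # GHHHHHH
# The product again contains LPETG, so SrtA can cleave it, and the released
# G-His6 fragment can compete as a nucleophile: the reaction is reversible.
print(SORTING_MOTIF.search(product) is not None)  # True
```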
Figure 21.
Strategies for improving the yield of sortagging reactions.
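Much of the yield problem in sortagging follows from equilibrium arithmetic. Suppose donor D and oligoglycine nucleophile N give product P and a released fragment L with K = [P][L]/([D][N]). Starting from 1 equiv D and n equiv N, the conversion x then satisfies x² = K(1 − x)(n − x). Under the assumption K ≈ 1 (the released Gly fragment is roughly as good a nucleophile as N), this reduces to x = n/(n + 1): 50% conversion with 1 equiv and about 91% with 10 equiv. Strategies that remove or deactivate the released fragment act by breaking this equilibrium. The hypothetical helper below solves the general case by bisection on 0 < x < min(1, n), where the residual changes sign exactly once.

```python
def equilibrium_conversion(n_equiv: float, K: float = 1.0, tol: float = 1e-12) -> float:
    """Fraction of donor converted at equilibrium for D + N <=> P + L.

    Starts from 1 equiv D, n_equiv N, and no P or L; solves
    x**2 = K * (1 - x) * (n_equiv - x) on 0 < x < min(1, n_equiv) by bisection.
    """
    if n_equiv <= 0 or K <= 0:
        raise ValueError("n_equiv and K must be positive")

    def residual(x: float) -> float:
        return x * x - K * (1 - x) * (n_equiv - x)

    lo, hi = 0.0, min(1.0, n_equiv)  # residual(lo) < 0 < residual(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (1, 5, 10, 50):
    # With K = 1 this reproduces the closed form n / (n + 1).
    print(n, round(equilibrium_conversion(n), 3))
```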
Figure 22.
Evolving the transpeptidase activity of sortase A. a) Schematic for the evolution of sortase activity using yeast display. b) Locations of activating mutations mapped onto the structure of WT sortase A (PDB ID: 2KID). LPAT substrate shown in purple.
Figure 23.
Sortase-mediated ubiquitination.
Figure 24.
Transpeptidation by butelase 1. a) Catalytic cycle of butelase 1-mediated transpeptidation. b) Close up of the butelase 1 active site complex (PDB ID: 6DHI).
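Butelase 1 is usually described as recognizing a C-terminal Asx-His-Val motif (Asx = Asn or Asp), cleaving after Asx with release of His-Val, and transferring the acyl group to the N-terminal amine of an acceptor. The Python sketch below encodes this at the sequence level. Its acceptor check (P1' not Pro, P2' one of Ile, Leu, Val, or Cys) reflects the reported preference but should be treated as an assumption, and the helper and sequences are hypothetical.

```python
def butelase_ligate(donor: str, acceptor: str) -> str:
    """Sequence of a butelase 1 ligation product (hypothetical helper).

    donor: must end in Asx-His-Val; His-Val is released on cleavage after Asx.
    acceptor: P1' must not be Pro and P2' must be Ile/Leu/Val/Cys
    (the reported preference, treated here as an assumption).
    """
    if len(donor) < 3 or donor[-3] not in "ND" or donor[-2:] != "HV":
        raise ValueError("donor must end in Asx-His-Val (NHV or DHV)")
    if len(acceptor) < 2 or acceptor[0] == "P" or acceptor[1] not in "ILVC":
        raise ValueError("acceptor N-terminus outside the assumed P1'/P2' preference")
    return donor[:-2] + acceptor


product = butelase_ligate("KALVNHV", "GIRL")
print(product)  # KALVNGIRL
```

In this model the new junction (Asn-Gly-Ile) no longer matches the Asx-His-Val motif, so the product is not itself a donor substrate.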
Figure 25.
Butelase 1-mediated labeling of ubiquitin using a thiodepsipeptide substrate.
Figure 26.
Peptide ligation using subtilisin variants. a) Synthesis of short peptides using thiolsubtilisin. b) Total synthesis of ribonuclease A containing unnatural catalytic residues using subtiligase.
Figure 27.
Comprehensive characterization of subtiligase specificity by mass spectrometry.
Figure 28.
Trypsiligase-catalyzed modification of proteins. a) C-terminal transpeptidation using trypsiligase. b) N-terminal modification of proteins using 4-guanidinophenyl esters.
Figure 29.
Transpeptidase-mediated assembly of protein thioesters in multistep ligations. a) Sortase-mediated installation of an α-thioester into a recombinant protein for use in NCL. LFN, lethal factor N-terminal domain from anthrax toxin; DTA, the diphtheria toxin α-chain. b) Dual labeling of ubiquitin through EPL of a butelase-assembled protein α-thioester with a synthetic biotinylated peptide, followed by N-terminal sortase-mediated ligation with a fluorescein-labeled peptide.
Figure 30.
Streamlined EPL using split inteins.
Figure 31.
Enzyme-catalyzed expressed protein ligation. Scheme showing intein-mediated generation of a recombinant protein α-thioester, followed by EPL with a cysteinyl peptide (upper) or subtiligase-catalyzed EPL with a non-cysteinyl peptide (lower).
Figure 32.
Generation of a C-terminally modified semisynthetic protein using transpeptidase-assisted intein ligation (TAIL).
Figure 33.
Histone modifications. a) Structure of a mononucleosome (PDB ID: 1KX5). b) Primary structure of histone tails with important modifications annotated. Ac, acetyl; Cit, citrullyl; Me, methyl; Ph, phosphoryl; Ub, ubiquityl.
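Each annotated histone modification adds a defined mass, so mass spectrometry is a natural first check on a semisynthetic histone. A minimal Python sketch of that bookkeeping uses standard monoisotopic shifts for acetylation, methylation, phosphorylation, and citrullination; ubiquitylation is omitted. The helper names, the simplified residue-compatibility table, and the choice of the H3 1-20 peptide as an example are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) of the 20 canonical amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Monoisotopic mass shifts (Da) for common histone PTMs.
PTM_DELTA = {
    "ac": 42.01057,   # acetylation (+C2H2O)
    "me1": 14.01565,  # monomethylation (+CH2)
    "me2": 28.03130,  # dimethylation
    "me3": 42.04695,  # trimethylation
    "ph": 79.96633,   # phosphorylation (+HPO3)
    "cit": 0.98402,   # citrullination of Arg (deimination, NH -> O)
}
# Residues each PTM may occupy in this sketch (an illustrative simplification).
ALLOWED_SITES = {"ac": "K", "me1": "KR", "me2": "KR", "me3": "K", "ph": "STY", "cit": "R"}


def modified_mass(seq: str, mods: dict[int, str]) -> float:
    """Monoisotopic mass of a peptide (free acid) carrying PTMs at 1-based positions."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    for pos, ptm in mods.items():
        if not 1 <= pos <= len(seq) or seq[pos - 1] not in ALLOWED_SITES[ptm]:
            raise ValueError(f"{ptm} not allowed at position {pos}")
        mass += PTM_DELTA[ptm]
    return mass


H3_1_20 = "ARTKQTARKSTGGKAPRKQL"  # histone H3 residues 1-20 (initiator Met removed)
shift = modified_mass(H3_1_20, {9: "me3", 10: "ph"}) - modified_mass(H3_1_20, {})
print(round(shift, 5))  # 122.01328 (K9me3 + S10ph)
```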
Figure 34.
Semisynthesis applied to histone H3. a) Semisynthesis of H3Ser10ph through NCL, and incorporation into nucleosome arrays (Shogren-Knaak et al.). b) Traceless semisynthesis of polyacetylated H3 using NCL and desulfurization (He et al.). c) Traceless semisynthesis of H3 by sortase-mediated ligation using the F40 sortase.
Figure 35.
Semisynthesis applied to ubiquitinated histone H2B. a) Scheme for the semisynthesis of ubiquitinated histone H2B from three fragments. b) PTM crosstalk biochemically verified using semisynthetic ubiquitinated histone H2B.
Figure 36.
Semisynthesis of ubiquitinated and phosphorylated histone H2A.X.
Figure 37.
Semisynthesis of serotonylated histone H3 using NCL followed by cysteine alkylation.
Figure 38.
Strategies for the synthesis of asymmetrically modified nucleosomes. a) Reconstitution of asymmetric nucleosomes containing bivalent H3K4me3 and H3K27me3 modifications using traceless disulfide tethering of semisynthetic H3 proteins. b) Stepwise incorporation of histone tetramers and dimers onto the Widom 601 nucleosome positioning DNA sequence.
Figure 39.
DNA-barcoded nucleosome library for high-throughput chromatin biochemistry. Next-generation DNA sequencing allows for experiment multiplexing and quantitative read-out of individual nucleosome library members.
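The multiplexed read-out reduces to counting. Each library member carries a unique DNA barcode, so sequencing reads from an input sample and from an assay sample (for example, a pull-down) can be tallied per member and compared. The Python sketch below assumes exact barcode matches at the start of each read and reports, for each member, the ratio of its read fraction after the assay to its fraction in the input. The barcode sequences, member names, and this normalization are illustrative assumptions rather than the published analysis.

```python
from collections import Counter


def count_barcodes(reads: list[str], barcodes: dict[str, str]) -> Counter:
    """Tally reads per library member by exact barcode match at the read start."""
    counts: Counter = Counter()
    for read in reads:
        for member, barcode in barcodes.items():
            if read.startswith(barcode):
                counts[member] += 1
                break
    return counts


def enrichment(assay: Counter, input_: Counter) -> dict[str, float]:
    """Ratio of each member's read fraction in the assay sample to its fraction in the input."""
    assay_total, input_total = sum(assay.values()), sum(input_.values())
    return {
        member: (assay[member] / assay_total) / (input_[member] / input_total)
        for member in input_
        if input_[member] > 0
    }


# Hypothetical two-member library and toy read sets.
barcodes = {"unmodified": "ACGT", "H3K9me3": "TTGC"}
input_reads = ["ACGTGGA"] * 50 + ["TTGCGGA"] * 50
assay_reads = ["ACGTGGA"] * 10 + ["TTGCGGA"] * 30
ratios = enrichment(count_barcodes(assay_reads, barcodes), count_barcodes(input_reads, barcodes))
print(ratios)  # {'unmodified': 0.5, 'H3K9me3': 1.5}
```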
Figure 40.
Semisynthesis of ubiquitinated histone H2B in isolated nuclei.
Figure 41.
Semisyntheses of modified forms of α-synuclein, including a) C-terminal, b) N-terminal, and c) central modifications.
Figure 42.
Semisyntheses of modified versions of the tau protein.
Figure 43.
Semisynthesis of the prion protein (PrP). a) Strategy for semisynthesis of C-terminally palmitoylated PrP. b) Structure of semisynthetic PrP bearing a native GPI anchor. c) Semisynthesis of PrP bearing glycan mimics at positions 181 and 197.
Figure 44.
Semisynthesis of phosphorylated huntingtin protein (Htt).
Figure 45.
Semisynthesis applied to the KcsA K+ channel. a) Scheme for the semisynthesis of unmodified KcsA using EPL. Two opposite subunits of the refolded tetrameric K+ channel are shown. b) Structure of the selectivity filter of a recombinant KcsA K+ channel (PDB ID code 1K4C). c) Structure of the selectivity filter of a semisynthetic KcsA K+ channel containing a D-Ala mutation (PDB ID code 2IH1). d) Structure of the selectivity filter of a semisynthetic KcsA K+ channel containing a backbone amide-to-ester mutation (PDB ID code 2H8P). Arrows indicate the unnatural ester linkage.
Figure 46.
Semisynthesis of the KvAP voltage-dependent K+ channel using EPL and NCL.
Figure 47.
Semisynthesis of the β2-adrenergic receptor (β2AR). a) Semisynthesis of C-terminally phosphorylated β2AR using sortase-mediated ligation followed by reconstitution into lipid nanodiscs. b) Preparation of segmentally labeled β2AR using PTS. c) Model of the conformational change of the β2AR C-terminus following agonist binding and phosphorylation, which triggers modulation of transmembrane domain structure and arrestin binding.
Figure 48.
Semisynthesis of chemokines. a) General strategy for the semisynthesis of modified CXCL8 and CXCL12, and examples of modifications incorporated. b) Semisynthesis of CXCL8 with a β-peptide sequence at the C-terminus.
Figure 49.
Semisynthesis of glycosylated interleukin 6.
Figure 50.
Semisynthesis of glycosylated interleukin 13.
Figure 51.
Semisynthesis of PSD-95 PDZ domains. a) Semisyntheses of phosphorylated PDZ1–3. b) Semisynthesis of PSD-95 PDZ2 with amide-to-ester mutants at the substrate binding site. Right: Important H-bonding interactions in the PDZ-substrate complex backbone, mapped onto a canonical PDZ-substrate structure (PDB ID: 1BE9).
Figure 52.
Semisynthesis of prenylated Rab GTPases. a) Scheme for the semisynthesis of prenylated Rab7 GTPase. b) Co-crystal structure of the semisynthetic doubly prenylated Rab GTPase Ypt1 (green cartoon) and GDP dissociation inhibitor (GDI, white surface). The unstructured C-terminus is shown as a dashed green line. The expansion shows a zoomed view of the two geranylgeranyl moieties (PDB ID: 2BCG).
Figure 53.
Control of Smad2 activity using semisynthesis. a) Association of semisynthetic phosphorylated Smad2 into trimers (PDB ID code 1KHX). b) Photocaging of Smad2 association and activation.
Figure 54.
Semisynthesis of ribonuclease A (RNase A). a) Scheme for the semisyntheses of RNase A using EPL at Cys or Sec. b) Dipeptidyl turn mimics installed in RNase A using EPL. Melting temperatures from thermally induced unfolding experiments are shown.
Figure 55.
Semisynthesis of Akt1 kinase. a) Scheme for the semisynthesis of phosphorylated Akt1 kinase using EPL. b) Model for the modulation of Akt1 activity through phosphorylation.
Figure 56.
Semisyntheses of SHP-2 phosphatase. a) Scheme for the semisynthesis of SHP-2 using EPL. b) Model for the activation of SHP-2 phosphatase activity through tyrosine phosphorylation.
Figure 57.
Semisyntheses of PTEN. a) Scheme for the semisynthesis of phosphorylated PTEN using subtiligase-catalyzed EPL. b) Model for the modulation of PTEN structure through phosphorylation.
Figure 58.
Circular permutation of the p300 HAT domain (PDB ID: 3BIY), which was accessed semisynthetically in order to install acetylated lysine residues.
Figure 59.
Semisynthesis of the catalytic domain of sortase A (SrtA). a) Scheme for the semisynthesis of citrullinated SrtA using EPL. b) Molecular mechanism of transpeptidation facilitated by H-bonding between the sorting motif and Arg/Cit.
Figure 60.
Semisynthesis of inteins. a) Scheme for the semisynthesis of the Mxe GyrA branched intermediate using EPL. b) Close-up of the active site of the Mxe GyrA (extein residues are colored green and the intein is colored pink). Apparent H-bonds are indicated with dashed lines. c) Folding pathway for the Npu DnaE split intein.
