Nature. 2016 Oct 20;538(7625):329-335.

doi: 10.1038/nature19791. Epub 2016 Sep 14.

Accurate de novo design of hyperstable constrained peptides

Gaurav Bhardwaj¹,², Vikram Khipple Mulligan¹,², Christopher D Bahl¹,², Jason M Gilmore¹,², Peta J Harvey³, Olivier Cheneval³, Garry W Buchko⁴, Surya V S R K Pulavarti⁵, Quentin Kaas³, Alexander Eletsky⁵, Po-Ssu Huang¹,², William A Johnsen⁶, Per Jr Greisen¹,²,⁷, Gabriel J Rocklin¹,², Yifan Song¹,²,⁸, Thomas W Linsky¹,², Andrew Watkins⁹, Stephen A Rettie², Xianzhong Xu⁵, Lauren P Carter², Richard Bonneau¹⁰,¹¹, James M Olson⁶, Evangelos Coutsias¹², Colin E Correnti⁶, Thomas Szyperski⁵, David J Craik³, David Baker¹,²,¹³

Affiliations

¹ Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA.
² Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA.
³ Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia.
⁴ Seattle Structural Genomics Center for Infectious Diseases, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, USA.
⁵ Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260, USA.
⁶ Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.
⁷ Global Research, Novo Nordisk A/S, DK-2760 Måløv, Denmark.
⁸ Cyrus Biotechnology, Seattle, Washington 98109, USA.
⁹ Department of Chemistry, New York University, New York, New York 10003, USA.
¹⁰ Department of Biology, New York University, New York, New York 10003, USA.
¹¹ Center for Computational Biology, Simons Foundation, New York, New York 10010, USA.
¹² Applied Mathematics and Statistics and Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA.
¹³ Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA.

PMID: 27626386
PMCID: PMC5161715
DOI: 10.1038/nature19791

Gaurav Bhardwaj et al. Nature. 2016.

Abstract

Naturally occurring, pharmacologically active peptides constrained with covalent crosslinks generally have shapes that have evolved to fit precisely into binding pockets on their targets. Such peptides can have excellent pharmaceutical properties, combining the stability and tissue penetration of small-molecule drugs with the specificity of much larger protein therapeutics. The ability to design constrained peptides with precisely specified tertiary structures would enable the design of shape-complementary inhibitors of arbitrary targets. Here we describe the development of computational methods for accurate de novo design of conformationally restricted peptides, and the use of these methods to design 18-47 residue, disulfide-crosslinked peptides, a subset of which are heterochiral and/or N-C backbone-cyclized. Both genetically encodable and non-canonical peptides are exceptionally stable to thermal and chemical denaturation, and 12 experimentally determined X-ray and NMR structures are nearly identical to the computational design models. The computational design methods and stable scaffolds presented here provide the basis for development of a new generation of peptide-based drugs.

Conflict of interest statement

Authors declare no competing financial interests.

Figures

**Extended Data Figure 1. Disulfide bonds are well defined by X-ray crystallography**
An *F_o* – *F_c* omit-map is shown contoured at 4 σ for design gEHEE_06. Disulfide sulfur atoms were removed, and the omit-map was calculated following real-space refinement.

**Extended Data Figure 2. Flowchart of pipelines for designing noncanonical cyclic peptides**
Inputs are shown in blue, RosettaScripts-automated parts of the pipeline are in green, parts carried out by Rosetta standalone applications are pink (the fragment picker application) and purple (the various structure prediction applications), parts performed with molecular dynamics software are yellow, and manual steps are grey. a) Fragment assembly-based design pipeline. Final computational validation was carried out using MD simulations and fragment-based Rosetta *ab initio* structure prediction. For peptides containing isolated D-amino acids, these residues were mutated to glycine for Rosetta *ab initio* structure prediction. b) Fragment-free, GenKIC-based design pipeline. This approach permits design of noncanonical topologies like the mixed H_LH_R topology, which occurs in no known natural protein. The GenKIC-based structure prediction algorithm is described in Extended Data Figure 7 and in the Supplementary Information.

**Extended Data Figure 3. Sidechain placement in noncanonical peptide designs chosen for experimental characterization**
Designs are shown as cartoon and stick representations (top row in each box) and as van der Waals spheres showing sidechain packing (bottom row in each box). L-amino acid residues are shown in cyan, and D-amino acid residues are coloured orange. Sidechains of D- or L-variants of alanine, phenylalanine, isoleucine, leucine, valine, tryptophan, and tyrosine are coloured grey to aid visualization of hydrophobic packing interactions.

**Extended Data Figure 4. Molecular dynamics screening of designed peptides**
Fifty independent molecular dynamics (MD) simulations in explicit solvent, all starting from the designed peptide, were used to discriminate good, kinetically stable designs (*e.g.* EHE_D1) from non-optimal designs of the same topology (*e.g.* EHE_X18 and EHE_X11). a) Five representative trajectories from MD simulation runs. Designs that showed good convergence and smaller fluctuations were selected for further experimental characterization. b) RMSD distribution from all 50 trajectories. Only the last one-third of each trajectory was used for this analysis. Designs with narrower distributions were picked for further testing. c) Concatenated trajectories of all 50 independent runs show lower fluctuations for the more optimal designs.
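
The screening idea in panel b can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes each trajectory is already available as a list of per-frame RMSD values (in Å) to the design model, and it uses the standard deviation of the pooled values as the measure of distribution width, which is an assumption. The function names `last_third`, `rmsd_spread`, and `rank_designs` are hypothetical.

```python
from statistics import mean, pstdev


def last_third(values):
    """Return the final third of a per-frame series (at least one frame)."""
    n = len(values)
    start = n - max(1, n // 3)
    return values[start:]


def rmsd_spread(trajectories):
    """Pool the last third of every trajectory; return (mean, std) of RMSD."""
    pooled = [r for traj in trajectories for r in last_third(traj)]
    return mean(pooled), pstdev(pooled)


def rank_designs(designs):
    """Sort design names from narrowest to widest pooled RMSD distribution.

    `designs` maps a design name to its list of trajectories. Using the
    standard deviation as the width measure is an illustrative assumption.
    """
    return sorted(designs, key=lambda name: rmsd_spread(designs[name])[1])


if __name__ == "__main__":
    # Toy RMSD series (hypothetical numbers, not data from the paper).
    toy = {
        "stable": [[1.0, 1.1, 0.9, 1.0, 1.0, 1.0]] * 2,
        "drifting": [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]] * 2,
    }
    for name in rank_designs(toy):
        print(name, rmsd_spread(toy[name]))
```

In practice the RMSD series would come from an MD analysis package; only the pooling and ranking logic is shown here.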

**Extended Data Figure 5. Structural characterization of NC_EEH_D1**
The NMR structure of NC_EEH_D1 does not match the designed topology. a) Rosetta-designed model for NC_EEH_D1. b) Ensemble of conformers representing the NMR solution structure. c) Superposition of the designed model (blue) with a representative NMR conformer (green).

**Extended Data Figure 6. Structural mapping of sequence-aligned region between NC_EHE_D1 and 2MA5**
Design NC_EHE_D1 and PDB entry 2MA5 show weak but significant (e-value: 2×10⁻⁴) sequence alignment, which is highlighted in purple. The aligned region folds into very different structures in the different contexts of peptide and protein.

**Extended Data Figure 7. Generalized kinematic closure (GenKIC) algorithm flowchart**
GenKIC allows sampling of closed conformations of arbitrary chains of atoms, passing through canonical or noncanonical backbone or sidechain linkages. Bond length, bond angle, and torsional degrees of freedom in the chain can be fixed, perturbed from a starting value by small amounts, set to user-defined values, or sampled randomly. The algorithm then solves for six torsion angles adjacent to three user-defined pivot atoms in order to enforce closure of the loop. The many solutions from the closure are then filtered internally, and each can be subjected to arbitrary user-defined Rosetta protocols and filtration in order to prune the solution list further. A single solution is selected from those passing filters by a user-defined selection criterion. This flowchart shows the steps in a single invocation of the algorithm; for sampling, a user may specify that the algorithm be applied any number of times. User inputs are shown in blue, steps carried out by the GenKIC algorithm itself are in green, steps carried out by Rosetta code external to the GenKIC algorithm are shown in yellow, and outputs are shown in salmon.

**Extended Data Figure 8. A new fragment-free structure prediction algorithm**
a) Flowchart diagramming the steps to generate a single sampled conformation. In typical usage, this process would be repeated tens of thousands of times to produce many samples. Inputs (the peptide sequence and an optional PDB file for the design structure) are shown in blue, and outputs (the sampled structure, its energy, and its RMSD from the design structure) are shown in salmon. Steps performed by the Generalized Kinematic Closure algorithm are shaded green, and setup and completion steps performed by the **simple_cycpep_predict** application are shown in yellow. Further details of this algorithm are discussed in the Supplementary Information available online. b) The initial, random peptide conformation with bad terminal peptide bond geometry. c) Ensemble of closed conformations found for a single closure attempt. In this example, residue 7 (cyan) is the fixed anchor residue. Certain regions of the peptide have been set to left- or right-handed helical conformations prior to solving closure equations. d) A single closed solution with relative cysteine sidechain orientations that pass the initial, low-stringency filter for disulfide (*fa_dslf*) conformational energy. e) The resulting structure, following sidechain repacking, energy-minimization, and cyclic de-permutation.

**Extended Data Figure 9. Mutational tolerance of selected genetically-encodable designs**
RP-HPLC traces for the parental designs are shown next to the redesigned variants where applicable. Proteins run under oxidized conditions are shown in black, while proteins run following reduction with 10 mM DTT are shown in red. Insets within each panel are shown only to highlight the SDS-PAGE mobility of each purified protein under oxidizing (left band) and reducing conditions (right band). Sequence alignments are shown with the mutated positions highlighted in red, along with theoretical isoelectric points as calculated by ProtParam.

**Extended Data Figure 10. Mutational tolerance of selected NC designs**
a–b) Mutational tolerance of D-proline, L-proline loop of design NC_cEE_D1 (green in panel a), assessed by secondary ¹H_α chemical shift for the design sequence (black bars in panel b) and the p18d loop mutation (red bars). Eliminating this key proline residue does not result in loss of β-strand signal. c–d) Mutational tolerance of loop region of design NC_HEE_D1 (green in panel c), as assessed by CD spectroscopy for the design sequence (left plot, panel d) and for the D19T, p20q, P21D triple mutant (right plot, panel d). Both proline residues may be mutated without loss of secondary structure or major change in the thermal stability. e–g) Computationally predicted mutational tolerance of design NC_H_LH_R_D1, across the entire sequence. Each position was successively mutated *in silico* to D- or L-alanine, arginine, aspartate, phenylalanine, or valine (preserving the position’s chirality), and full folding simulations were carried out with the Rosetta simple_cycpep_predict application. Folding funnel quality was evaluated using the P_near metric described in the **Methods**. e) Representative plots of energy *vs.* RMSD from the design structure, plotted for the design sequence (top), for the non-disruptive R14F mutation (middle), and for the e18v mutation (bottom). Results from GenKIC-based structure prediction runs are shown in blue, and relaxation runs, in orange. Note that the bottom case shows many sampled states far from the design state with energy equal to or less than the design state energy. f) Mutational tolerance by position (vertical axis) and mutation (horizontal axis). Blue rectangles represent well-tolerated mutations, and red to black rectangles represent disruptive mutations, based on P_near evaluation of the folding funnel. Black borders indicate the design sequence. g) Mutational tolerance mapped onto the NC_H_LH_R_D1 structure, with colours as in the previous panel. 
Most positions tolerate mutation well, with only the disulfide bridge (C8-c21) and the salt bridges formed by e18 being highly sensitive. The hydrogen bond networks formed by residues Q5, e24, and s25 show some moderate sensitivity to mutation, as do residues E3 and e16.
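
The legend defers the definition of P_near to the Methods, which are not reproduced on this page. As a hedged illustration only, the sketch below uses a Boltzmann-weighted form often used for funnel-quality metrics of this kind: P_near = Σᵢ exp(−RMSDᵢ²/λ²)·exp(−Eᵢ/k_BT) / Σⱼ exp(−Eⱼ/k_BT). Both the formula as written here and the default values of `lam` and `kbt` are assumptions, not values taken from this record. A value near 1 means the low-energy samples sit close to the design structure; a value near 0 means lower-energy states exist far from it, as in the bottom plot of panel e.

```python
import math


def p_near(energies, rmsds, lam=1.5, kbt=0.62):
    """Boltzmann-weighted fraction of sampled states near the design.

    energies: sampled energies (e.g. Rosetta energy units)
    rmsds:    RMSD of each sample from the design structure (Angstrom)
    lam:      how close a state must be to count as "near" (assumed value)
    kbt:      Boltzmann temperature in energy units (assumed value)
    """
    if not energies or len(energies) != len(rmsds):
        raise ValueError("energies and rmsds must be non-empty and equal length")
    # Shift by the minimum energy for numerical stability; the shift
    # cancels between numerator and denominator.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kbt) for e in energies]
    nearness = [math.exp(-((r / lam) ** 2)) for r in rmsds]
    return sum(w * n for w, n in zip(weights, nearness)) / sum(weights)


if __name__ == "__main__":
    # Toy landscapes (hypothetical numbers, not data from the paper).
    funnel = p_near([-10.0, -5.0, -4.0], [0.2, 5.0, 6.0])
    disrupted = p_near([-5.0, -10.0], [0.2, 6.0])
    print(f"funnel-like: {funnel:.3f}, disrupted: {disrupted:.5f}")
```

A per-position scan like the one in panel f would call such a function once per mutant, using the energies and RMSDs from that mutant's folding simulations.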

**Figure 1. Designed peptide topologies**
The designed secondary structure architectures for each of the three classes of constrained peptides (genetically-encodable disulfide-rich, heterochiral disulfide-crosslinked, and cyclic) span most of the topologies that can be formed with four or fewer secondary structure elements. Arrows: β-strands, orange cylinders: right-handed α-helices, green cylinder: left-handed α-helix; red: loop segments containing D-amino acid residues.

**Figure 2. Computational design and biophysical characterization of genetically-encodable disulfide-rich peptides**
Genetically-encodable peptides are given the prefix “g” and a number to differentiate designs that share a common topology. (column a) Cartoon renderings of each design are shown with rainbow colouring from the N-terminus (blue) to the C-terminus (red), and disulfide bonds are shown as sticks. (column b) The energy landscape of each designed sequence was assessed by Rosetta structure prediction calculations starting from an extended chain (blue dots) or from the design model (orange dots); lower energy structures were sometimes sampled in the former because disulfide constraints were only present in the latter. (column c) CD spectra at 20°C (blue line), after heating to 95°C (red line), and upon cooling back to 20°C (green line). Spectra collected with 2.5 mM TCEP are shown in purple. (column d) CD steady-state wavelength spectra as a function of GdnHCl concentration.

**Figure 3. X-ray crystal structures and NMR solution structures of designed peptides are very close to design models**
Structures for gEHE_06, gEEH_04, gEEHE_02, and gHHH_06 were determined by NMR spectroscopy, and the structure of gEHEE_06 was determined by X-ray crystallography. (column a) C_α traces of NMR ensembles, or superimposed members of the asymmetric unit, (grey) are aligned against the design model (rainbow). Disulfide bonds are shown with sidechain atoms rendered as sticks with sulfur atoms coloured yellow. (column b) A cartoon representation of the lowest energy conformer of each NMR ensemble or crystallographic asymmetric unit (grey) is shown aligned to the design model (rainbow). Sidechain atoms of hydrophobic core residues are rendered as sticks.

**Figure 4. Design and characterization of heterochiral disulfide-constrained peptides**
The prefix “NC” denotes noncanonical sequence or backbone architecture, and a numerical suffix differentiates designs sharing a common topology. (*Column a*) Cartoon representations of design models with the N-terminus in blue and C-terminus in red. (*Column b*) Folding energy landscapes from Rosetta *ab initio* structure prediction calculations. Blue dots indicate lowest-energy structures identified in independent Monte Carlo trajectories. Orange dots are from trajectories starting with the design model. (r.e.u.: Rosetta Energy Units, RMSD: root mean square deviation from the designed topology). (*Column c*) Five representative trajectories from a total of 50 independent molecular dynamics simulations starting from the design model with different initial velocities. (*Column d*) NMR-determined structure ensembles. Cartoon representations coloured and oriented as in column a. (*Column e*) Superposition of the designed structure (blue) with the lowest-energy NMR structure (green). (*Column f*) CD wavelength spectra between 195 nm and 260 nm recorded at 25 °C (black), 55 °C (blue), 95 °C (red), and after cooling back to 25 °C (green). (*Column g*) CD spectra recorded at 0 M (black), 2 M (blue), 4 M (green), or 6 M GdnHCl (red), or with 2.5 mM TCEP/0 M GdnHCl (purple). Data are truncated in the far-UV region for spectra acquired in the presence of high GdnHCl concentrations (due to GdnHCl absorbance).

**Figure 5. Design and characterization of N-C backbone cyclic peptides**
Columns are as indicated in Figure 4 legend. A lowercase “c” in the peptide name indicates N-C cyclic backbone.

**Figure 6. Design and characterization of a peptide with noncanonical secondary and tertiary structure**
a) NC_H_LH_R_D1 design (cyan: L-amino acids, orange: D-amino acids). b) Folding energy landscape generated using a new structure prediction algorithm compatible with noncanonical secondary structures (see **Methods** and Supplementary Information). c) Five representative molecular dynamics trajectories (from a total of 50) starting from the design model with different initial velocities. d) NMR-determined structure ensembles, coloured and oriented as in the first panel. e) Superposition of the designed structure (blue) with the lowest-energy NMR structure (green). f) CD spectra between 195 nm and 260 nm recorded at 25 °C (black), 55 °C (blue), 95 °C (red), and after cooling back to 25 °C (green). The CD spectrum of NC_H_LH_R_D1 exhibits very weak signals because the L- and D-helical signals largely cancel. g) Secondary ¹H_α chemical shifts (ppm) show no change from 25 °C (black) to 75 °C (red).
