Review

J Biol Chem. 2021 Jan-Jun:296:100558.

doi: 10.1016/j.jbc.2021.100558. Epub 2021 Mar 18.

Recent advances in de novo protein design: Principles, methods, and applications

Xingjie Pan¹, Tanja Kortemme²

Affiliations

¹ Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA. Electronic address: xingjiepan@gmail.com.
² Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA; Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA. Electronic address: tanjakortemme@gmail.com.

PMID: 33744284
PMCID: PMC8065224
DOI: 10.1016/j.jbc.2021.100558

Xingjie Pan et al. J Biol Chem. 2021 Jan-Jun.

Abstract

Computational de novo protein design is increasingly applied to address key challenges in biomedicine and biological engineering. Its expanding range of successful applications is driven by advances in design principles and methods made over several decades. Here, we review recent innovations in the major aspects of de novo protein design and discuss how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in the de novo generation of designable backbone structures, the optimization of sequences, design scoring functions, and the design of function. These advances not only highlight the design goals that are reachable now but also point to challenges and opportunities for the future of the field.

Keywords: PDB; Rosetta; biophysics; computational protein design; de novo protein design; protein structure.

Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

**Figure 1**
**Major aspects of *de novo* protein design.** The design of a functional *de novo* protein, for example, a binder (*middle*, *magenta*) to a target protein (*middle*, *gray*), requires sampling the backbone structure space to find a backbone compatible with the function, optimizing the sequence to stabilize that backbone, and designing the functional site interactions. A scoring function is necessary to select designs with the desired properties, typically by identifying low-energy sequence–structure combinations.

**Figure 2**
**Advances in *de novo* backbone generation.** A, methods to build *de novo* proteins by assembling local structures. The blueprint method assembles fragments of three or nine residues into idealized structures with different fold topologies (29, 30, 31, 33, 61, 62, 64). Modular leucine-rich motifs are connected into repeat proteins with defined curvatures (65). The SEWING method (36) connects local structural elements into helical proteins with novel folds. Overlapping regions are colored. B, the Foldit game (71) and TopoBuilder (72) let players or experts rationally design the atomic details of backbone structures. C, symmetry reduces the complexity of backbone generation. Symmetry was used to design a 4-fold (*colors*) symmetric TIM barrel (34) and repeat proteins (67). D, *de novo* protein fold families can be generated by sampling the geometries (length, as well as relative position and orientation) of secondary structure elements (28, 32). E, generative machine learning methods (*red*) build novel backbone structures by latent space sampling (81). The hallucination method (45) (*red*) uses the TR-Rosetta neural network to predict the structure distribution of a sequence. The sequence is optimized using Monte Carlo–simulated annealing by maximizing the divergence between the predicted structure distribution and a background distribution representing unstructured proteins. SEWING, structure extension with native-substructure graphs; TR, transform-restrained.
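
The optimization loop described in panel E can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_predict` is a deterministic toy function standing in for a structure-prediction network (it is not TR-Rosetta), the background is a uniform distribution, and the temperature schedule, sequence length, and step count are arbitrary. Only the overall scheme, Monte Carlo simulated annealing that maximizes the divergence between predicted and background distributions, follows the caption.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 4
# Assumption: a uniform distribution plays the role of the "unstructured" background.
BACKGROUND = [1.0 / N_BINS] * N_BINS


def toy_predict(seq):
    """Toy stand-in for a structure predictor (NOT TR-Rosetta).

    Returns one probability distribution over N_BINS per position.
    """
    n = len(seq)
    dists = []
    for i, aa in enumerate(seq):
        nxt = seq[(i + 1) % n]
        logits = [((ord(aa) * (b + 1) + ord(nxt)) % 7) / 2.0 for b in range(N_BINS)]
        z = sum(math.exp(x) for x in logits)
        dists.append([math.exp(x) / z for x in logits])
    return dists


def divergence(seq):
    """Mean KL(predicted || background) over all positions."""
    dists = toy_predict(seq)
    total = 0.0
    for p in dists:
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, BACKGROUND))
    return total / len(dists)


def hallucinate(length=12, steps=500, t_start=0.1, t_end=0.001, seed=0):
    """Simulated annealing over sequences; returns (start, best_seq, best_score)."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    start = "".join(seq)
    score = divergence(start)
    best_seq, best_score = start, score
    for step in range(steps):
        # Geometric cooling from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)  # propose a single-residue mutation
        new_score = divergence("".join(seq))
        delta = new_score - score
        # Metropolis criterion for a maximization problem.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score = new_score
            if score > best_score:
                best_seq, best_score = "".join(seq), score
        else:
            seq[pos] = old  # reject: restore the previous residue
    return start, best_seq, best_score


start, best, best_score = hallucinate()
print(start, round(divergence(start), 4))
print(best, round(best_score, 4))
```

Because the best-scoring sequence is tracked separately from the current state, the returned score can never be lower than that of the random starting sequence, even though individual annealing steps may accept worse sequences.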

**Figure 3**
**Advances in side-chain design.** A, in layer design, polar residues (*cyan*) are only allowed at surface and boundary positions, while hydrophobic residues (*yellow*) are only allowed at boundary and core positions. B, structures generated by side-chain design methods can be evaluated by a set of filters, such as core packing quality, hydrogen bond satisfaction, and local sequence/structure compatibility. C, side-chain design methods that exploit backbone flexibility outperform fixed-backbone methods (98). D, the HBNet method (100) designs hydrogen bond networks. E, neural networks can predict the probabilities of sequences given a backbone structure (102, 103) (*red*). Generative machine learning models design sequences by latent space sampling (104, 105, 106, 107, 108) (*green*). The TR-Rosetta neural network predicts the probability of the structure of a given sequence. The difference between the desired structure and the predicted structure can be backpropagated through the neural network to optimize the sequence (109) (*blue*). TR-Rosetta, transform-restrained Rosetta.
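
The layer-design rule in panel A can be written as a small lookup, sketched below. The grouping of amino acids into `POLAR` and `HYDROPHOBIC` sets is an assumption made for illustration (glycine, proline, and cysteine are simply left out), and the layer label of each position is taken as given rather than computed from a structure.

```python
# Assumption: an illustrative polar/hydrophobic split, not the one used in any cited study.
HYDROPHOBIC = set("AFILMVWY")
POLAR = set("DEHKNQRST")

# Layer rule from the caption: polar at surface and boundary,
# hydrophobic at boundary and core.
ALLOWED_BY_LAYER = {
    "surface": POLAR,
    "boundary": POLAR | HYDROPHOBIC,
    "core": HYDROPHOBIC,
}


def allowed_residues(layer):
    """Return the sorted one-letter codes permitted in a layer."""
    return sorted(ALLOWED_BY_LAYER[layer])


def violations(sequence, layers):
    """List (index, residue, layer) for every position that breaks the layer rule."""
    if len(sequence) != len(layers):
        raise ValueError("sequence and layers must have the same length")
    return [
        (i, aa, layer)
        for i, (aa, layer) in enumerate(zip(sequence, layers))
        if aa not in ALLOWED_BY_LAYER[layer]
    ]


layers = ["surface", "core", "boundary", "surface"]
print(allowed_residues("core"))
print(violations("KLVE", layers))  # consistent with the rule
print(violations("LKVE", layers))  # hydrophobic on surface, polar in core
```

In a design run, a table like `ALLOWED_BY_LAYER` would restrict the amino acids sampled at each position before any energy evaluation takes place.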

**Figure 4**
**Advances in scoring functions**. A, a membrane scoring function (124) uses a continuous hydration fraction to calculate the free energy change of residues from water to the lipid environment. Water pores in membrane proteins are explicitly modeled. B, protein design scoring functions are generalized to model small molecules (132) and carbohydrates (131). C, the TERMs-based scoring function (133) breaks proteins into tertiary structure motifs and evaluates the fitness of the sequence for any local structure using the sequence profiles of the tertiary motifs. D, machine learning methods predict the probability of sequences given a structure (102) or the probability of structures given a sequence (109). The predicted probabilities can be used as scores for the compatibility between sequences and structures. TERMs, tertiary structural motifs.
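
Panel D notes that predicted probabilities can serve as scores. One common way to do this, shown here as a sketch rather than as the method of the cited studies, is to sum the negative log-probabilities that a model assigns to each residue of a sequence. The per-position probability tables below are made-up placeholders, and the `floor` parameter is a hypothetical guard against taking the logarithm of zero.

```python
import math


def sequence_score(sequence, position_probs, floor=1e-9):
    """Negative log-likelihood of a sequence under per-position probabilities.

    position_probs[i] maps one-letter codes to the predicted probability of
    that residue at position i. Lower scores mean better compatibility.
    """
    if len(sequence) != len(position_probs):
        raise ValueError("sequence and position_probs must have the same length")
    return -sum(
        math.log(max(probs.get(aa, 0.0), floor))
        for aa, probs in zip(sequence, position_probs)
    )


# Placeholder probabilities for a two-residue backbone (illustrative only).
profile = [
    {"A": 0.5, "G": 0.5},
    {"L": 0.25, "V": 0.75},
]
print(sequence_score("AV", profile))
print(sequence_score("AL", profile))
```

Using log-probabilities makes the score additive over positions, so it can be compared or combined with other per-residue energy terms.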

**Figure 5**
**Advances in the design of new protein functions.** A, an apixaban (*yellow*) binder designed by the Convergent Motifs for Binding Sites (COMBS) algorithm (25). B, a *de novo* protein (*green*) binds the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein (*gray*) (4). C, *de novo* proteins self-assemble into heterodimers (120), two-dimensional materials (9), filaments (8), cages (140), and alpha amyloids (143). D, a *de novo*–designed multipass transmembrane protein that has a defined membrane orientation (148). E, the designed DANCER protein has a tryptophan side chain that switches between predicted conformational states on the millisecond timescale (152).

**Figure 6**
**Advances in the design of protein switches that change conformation in response to diverse signals.** A, a designed helical trimer changes its oligomerization state in response to pH changes (155). B, a designed helical bundle protein changes conformation upon binding to a calcium ion (*green*) and a chloride ion (*blue*) (156). C, a designed artificial chemically induced dimerization system (12) assembles upon binding to a farnesyl pyrophosphate ligand (*spheres*), linking ligand binding (sensing) to a modular response through reconstitution of a split output module (*gray*, *magenta*). D, in the LOCKR system, a helical peptide “key” (*magenta*) can displace and expose a signal peptide (*green*) (15). LOCKR, latching orthogonal cage-key proteins.

**Figure 7**
**Success rates reported for design studies listed in Table 1.** The success rate is defined as the percentage of reported designs in each study that adopt the designed structure (*folded*, *blue*; experimental structure determined, *orange*) or function (*green*, *red*). The *circle* size denotes the number of folded/functional designs in each study. The success rates for studies in which proteins were *de novo*–designed to have new structures vary but can be high, with many successful designs (*blue*). In contrast, success rates and numbers of successful designs for proteins with new functions (*green*) are much lower, except in a few cases where the functional designs were all-helical proteins (*red*). Only studies that reported ten or more experimentally characterized designs (Table 1) are included. “Folded” refers to designs that were characterized by CD and/or NMR spectroscopy or had an experimentally determined structure, displayed the expected oligomerization state (if measured), and/or were functional (if designed to have a function).
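
The success-rate definition and the inclusion threshold in this caption amount to simple arithmetic, sketched below. The study names and counts are made-up placeholders and are not values from Table 1; the function names are likewise hypothetical.

```python
MIN_CHARACTERIZED = 10  # inclusion threshold stated in the caption


def success_rate(n_successful, n_characterized):
    """Percentage of experimentally characterized designs that succeeded."""
    if n_characterized <= 0:
        raise ValueError("n_characterized must be positive")
    return 100.0 * n_successful / n_characterized


def included(n_characterized):
    """True if a study reports enough characterized designs to be plotted."""
    return n_characterized >= MIN_CHARACTERIZED


# Placeholder counts: (study label, successful designs, characterized designs).
studies = [("study_a", 3, 12), ("study_b", 2, 4), ("study_c", 45, 50)]
for name, ok, tested in studies:
    if included(tested):
        print(name, f"{success_rate(ok, tested):.1f}%")
```

Here `study_b` is skipped because it reports fewer than ten characterized designs, which mirrors the filter described in the caption.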

References

1. Huang P.S., Boyken S.E., Baker D. The coming of age of de novo protein design. Nature. 2016;537:320–327.
2. Kuhlman B., Bradley P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019;20:681–697.
3. Chevalier A., Silva D.A., Rocklin G.J., Hicks D.R., Vergara R., Murapa P., Bernard S.M., Zhang L., Lam K.H., Yao G., Bahl C.D., Miyashita S.I., Goreshnik I., Fuller J.T., Koday M.T. Massively parallel de novo protein design for targeted therapeutics. Nature. 2017;550:74–79.
4. Cao L., Goreshnik I., Coventry B., Case J.B., Miller L., Kozodoy L., Chen R.E., Carter L., Walls A.C., Park Y.J., Strauch E.M., Stewart L., Diamond M.S., Veesler D., Baker D. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science. 2020;370:426–431.
5. Mohan K., Ueda G., Kim A.R., Jude K.M., Fallas J.A., Guo Y., Hafer M., Miao Y., Saxton R.A., Piehler J., Sankaran V.G., Baker D., Garcia K.C. Topological control of cytokine receptor signaling induces differential effects in hematopoiesis. Science. 2019;364

Grants and funding

R01 GM110089/GM/NIGMS NIH HHS/United States
