Review
J Biol Chem. 2021 Jan-Jun:296:100558.
doi: 10.1016/j.jbc.2021.100558. Epub 2021 Mar 18.

Recent advances in de novo protein design: Principles, methods, and applications


Xingjie Pan et al. J Biol Chem. 2021 Jan-Jun.

Abstract

Computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and include how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.

Keywords: PDB; Rosetta; biophysics; computational protein design; de novo protein design; protein structure.


Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

Figure 1
Major aspects of de novo protein design. The design of a functional de novo protein, for example, a binder (middle, magenta) to a target protein (middle, gray), requires sampling of the backbone structure space to find a backbone compatible with the function, sequence optimization to stabilize the backbone, and designing the functional site interactions. A scoring function is necessary to select designs with desired properties, typically by identifying low-energy sequence–structure combinations.
Figure 2
Advances in de novo backbone generations. A, methods to build de novo proteins by assembling local structures. The blueprint method assembles fragments of three or nine residues into idealized structures with different fold topologies (29, 30, 31, 33, 61, 62, 64). Modular leucine-rich motifs are connected into repeat proteins with defined curvatures (65). The SEWING method (36) connects local structural elements into helical proteins with novel folds. Overlapping regions are colored. B, the Foldit game (71) and TopoBuilder (72) let players or experts rationally design the atomic details of backbone structures. C, symmetry reduces the complexity of backbone generation. Symmetry was used to design a 4-fold (colors) symmetric TIM barrel (34) and repeat proteins (67). D, de novo protein fold families can be generated by sampling the geometries (length, as well as relative position and orientation) of secondary structure elements (28, 32). E, generative machine learning methods (red) build novel backbone structures by latent space sampling (81). The hallucination method (45) (red) uses the TR-Rosetta neural network to predict the structure distribution of a sequence. The sequence is optimized using Monte Carlo–simulated annealing by maximizing the divergence between the predicted structure distribution and a background distribution representing unstructured proteins. SEWING, structure extension with native-substructure graphs; TR, transform-restrained.
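The optimization loop described for the hallucination method in panel E can be illustrated with a minimal sketch. It assumes a toy predictor, `toy_predict`, that is a hypothetical stand-in for a structure-prediction network (it is not TR-Rosetta), a uniform background distribution, and an arbitrary geometric temperature schedule. Only the overall pattern follows the caption: mutate the sequence, score the divergence between the predicted and background distributions, and accept or reject with a Metropolis criterion.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 8  # hypothetical number of bins in the predicted distribution


def toy_predict(seq):
    """Hypothetical stand-in for a structure-prediction network.

    Returns a distribution over N_BINS that becomes sharper as the
    fraction of hydrophobic residues grows. This is NOT TR-Rosetta.
    """
    hydrophobic = sum(seq.count(a) for a in "AILMFVW")
    sharpness = 4.0 * hydrophobic / len(seq)
    logits = [sharpness if b == 0 else 0.0 for b in range(N_BINS)]
    z = sum(math.exp(x) for x in logits)
    return [math.exp(x) / z for x in logits]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hallucinate(length=20, steps=500, t_start=0.1, t_end=0.001, seed=0):
    """Monte Carlo simulated annealing that maximizes the divergence
    between the predicted distribution and a uniform background."""
    rng = random.Random(seed)
    background = [1.0 / N_BINS] * N_BINS  # assumed background distribution
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = kl_divergence(toy_predict("".join(seq)), background)
    start_score = score
    best_seq, best_score = "".join(seq), score
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an arbitrary choice).
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = kl_divergence(toy_predict("".join(seq)), background)
        delta = new_score - score
        # Metropolis criterion: always accept improvements, sometimes
        # accept a worse sequence, less often as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
            if score > best_score:
                best_seq, best_score = "".join(seq), score
        else:
            seq[pos] = old  # reject the mutation
    return best_seq, start_score, best_score
```

Calling `hallucinate()` returns the best sequence found together with the starting and best divergence values. In a real application the toy predictor would be replaced by a trained network and the background by a distribution representing unstructured proteins, as the caption describes.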
Figure 3
Advances in side-chain design. A, in layer design, polar residues (cyan) are only allowed at surface and boundary positions, while hydrophobic residues (yellow) are only allowed at boundary and core positions. B, structures generated by side chain design methods can be evaluated by a set of filters, such as core packing quality, hydrogen bond satisfaction and local sequence/structure compatibility. C, side chain design methods that exploit backbone flexibility outperform fixed backbone methods (98). D, the HBNet method (100) designs hydrogen bond networks. E, neural networks can predict the probabilities of sequences given a backbone structure (102, 103) (red). Generative machine learning models design sequences by latent space sampling (104, 105, 106, 107, 108) (green). The TR-Rosetta neural network predicts the probability of the structure of a given sequence. The difference between the desired structure and the predicted structure can be backpropagated through the neural network to optimize the sequence (109) (blue). TR-Rosetta, transform-restrained Rosetta.
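The layer design rule in panel A can be written as a small lookup, shown here as a sketch. The assignment of amino acids to the polar and hydrophobic classes is an assumption made for illustration (the caption does not list them), and glycine, proline, and cysteine are left unclassified and skipped. The function name `layer_violations` is hypothetical.

```python
# Assumed residue classes; the source caption does not enumerate them.
POLAR = set("DEHKNQRST")
HYDROPHOBIC = set("AFILMVWY")

# Layer rule from the caption: polar at surface and boundary,
# hydrophobic at boundary and core.
ALLOWED_BY_LAYER = {
    "surface": POLAR,
    "boundary": POLAR | HYDROPHOBIC,
    "core": HYDROPHOBIC,
}


def layer_violations(sequence, layers):
    """Return (index, residue, layer) for each residue not allowed in its layer.

    Residues outside both classes (G, P, C in this sketch) are skipped.
    """
    if len(sequence) != len(layers):
        raise ValueError("sequence and layers must have the same length")
    violations = []
    for i, (residue, layer) in enumerate(zip(sequence, layers)):
        if residue not in POLAR | HYDROPHOBIC:
            continue  # unclassified residue, not checked here
        if residue not in ALLOWED_BY_LAYER[layer]:
            violations.append((i, residue, layer))
    return violations
```

For example, `layer_violations("LKA", ["surface", "core", "boundary"])` flags the leucine at the surface and the lysine in the core, while the alanine at the boundary passes.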
Figure 4
Advances in scoring functions. A, a membrane scoring function (124) uses a continuous hydration fraction to calculate the free energy change of residues from water to the lipid environment. Water pores in membrane proteins are explicitly modeled. B, protein design scoring functions are generalized to model small molecules (132) and carbohydrates (131). C, the TERMs-based scoring function (133) breaks proteins into tertiary structure motifs and evaluates the fitness of the sequence for any local structure using the sequence profiles of the tertiary motifs. D, machine learning methods predict the probability of sequences given a structure (102) or the probability of structures given a sequence (109). The predicted probabilities can be used as scores for the compatibility between sequences and structures. TERMs, tertiary structural motifs.
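Panel D notes that predicted probabilities can serve as compatibility scores. One simple way to do this, sketched below, is to sum the negative log probability of each residue at its position, so that lower values mean better agreement. The sketch assumes the model outputs independent per-position probabilities, which need not hold for the cited methods, and the function name `sequence_nll` and the probability floor are hypothetical choices.

```python
import math


def sequence_nll(sequence, position_probs, floor=1e-9):
    """Negative log-likelihood of a sequence under per-position probabilities.

    position_probs is a list of dicts mapping residue -> P(residue | structure),
    one dict per position. Residues missing from a dict get a small floor
    probability (an assumption made to avoid log(0)).
    """
    if len(sequence) != len(position_probs):
        raise ValueError("one probability table is needed per position")
    total = 0.0
    for residue, probs in zip(sequence, position_probs):
        total += -math.log(max(probs.get(residue, 0.0), floor))
    return total


# Hypothetical two-position example: lower score means a better fit.
example_probs = [{"A": 0.5, "G": 0.5}, {"L": 0.25, "V": 0.75}]
score_av = sequence_nll("AV", example_probs)
score_al = sequence_nll("AL", example_probs)
```

Here `score_av` is lower than `score_al` because valine is the more probable residue at the second position in this made-up table.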
Figure 5
Advances in design of new protein functions. A, an apixaban (yellow) binder designed by the Convergent Motifs for Binding Sites (COMBS) algorithm (25). B, a de novo protein (green) binds the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein (gray) (4). C, de novo proteins self-assemble into heterodimers (120), two-dimensional materials (9), filaments (8), cages (140), and alpha amyloids (143). D, a de novo–designed multipass transmembrane protein that has a defined membrane orientation (148). E, the designed DANCER protein has a tryptophan side chain that switches between predicted conformational states on the millisecond timescale (152).
Figure 6
Advances in the design of protein switches that change conformation in response to diverse signals. A, a designed helical trimer changes its oligomerization state in response to pH changes (155). B, a designed helical bundle protein changes conformation upon binding to a calcium ion (green) and a chloride ion (blue) (156). C, a designed artificial chemically induced dimerization system (12) assembles upon binding to a farnesyl pyrophosphate ligand (spheres), linking ligand binding (sensing) to a modular response through reconstitution of a split output module (gray, magenta). D, in the LOCKR system, a helical peptide “key” (magenta) can displace and expose a signal peptide (green) (15). LOCKR, latching orthogonal cage-key proteins.
Figure 7
Success rates reported for design studies listed in Table 1. The success rate is defined as the percentage of reported designs in each study that adopt the designed structure (folded, blue; experimental structure determined, orange) or function (green, red). The circle size denotes the number of folded/functional designs in each study. The success rates for studies where proteins were de novo–designed to have new structures are varied but can be high with many designs (blue). In contrast, success rates and numbers of successful designs for proteins with new functions (green) are much lower, except in a few cases where functional designs were all-helical proteins (red). Only studies that reported ten or more experimentally characterized designs (Table 1) are included. “Folded” refers to designs that were characterized by CD and/or NMR spectroscopy or had an experimentally determined structure, displayed the expected oligomerization state (if measured), and/or were functional (if designed to have a function).
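The success rate defined in this caption is a simple percentage with an inclusion threshold, which can be sketched as follows. The function name `success_rate` and the example counts are hypothetical. Only the definition (successful designs as a percentage of reported designs) and the cutoff of ten characterized designs come from the caption.

```python
def success_rate(n_successful, n_characterized, min_characterized=10):
    """Percentage of characterized designs that were successful.

    Returns None for studies below the inclusion threshold, mirroring the
    caption's rule that only studies with ten or more experimentally
    characterized designs are included.
    """
    if n_successful < 0 or n_successful > n_characterized:
        raise ValueError("n_successful must be between 0 and n_characterized")
    if n_characterized < min_characterized:
        return None  # study excluded from the comparison
    return 100.0 * n_successful / n_characterized


# Hypothetical counts, for illustration only.
included = success_rate(6, 12)   # 6 of 12 designs folded
excluded = success_rate(3, 5)    # fewer than ten designs characterized
```

With these made-up counts, `included` is 50.0 and `excluded` is `None`.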

References

    1. Huang P.S., Boyken S.E., Baker D. The coming of age of de novo protein design. Nature. 2016;537:320–327.
    2. Kuhlman B., Bradley P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019;20:681–697.
    3. Chevalier A., Silva D.A., Rocklin G.J., Hicks D.R., Vergara R., Murapa P., Bernard S.M., Zhang L., Lam K.H., Yao G., Bahl C.D., Miyashita S.I., Goreshnik I., Fuller J.T., Koday M.T. Massively parallel de novo protein design for targeted therapeutics. Nature. 2017;550:74–79.
    4. Cao L., Goreshnik I., Coventry B., Case J.B., Miller L., Kozodoy L., Chen R.E., Carter L., Walls A.C., Park Y.J., Strauch E.M., Stewart L., Diamond M.S., Veesler D., Baker D. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science. 2020;370:426–431.
    5. Mohan K., Ueda G., Kim A.R., Jude K.M., Fallas J.A., Guo Y., Hafer M., Miao Y., Saxton R.A., Piehler J., Sankaran V.G., Baker D., Garcia K.C. Topological control of cytokine receptor signaling induces differential effects in hematopoiesis. Science. 2019;364
