Review
Nat Rev Mol Cell Biol. 2024 Aug;25(8):639-653.
doi: 10.1038/s41580-024-00718-y. Epub 2024 Apr 2.

Opportunities and challenges in design and optimization of protein function

Dina Listov et al. Nat Rev Mol Cell Biol. 2024 Aug.

Abstract

The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.

Figures

Figure 1. Goals of protein design methodology.
A. The fundamental paradigm of protein science is that amino acid sequence determines structure, which in turn determines function. Protein design has historically been dubbed the inverse folding problem: finding the sequence that will fold into a desired structure (step 3, Fold Design). By extension, the problem of finding the sequences and structures that can realize a desired function is the inverse function problem (step 4, Function Design). Computational methods for (1) structure prediction and (2) function annotation have recently made considerable advances. The inverse problems of fold (3) and function (4) design have also made important progress, but significant gaps remain in the design of complex folds and functions relative to those observed in nature. B. A fitness landscape is an abstract model that relates protein variants to their relative activities. Nearby points on the landscape represent closely related sequences, and the relative heights represent relative fitnesses or activities. Evolution and conventional methods in protein engineering iteratively explore nearby mutants (dashed black lines) starting from a naturally occurring protein (circle close to the viewer). Such methods may find local fitness optima but are restricted in their ability to find distant ones (peak in the background). New computational design methods may be able to reach such solutions, which are unlikely to emerge through evolutionary approaches or would emerge only slowly.
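The local-optimum behaviour described in panel B can be illustrated with a toy model. The sketch below is not taken from the Review: the one-dimensional `landscape` values and the `hill_climb` helper are hypothetical, chosen only to show how a search that accepts only improving neighbouring variants can stop at a nearby peak and miss a higher, distant one.

```python
def hill_climb(fitness, start):
    """Greedy walk: move to the better adjacent variant until none improves.

    `fitness` is a hypothetical list of activities indexed by position on a
    one-dimensional toy landscape; real sequence space is vastly larger.
    """
    pos = start
    while True:
        # Adjacent positions stand in for single-mutation neighbours.
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(fitness)]
        best = max(neighbors, key=lambda i: fitness[i])
        if fitness[best] <= fitness[pos]:
            return pos  # no neighbour is fitter: a local optimum
        pos = best


# Hypothetical landscape with a low peak at index 2 and a high peak at index 8.
landscape = [1, 3, 5, 4, 2, 1, 2, 6, 9, 7]

local_peak = hill_climb(landscape, start=0)
global_peak = max(range(len(landscape)), key=lambda i: landscape[i])
print("greedy walk from 0 stops at index", local_peak)
print("highest point on the landscape is index", global_peak)
```

In this toy case the walk never crosses the valley between the two peaks, which is the restriction the caption attributes to iterative exploration of nearby mutants.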
Figure 2. Computational stability design.
A. Schematic representation of the folding landscape of a marginally stable protein generated using physics-based design methods alone (top) versus physics and phylogeny-driven approaches (bottom). Physics-based methods have access only to the native state and may inadvertently lower the energy of competing (misfolded and unfolded) states, thereby not addressing negative design. Mutations that lower the energy of undesired states are likely to be purged by evolutionary selection; accordingly, a protein designed using a combination of atomistic and phylogenetic constraints may preferentially fold into the native state. E: energy, vdW: van der Waals interactions, elec: Coulomb electrostatic interactions, sol: solvation. B. Stability design of the SARS-CoV-2 Spike protein monomer (PDB entry: 6VYB). A design that encodes 20 mutations (S2D14) in the S2 Spike protein subunit (top left) increased protein yields 11-fold (top right) while improving pseudovirus neutralization titers (pVNT50) of several SARS-CoV-2 variants of concern (bottom). S-2P is a rationally designed Spike variant that encodes stabilizing mutations Lys986Pro and Val987Pro. Data for panel (B) generously provided by Wayne Harshbarger.
Figure 3. Design of enhanced activity.
A. Amino acid residues (orange) in the UPO active site are in close proximity (PDB entry: 6EKZ). Computational design introduces combinations of simultaneous mutations to generate stable and preorganized active sites. Heme shown in cyan; the substrate propranolol in purple. B. Select designs were experimentally tested for C−H oxyfunctionalization of a variety of styrenes. The yellow and blue bars indicate the fraction of enantiomers generated by the designs and starting enzyme (WT). Molecular structures of the products are shown at the top. Designs show striking changes in enantiospecificity (compare, for instance, WT and d28 in the right-most column). C. The active site of a bacterial phosphotriesterase (PDB entry: 1HZY). Wild type residues are in gray sticks, and designed mutations in a quadruple mutant are in cyan. D. Mutations in the active site of the phosphotriesterase show strong epistasis in their impact on the measured activity (2-naphthyl acetate hydrolysis). Each circle represents a phosphotriesterase mutant, and the area of the circle is proportional to the specific activity of the design. The starting enzyme exhibits low specific activity (360 μM s⁻¹ mg⁻¹ protein). Each of the point mutants exhibits improved specific activity, but activity declines in the double mutants relative to the His257Trp single mutant. Last, the quadruple mutant (designed enzyme) substantially improves specific activity relative to all single or double mutants. Adapted from ref .
Figure 4. Design methods must navigate an astronomically large sequence space that is extremely sparse in functional proteins.
(I) The theoretical sequence space of an average-sized protein is huge: for a 350 amino acid protein, 20³⁵⁰ > 10⁴⁵⁵ sequences. By designing enzyme domains as in the CADENZ approach, the hypothetical space of possibilities approaches such numbers. Even when designing amino acids only in a functional site (II), as in the case of GFP design (htFuncLib), the hypothetical sequence space is orders of magnitude greater than the number of viral particles in the world, and no experimental screening method could search it adequately. In the case of GFP, phylogenetic data and physics-based calculations focus the search and reduce the sequence space by 18 orders of magnitude (III). This still leaves a sequence space that cannot be assayed even by the most high-throughput methods currently available, such as ribosome display. A machine-learning procedure can filter out mutations that do not combine with one another to form stable and foldable proteins, leading to the machine learning (ML) restricted space (IV). This restriction enables constructing an effective library that is enriched with stable and potentially functional designs. From this sequence space of ~10⁷ variants, 10⁴ functional fluorescent proteins were recovered (V) compared to fewer than 100 active-site variants recorded in the fluorescent proteins database.
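The size estimate in (I) follows from counting 20 possible amino acids at each of 350 positions. The short sketch below reproduces that arithmetic on a log10 scale; the function name `log10_sequence_space` is an illustrative choice, not something defined in the Review.

```python
import math


def log10_sequence_space(length, alphabet_size=20):
    """Return log10 of the number of sequences of a given length.

    Working in log space keeps the result readable; the count itself is
    alphabet_size ** length.
    """
    return length * math.log10(alphabet_size)


exponent = log10_sequence_space(350)
print(f"20^350 is about 10^{exponent:.1f}")

# Python integers are arbitrary precision, so the caption's inequality
# can also be checked exactly.
print("20^350 > 10^455:", 20 ** 350 > 10 ** 455)
```

The same helper can be applied to a smaller number of designed positions to see why even a restricted functional site yields a space far beyond what experimental screening can cover.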
Figure 5. Comparison of structural features in natural and de novo designed proteins.
A) De novo proteins are yet to reach the structural complexity of natural ones as measured by size and relative contact order. Structures of a de novo α-helix bundle (PDB: 7CBC) are highlighted versus two natural proteins (PDBs: 3NF4 and 3ZQJ). B) Secondary Structure Element (SSE) content in natural and de novo proteins. De novo proteins are biased towards high α-helix content whereas natural ones are structurally diverse. C) Relative contact order of designed protein structures found in the PDB plotted against their time of publication. Each design is also labeled according to the class of design generation method (PDBs: 1AL1, 1QYS, 3QA9, 3NF4). D) SSE content of natural and de novo binder interfaces (excluding antibodies). De novo binders are biased towards presenting helices at interaction sites compared to natural binders. Antibody binding surfaces, which are not represented in this figure, are dominated by loop regions and almost entirely devoid of helices. E) Number of interfacial residues in natural protein interactions compared to de novo protein binders. F) Hydrogen bonds in natural interfaces compared to de novo interfaces. G) Examples of de novo designed protein-protein interactions: botulinum neurotoxin binder, influenza binder, PD-L1 binder and SARS-CoV-2 receptor binding domain (RBD) binder. Targets are colored green and binders in red.
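Relative contact order, used in panels A and C, is commonly defined as the average sequence separation between contacting residues divided by the chain length. The sketch below assumes that standard definition and assumes the contacts are already available as residue index pairs; how contacts are detected (for example, by a distance cutoff) is not specified in the caption, and the example contact lists are hypothetical.

```python
def relative_contact_order(contacts, length):
    """Average sequence separation of contacts, normalized by chain length.

    `contacts` is an iterable of (i, j) residue index pairs that are in
    contact; `length` is the number of residues in the chain.
    """
    contacts = list(contacts)
    if not contacts or length <= 0:
        raise ValueError("need at least one contact and a positive length")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))


# Hypothetical 100-residue chains: helix-like local contacts (i, i+4)
# versus contacts between residues far apart in sequence.
local_contacts = [(i, i + 4) for i in range(0, 90, 5)]
distant_contacts = [(i, i + 60) for i in range(0, 40, 5)]

print(relative_contact_order(local_contacts, 100))
print(relative_contact_order(distant_contacts, 100))
```

Under this definition, a fold dominated by local helical contacts scores low, consistent with the caption's observation that helix-rich de novo designs lag behind natural proteins on this measure.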
Figure 6. Several practical applications of de novo protein design.
De novo designed proteins present a range of possibilities for the development of new molecules with biotechnological applications. Some popular areas of research are antivirals (PDB: 3R2X), cancer therapies (PDB: 7JH5), protein-based therapies (PDB: 2B5I), protein switches (PDB: 6IWB), drug delivery vehicles (PDB: 6VFI), vaccines (PDB: 3LHP) and biosensors (PDB: 7AYE). Natural proteins (targets) are colored green and the designed proteins are colored red.

References

    1. Arnold FH. Innovation by Evolution: Bringing New Chemistry to Life (Nobel Lecture). Angew Chem Int Ed Engl. 2019. doi: 10.1002/anie.201907729.
    2. Winter G. Harnessing Evolution to Make Medicines (Nobel Lecture). Angew Chem Int Ed Engl. 2019;58:14438–14445. doi: 10.1002/anie.201909343.
    3. Trudeau DL, Tawfik DS. Protein engineers turned evolutionists-the quest for the optimal starting point. Curr Opin Biotechnol. 2019;60:46–52.
    4. Packer MS, Liu DR. Methods for the directed evolution of proteins. Nat Rev Genet. 2015;16:379–394.
    5. Arnold FH. The nature of chemical innovation: new enzymes by evolution. Q Rev Biophys. 2015;48:404–410.
