Review. Nat Rev Mol Cell Biol. 2019 Nov;20(11):681-697.
doi: 10.1038/s41580-019-0163-x. Epub 2019 Aug 15.

Advances in protein structure prediction and design

Brian Kuhlman et al. Nat Rev Mol Cell Biol. 2019 Nov.

Abstract

The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem - designing an amino acid sequence that will fold into a specified three-dimensional structure - has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein-protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.

Figures

Figure 1. Protein folding landscapes and energies.
(a) Simplified, two-dimensional representations of ‘golf course’ and ‘funnel’-shaped energy landscapes. Identifying the native energy minimum (‘N’) in the landscape on the left requires exhaustive exploration, whereas a simple downhill search from most starting points will locate the native state in the landscape on the right. (b) Energetic features that distinguish the protein native state include: hydrophobic patterning (shown here in a cut-away view of the small protein ubiquitin) with burial of non-polar side chains in the protein core; backbone and side chain hydrogen bonding (hydrogen bonds shown as dotted green lines); tight side chain packing (visible in a slice through a protein core); and restricted backbone and side chain torsion angle distributions (evident in the highly focused two-dimensional probability distributions of backbone torsion angles, Phi versus Psi, and side chain torsion angles, Chi1 versus Chi2, for the amino acid isoleucine). (c) Computational models of protein energetics offer a trade-off between speed and accuracy. Coarse-grained models are computationally efficient and effectively smooth the energy landscape, permitting large-scale sampling; however, they also introduce inaccuracies such as false minima (for example, the blue basin to the left of the native minimum, highlighted with an arrow). High-resolution, atomically detailed energy functions are more accurate but also slower to evaluate and sensitive to structural detail, which introduces bumpiness (many local minima) into the landscape and makes it harder to navigate efficiently.
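
The contrast drawn in panel (a) can be illustrated with a minimal sketch. The two one-dimensional energy functions below (`funnel` and `golf_course`), the step size, and the starting points are all hypothetical choices made for illustration and are not taken from the Review; real folding landscapes are vastly higher-dimensional.

```python
def funnel(x):
    # Hypothetical smooth bowl whose single minimum sits at the "native" position x = 0.
    return x * x

def golf_course(x):
    # Hypothetical flat landscape with one narrow hole around x = 0.
    return -1.0 if abs(x) < 0.05 else 0.0

def downhill(energy, x, step=0.01, max_steps=10_000):
    """Greedy descent: move to a neighbouring point only if it lowers the energy."""
    for _ in range(max_steps):
        best = min((x - step, x + step), key=energy)
        if energy(best) >= energy(x):
            break  # no neighbour is lower, so the search is stuck here
        x = best
    return x

starts = [-3.0, -1.5, 0.7, 2.2]
# Count how many starting points end up inside the native basin.
funnel_hits = sum(abs(downhill(funnel, s)) < 0.05 for s in starts)
golf_hits = sum(abs(downhill(golf_course, s)) < 0.05 for s in starts)
print(funnel_hits, golf_hits)
```

On the funnel every start slides to the minimum, whereas on the flat landscape the search has no gradient to follow and never moves, which is why only exhaustive exploration would find the hole.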
Figure 2. Key steps in template-free structure prediction.
An accurate multiple sequence alignment between the target protein and its sequence homologs contains valuable information on the amino acid variation between the homologous sequences, including correlated patterns of sequence changes occurring at different positions (the green and orange stars highlight pairs of alignment columns displaying amino acid charge and size swapping, respectively) (step 1). The target sequence and the multiple sequence alignment form the basis for predictions of local backbone structure including torsion angles (Phi and Psi predictions shown with red error bars indicating uncertainties) and secondary structure (step 2; PSIPRED predictions are shown). Libraries of backbone fragments taken from proteins predicted to have similar local structure can also be assembled for use in model building. The multiple sequence alignment can be used to predict residue pairs likely to be in spatial contact based on the observation of correlated mutations in pairs of alignment columns (step 3). These predictions of local structure and residue contacts guide 3D model building with techniques such as gradient-based optimization, distance geometry, or fragment assembly (step 4; snapshots from a Rosetta fragment assembly trajectory are shown). Initial 3D models are typically built with a reduced representation and coarse-grained energy function; to better determine near-native predictions, these models are refined with an all-atom energy function and compared with one another to identify clusters of similar low-energy conformations from which representative models are chosen as final predictions (step 5; a 2D principal components projection of the space of refined models is shown in which each dot represents a single model).
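
The idea behind step 3, that correlated changes in two alignment columns hint at spatial contact, can be sketched with a simple covariation score. Mutual information is used here only as an easily understood stand-in; the contact predictors used in practice are considerably more sophisticated. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )

def rank_column_pairs(alignment):
    """Return column index pairs sorted from most to least covarying."""
    columns = list(zip(*alignment))
    scores = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical alignment: columns 0 and 2 swap charge together (K/E with E/K),
# column 1 varies independently, column 3 is fully conserved.
toy_alignment = ["KAEL", "KGEL", "EAKL", "EGKL", "KAEL", "EGKL"]
top_pair = rank_column_pairs(toy_alignment)[0]
print(top_pair)
```

The charge-swapping pair of columns receives the highest score, while a fully conserved column carries no covariation signal at all, so it scores zero against every other column.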
Figure 3. Overview of the protein design process.
Projects in computational protein design can be distilled down to two key steps. First, a model of the desired structure and/or complex needs to be created. For de novo design this can be accomplished by piecing together fragments of naturally occurring proteins (left column). Designing complexes requires moving the proteins (middle column) and/or ligands (right column) relative to each other (frequently referred to as docking) so that their surfaces are adjacent. After a model of the protein fold or complex is created, sequence optimization simulations are used to find sequences that stabilize the desired fold or complex (bottom row).
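
The sequence optimization step can be sketched as a stochastic search over amino acid identities on a fixed backbone. The caption does not specify a search method; Metropolis Monte Carlo is assumed here as one common choice. The scoring function `toy_energy`, the residue groupings, and the burial pattern are hypothetical and only reward hydrophobic patterning, whereas real design energy functions also include pairwise terms such as packing and hydrogen bonding.

```python
import math
import random

NONPOLAR = set("AVILMFW")
POLAR = set("STNQDEKR")
ALPHABET = sorted(NONPOLAR | POLAR)

def toy_energy(sequence, buried):
    """Hypothetical score: -1 for each residue whose polarity matches its burial."""
    return -sum(
        (aa in NONPOLAR) == is_buried for aa, is_buried in zip(sequence, buried)
    )

def design_sequence(buried, steps=2000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over single-residue substitutions (assumed method)."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in buried]
    start = "".join(current)
    best = list(current)
    for _ in range(steps):
        trial = list(current)
        trial[rng.randrange(len(trial))] = rng.choice(ALPHABET)
        delta = toy_energy(trial, buried) - toy_energy(current, buried)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            if toy_energy(current, buried) < toy_energy(best, buried):
                best = list(current)
    return start, "".join(best)

# Hypothetical burial pattern for an eight-residue backbone (True = core position).
buried_pattern = [True, False, True, True, False, False, True, False]
start_seq, designed_seq = design_sequence(buried_pattern)
print(start_seq, toy_energy(start_seq, buried_pattern))
print(designed_seq, toy_energy(designed_seq, buried_pattern))
```

The search keeps the lowest-scoring sequence it has visited, so the returned design never scores worse than the random starting sequence.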
Figure 4. Using computational design to create proteins that have valuable applications in research and medicine.
(a) Increasing protein stability. Energy calculations from protein design simulations were combined with sequence conservation information to identify 18 mutations that raise the thermostability of a malaria invasion protein by more than 15°C, thereby improving its recombinant production for use as a vaccine immunogen. (b) Manipulating binding specificity. Re-design of interactions between antibody chains allowed the self-assembly of two unique light chains and two unique heavy chains into bispecific antibodies that can simultaneously bind two different antigens. These antibodies can be used for a variety of applications, including the recruitment of T-cells to cancer cells as a form of immunotherapy. (c) Design of interaction interfaces. Design of an interaction between two homo-oligomers (orange trimer and blue pentamer; step 1) induced self-assembly of a large protein cage (step 2) and allowed for multi-valent display of an antigen from respiratory syncytial virus (RSV), thereby establishing a nanoparticle vaccine candidate (step 3). This nanoparticle with the viral antigen on the surface induced neutralizing antibody responses that were ~10-fold higher than when the antigen was provided alone. (d) De novo design of an interleukin mimic that binds to a subset of interleukin receptors, allowing it to maintain anti-cancer activity while reducing toxicity (blue colour in the protein structures indicates native binding surfaces). The designed protein maintains selected helices (shown in blue) in their naturally occurring orientations while embedding them in a new protein scaffold. (e) De novo design of a protein scaffold that presents a conformational epitope from RSV. In vivo, this epitope-focused immunogen elicited antibodies that neutralize the virus and is being tested as a vaccine. (f) De novo design strategies can also be used to design proteins optimized to bind certain ligands. For example, two custom-built backbones with different protein folds, a β-barrel and a helical bundle, were generated to bind and activate a fluorescent ligand and to bind porphyrin, respectively.
