Isr J Chem. 2014 Aug;54(8-9):1311-1337.
doi: 10.1002/ijch.201300145.

Learning To Fold Proteins Using Energy Landscape Theory

N P Schafer et al. Isr J Chem. 2014 Aug.

Abstract

This review is a tutorial for scientists interested in the problem of protein structure prediction, particularly those interested in using coarse-grained molecular dynamics models that are optimized using lessons learned from the energy landscape theory of protein folding. We also present a review of the results of the AMH/AMC/AMW/AWSEM family of coarse-grained molecular dynamics protein folding models to illustrate the points covered in the first part of the article. Accurate coarse-grained structure prediction models can be used to investigate a wide range of conceptual and mechanistic issues outside of protein structure prediction; specifically, the paper concludes by reviewing how AWSEM has in recent years been able to elucidate questions related to the unusual kinetic behavior of artificially designed proteins, multidomain protein misfolding, and the initial stages of protein aggregation.

Figures

Figure 1
The theory behind this figure is described in Section 3. A logarithmic plot of the number of structures with a given energy E. The expected ground state of a set of compact decoy structures corresponding to the molten globule can be inferred from landscape theory and is indicated by the intersection of a parabola with the abscissa. When there are more possible decoys, the trap states become more competitive and are easier to confuse with the native state at E_F. The gap is also reflected in the characteristic temperatures T_F and T_G, whose inverses are indicated as slopes on this diagram. A large T_F/T_G corresponds to a large gap and easy recognition. Recognition becomes progressively more difficult as one moves from threading-based decoys (“Thr”) to fragment assembly (“FA”) and finally to fully flexible backbone molecular dynamics (“MD”).
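The parabola in this caption can be illustrated with a minimal sketch, assuming a Gaussian (random energy model) density of decoy states. The function names and the parameters `s0` (log of the total number of decoys) and `delta_e` (energy spread of the decoys) are hypothetical and are not taken from the paper; energies are measured from the mean decoy energy and temperatures are in units where k_B = 1.

```python
import math

def log_density_of_states(energy, s0, delta_e):
    """Log number of decoys at `energy`, assuming a Gaussian spread of decoy energies."""
    return s0 - energy**2 / (2.0 * delta_e**2)

def expected_ground_state(s0, delta_e):
    """Low-energy point where the parabola crosses zero (about one decoy left)."""
    return -delta_e * math.sqrt(2.0 * s0)

def glass_temperature(s0, delta_e):
    """Inverse of the parabola's slope at the expected ground state (k_B = 1)."""
    return delta_e / math.sqrt(2.0 * s0)

# Hypothetical decoy-set sizes: a larger search space gives a larger s0.
for label, s0 in [("Thr", 10.0), ("FA", 40.0), ("MD", 160.0)]:
    print(label, expected_ground_state(s0, delta_e=1.0))
```

Under these assumptions, a larger decoy set pushes the expected ground state of the decoys to lower energy, so a native state at fixed energy is separated from the best decoys by a smaller gap.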
Figure 2
Successful and efficient search is possible if the energy landscape is funneled. In this funnel diagram, the depth represents the solvent-averaged free energy of specific structures, while the width represents the entropy of possible states. This figure is essentially Figure 1 turned on its side. Random coil states at the top of the funnel are like the gaseous state. Compact candidate structures are the so-called molten globule or liquid phase, and the worst traps or decoys are deep local minima that impede search and might be confused with the native state.
Figure 3
The water mediated interaction switches smoothly between two interaction weights depending on the degree of burial of the interacting residues. This switching function is shown as a function of the degree of burial of the two residues participating in the interaction. This figure was adapted from [20].
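A smooth switch of this kind can be sketched as follows. The tanh form, the midpoint `rho_0`, the steepness `eta`, and both function names are assumptions made for illustration; the caption itself only states that the weight switches smoothly with the burial of the two residues.

```python
import math

def burial_switch(rho_i, rho_j, rho_0=2.6, eta=7.0):
    """Weight near 1 when both residues are exposed, near 0 when either is buried.

    rho_i and rho_j are local densities (burial) of the two residues.
    rho_0 and eta are assumed values, not read from the figure.
    """
    exposed_i = 1.0 - math.tanh(eta * (rho_i - rho_0))
    exposed_j = 1.0 - math.tanh(eta * (rho_j - rho_0))
    return 0.25 * exposed_i * exposed_j

def mediated_weight(rho_i, rho_j, gamma_water, gamma_protein):
    """Blend the two interaction weights according to the burial switch."""
    s = burial_switch(rho_i, rho_j)
    return s * gamma_water + (1.0 - s) * gamma_protein

# Two exposed residues use the water mediated weight, two buried ones the other.
print(mediated_weight(0.0, 0.0, gamma_water=-1.0, gamma_protein=1.0))
print(mediated_weight(6.0, 6.0, gamma_water=-1.0, gamma_protein=1.0))
```

Because the switch is a product of two single-residue terms, burying either residue is enough to turn off the water mediated weight in this sketch.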
Figure 4
Two plots showing the correlation between optimized parameters in a coarse-grained Hamiltonian and experimental quantities. The burial energy is correlated with a hydrophobicity scale and the secondary structure energies are correlated with experimentally determined secondary structure propensities. This figure was adapted from [39].
Figure 5
The three optimized interaction matrices used by the AMW/AWSEM models. The interaction weight for each pair of residue types (shown in one-letter codes) is represented in color such that red interactions are favorable and blue interactions are unfavorable. Two residues interact using the interaction weights specified in the “direct” matrix when their Cβ atoms are between 4.5 and 6.5 Å apart. When the Cβ atoms of two residues are between 6.5 and 9.5 Å apart, the interaction can be either “water mediated” (Low-Density) or “protein mediated” (High-Density), depending on the degree of burial of the two residues.
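The distance ranges quoted in this caption can be turned into a small lookup, shown here as a simplified sketch. The hard cutoffs, the boolean `buried` flag, the assignment of exactly 6.5 Å to the mediated range, and the example weights are all assumptions; the models themselves use smooth functions of distance and burial.

```python
def contact_kind(r_cb):
    """Classify a Cβ-Cβ distance in Å using the ranges quoted in the caption."""
    if 4.5 <= r_cb < 6.5:
        return "direct"
    if 6.5 <= r_cb <= 9.5:
        return "mediated"
    return None

def pair_weight(r_cb, buried, gamma):
    """Pick the weight for one residue pair from three per-pair weights.

    gamma maps "direct", "water" and "protein" to the weights of this
    residue-type pair in the corresponding matrices (hypothetical values here).
    """
    kind = contact_kind(r_cb)
    if kind is None:
        return 0.0
    if kind == "direct":
        return gamma["direct"]
    # High density (buried) selects protein mediated, low density water mediated.
    return gamma["protein"] if buried else gamma["water"]

example_gamma = {"direct": 0.3, "water": -0.2, "protein": 0.5}
print(pair_weight(5.0, buried=False, gamma=example_gamma))
print(pair_weight(8.0, buried=True, gamma=example_gamma))
```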
Figure 6
The fraction of the trace obtained by summing the N largest eigenvalues is plotted as a function of N for four different interaction matrices including Miyazawa-Jernigan (red), and the three interaction matrices shown in Figure 5 (orange, green and blue for direct, water mediated and protein mediated, respectively). The curve for the Miyazawa-Jernigan interaction matrix saturates much more quickly than any of the curves corresponding to the AMW/AWSEM matrices, indicating an overall lower information content.
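A curve of this type can be computed with a few lines of NumPy, shown here as a sketch rather than the authors' procedure. Using eigenvalue magnitudes is an assumption made so that the cumulative sum is monotonic when eigenvalues of both signs are present, and the function name and test matrices are hypothetical.

```python
import numpy as np

def cumulative_spectrum_fraction(matrix):
    """Fraction of the summed eigenvalue magnitudes captured by the N largest."""
    sym = 0.5 * (matrix + matrix.T)            # enforce symmetry
    eigvals = np.linalg.eigvalsh(sym)          # real eigenvalues of a symmetric matrix
    magnitudes = np.sort(np.abs(eigvals))[::-1]
    return np.cumsum(magnitudes) / magnitudes.sum()

rng = np.random.default_rng(0)

# A rank-one 20 x 20 matrix: a single eigenvector carries all the information.
v = rng.normal(size=20)
low_information = np.outer(v, v)

# A generic symmetric 20 x 20 matrix: many eigenvalues contribute.
a = rng.normal(size=(20, 20))
high_information = 0.5 * (a + a.T)

print(cumulative_spectrum_fraction(low_information)[:3])
print(cumulative_spectrum_fraction(high_information)[:3])
```

In this sketch the rank-one matrix saturates at N = 1, whereas the generic matrix approaches 1 only gradually, which is the qualitative contrast the caption describes.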
Figure 7
This figure is an example of how energy landscape analysis can be used to help understand structure prediction results. In (A), the results of the predictions of two proteins are shown by plotting the final Q values obtained in 20 independent simulated annealing runs in order of decreasing Q. In parts (B), (C) and (D), the expectation values of the total energy, secondary structure energy, and tertiary structure energy are plotted as a function of Q, respectively. In this case, the designed protein Top7 shows a greater degree of funneling to the native state than S6, and this is reflected in better structure prediction results. This figure was adapted from [112].
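The caption does not define Q. The sketch below assumes one commonly used form, a normalized sum of Gaussians in the difference between pairwise distances in a given structure and in the native structure, with a sequence-separation-dependent width. The width formula, the minimum separation of three residues, and the function name are assumptions, and the coordinates are synthetic.

```python
import math

def q_value(coords, native_coords, min_separation=3):
    """Similarity to the native structure: 1 for identical pair distances, near 0 for unrelated ones."""
    n = len(native_coords)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + min_separation, n):
            r = math.dist(coords[i], coords[j])
            r_native = math.dist(native_coords[i], native_coords[j])
            sigma = (1 + (j - i)) ** 0.15      # assumed width, grows with separation
            total += math.exp(-((r - r_native) ** 2) / (2.0 * sigma**2))
            pairs += 1
    if pairs == 0:
        raise ValueError("need more residues than min_separation")
    return total / pairs

# Synthetic example: a short curved chain and a stretched copy of it.
native = [(float(i), math.sin(i), math.cos(i)) for i in range(10)]
stretched = [(3.0 * x, y, z) for x, y, z in native]

# Final Q values from several runs, listed in order of decreasing Q as in panel (A).
final_q = sorted([q_value(native, native), q_value(stretched, native)], reverse=True)
print(final_q)
```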
Figure 8
A structure prediction result from 1989. This figure shows the predicted structure of Desulfovibrio vulgaris rubredoxin obtained using an associative memory Hamiltonian containing 80 possible structures. Only one of these was the homolog from Clostridium pasteurianum, which differs from the vulgaris sequence in 50% of its positions, demonstrating that the model can generalize at least to the extent of local mutational substitution. The search algorithm employed a Monte Carlo assignment of local dipeptides from a database of known structures [37].
Figure 9
A structure prediction result for cytochrome C from 1991. The unoptimized associative memory Hamiltonian used a set of memories that included other cytochromes but these had significant insertions and deletions in their sequence. A molecular dynamics based annealing method was used [36].
Figure 10
A structure prediction result from 1998 for a calcium binding protein using an optimized associative memory Hamiltonian. While the memory set contained homologs, they were distant in sequence identity, and the final structure showed the algorithm to be “creative” in that it was closer to the native structure than any input homolog. The correct structure is shown in green, the prediction in red, and the best input homolog in blue [58].
Figure 11
A structure prediction result from 2000. A comparison of predicted and actual structures of 434 repressor. This result was obtained with an associative memory Hamiltonian having an optimized contact Hamiltonian. No homologs were used in the memory set [44].
Figure 12
A prediction of the CASP3 target HdeA. This prediction employed the optimized water mediated interactions [81].
Figure 13
This was the result of a blind prediction made by the Wolynes group of CASP5 target T0170 with current PDB code 1UZC. This was one of the best models for this fold submitted in that round of CASP. The methodology was based on the AMC code [89].
Figure 14
This figure shows the quality of prediction for 1r69 that can be obtained from an optimized potential containing water mediated interactions and optimized hydrogen bonding energies alone. No local bioinformatic input was used. The left-hand prediction used a generically predicted secondary structure based on propensities as input; the right-hand prediction used no secondary structure bias at all. Predicted structure traces are in blue, the crystal structure in red. Clearly local signals are helpful [78].
Figure 15
The maximum Q sampled during annealing for each target is plotted against sequence length for three different choices of allowed degree of homology between the target and associative memory database structures. “Homologs excluded” is shown in light blue squares, “homologs allowed” is shown in dark blue triangles and “homologs only” is shown in red triangles. For comparison, a previous set of results from [47] is shown in green diamonds. This figure was adapted from [20].
Figure 16
The highest sampled Q structures for 1r69 (left) and 2fha (right) using the AWSEM model with a homologs excluded associative memory database are shown aligned to their native structures. For the smaller of the two proteins, the predicted and native structures have nearly identical backbone structures. The larger protein has significant similarity to the native structure but shows defects in some of the local structure as well as the packing of helices. This figure was adapted from [20].
Figure 17
Several energetically competitive non-native structures were found via simulated annealing of the AWSEM “homologs excluded” model for Top7. These misfolded states correspond to β-strand mispairings and are consistent with the notion that the kinetics may be complicated by transitions between compact states. This figure was adapted from [112].
Figure 18
This figure shows energy landscape analysis on three versions of the TakadaN protein: the full twenty amino acid type sequence (red), the MJ5 five-letter reduced sequence (green), and the BL2 two-letter reduced sequence (blue). Both the total energy (left) and the secondary structure energy (right) in AWSEM are funneled to high Q for the full sequence. The five-letter sequence has an energetic trap at low Q, which was found to come from a competing secondary structure. The two-letter sequence has a very rugged and flat landscape, and is therefore practically unfoldable. This figure was adapted from [112].
Figure 19
Snapshots of the best predicted structures (yellow) using AWSEM, compared with the PDB structures (blue). The name of each protein, the PDB ID, the number of residues, and the RMSD for the Cα atoms of the predicted complex compared with the PDB structure are shown. The first 8 dimers are homodimers; the last 4 are heterodimers. This figure was adapted from [127].
Figure 20
For Arc repressor (top) and Lambda repressor (bottom), free energy surfaces at the folding temperature are plotted as a function of the number of non-native intermonomeric contacts N_non-native, Q_A and Q of the complex. I, U and N stand for intermediate, unbound and native bound states, respectively. This figure was adapted from [127].
Figure 21
Energy and free energy surfaces for I27–I27 at its folding temperature. N_self and N_swap are the number of self-recognition contacts and the number of domain-swapped contacts, respectively. The trapped states I have higher energies than the native states N, as shown on the z-axis, but have free energies similar to those of the native states, as shown by the color coding of the free energy, with the scale indicated in the side bar. The ensemble of I states is therefore entropically favored. As temperature increases, the intermediate ensemble will become more stable than the native ensemble. This figure was adapted from [129].
Figure 22
Two types of misfolded structures of the wild-type tetramer are shown in three-dimensional (top) and simplified two-dimensional (bottom) representations. In the 2-D model, bold colors indicate the actual structures found in the AWSEM molecular dynamics simulations and the light colors are examples of how these structures might further develop in the presence of more protein copies. In each protein, there are two sticky segments, shown in orange and blue. A solid line represents the rest of each protein. Dashed lines represent stabilizing interactions formed between two sticky segments from different proteins. A fibrillar structure is shown on the left and a branching structure is shown on the right. The presence of two or more sticky segments in one protein allows for a greater diversity of possible misfolded structures. This figure was adapted from [128].

References

    1. Anfinsen C. Studies of the principles that govern the folding of protein chains (Nobel lecture). Norstedt & Sons; Stockholm: 1972.
    2. Baker D, Agard DA. Kinetics versus thermodynamics in protein folding. Biochemistry. 1994;33(24):7505–7509.
    3. Benilova I, Karran E, De Strooper B. The toxic A-beta oligomer and Alzheimer’s disease: an emperor in need of clothes. Nature Neuroscience. 2012;15(3):349–357.
    4. Bennett MJ, Sawaya MR, Eisenberg D. Deposition diseases and 3D domain swapping. Structure. 2006;14(5):811–824.
    5. Berthelot K, Cullin C, Lecomte S. What does make an amyloid toxic: Morphology, structure or interaction with membrane? Biochimie. 2012:12–19.