PLoS Comput Biol. 2012 Jan;8(1):e1002335.
doi: 10.1371/journal.pcbi.1002335. Epub 2012 Jan 12.

Protein design using continuous rotamers

Pablo Gainza et al. PLoS Comput Biol. 2012 Jan.

Abstract

Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility during this optimization, or they may fail to find the lowest-energy sequence. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best-energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with nearly the same efficiency as DEE.

Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of isoleucine in χ-angle space.
Isoleucine has two flexible dihedral angles (the χ1 and χ2 angles), and the occurrence of isoleucine conformations across a wide set of high-quality structures is plotted here. Panel A shows the entire χ1 and χ2 angle space, while panels B, C, and D zoom in on the region specific to one rotamer. (A) The side chains of amino acids appear almost exclusively (blue dots in the plot) within specific regions of their flexible space. (B) In a rigid-rotamer model a single conformation (the red diamond) represents that entire region. (C) In a continuous-rotamer model, a voxel models the continuous region that represents the rotamer. (D) An expanded rotamer model samples additional rigid rotamers near rotamers from the rigid-rotamer model.
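The difference between panels B and C can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the energy surface `toy_energy` is a made-up quadratic rather than a force field, the rotamer center and the voxel half-width are arbitrary, and a fine grid search stands in for true continuous minimization.

```python
from itertools import product


def toy_energy(chi1, chi2):
    """Hypothetical energy surface (arbitrary units), lowest at (-60, 166)."""
    return 0.01 * ((chi1 + 60.0) ** 2 + (chi2 - 166.0) ** 2) - 3.0


def rigid_energy(center):
    """Rigid rotamer: the energy is evaluated only at the rotamer's center."""
    return toy_energy(*center)


def continuous_energy(center, half_width=9.0, step=0.5):
    """Continuous rotamer: lowest energy found anywhere inside the voxel.

    The voxel spans center +/- half_width in both chi angles. A grid search
    is used here as a simple stand-in for a continuous minimizer.
    """
    n = int(round(half_width / step))
    offsets = [k * step for k in range(-n, n + 1)]
    return min(
        toy_energy(center[0] + d1, center[1] + d2)
        for d1, d2 in product(offsets, offsets)
    )


center = (-65.0, 170.0)            # assumed rotamer center (chi1, chi2)
e_rigid = rigid_energy(center)     # energy at the single rigid conformation
e_cont = continuous_energy(center) # best energy within the voxel
print(f"rigid: {e_rigid:.2f}  continuous: {e_cont:.2f}")
```

Because the voxel contains the rigid conformation, the continuous-rotamer energy can never be higher than the rigid one in this sketch; it is lower whenever the energy minimum inside the voxel lies away from the center.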
Figure 2. Toy example on the impact of rotamer minimization in protein design and DEE pruning.
(A) Many protein design algorithms select a single, discrete conformation to represent each rotamer. The discrete conformation speeds up the computation, but it can result in steric clashes (shown in red). (B) Small changes in χ-angle space can have profound effects on the energies of interacting rotamers, particularly in the packed core of a protein. In this cartoon, the three hydrophobic residues can form a well-packed core through small changes in their χ angles. A pruning algorithm like rigid DEE would erroneously prune the clashing rotamers since it does not account for these small changes. (C) If one rotamer always results in conformations of higher energy than a competing rotamer, the first rotamer and all the conformations that contain it can be pruned. The rigid DEE algorithm prunes rotamers and amino acids that are provably not part of the rigid GMEC. (D) When rotamers can minimize within their specified voxel, rotamers and amino acids that seemed poor in a rigid model might minimize to lower energy conformations than the rigid GMEC. The lowest-energy conformation in this scenario is the minGMEC. The MinDEE algorithm and iMinDEE algorithm can provably prune rotamers in the presence of minimization.
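The pruning idea in panel C can be sketched with the Goldstein form of the rigid DEE criterion, a standard variant chosen here as an assumption (the caption does not say which variant is used). In the usual notation, rotamer r at position i is pruned if some competitor t satisfies E(i_r) - E(i_t) + Σ_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0. The energy tables below are made-up toy values, and the function names are hypothetical. MinDEE and iMinDEE, which prune with energy bounds over voxels, are not reproduced here.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """Look up a pairwise energy stored once per position pair (i < j)."""
    if i < j:
        return pair_e[(i, r, j, s)]
    return pair_e[(j, s, i, r)]


def dee_prune(self_e, pair_e):
    """Iteratively apply the Goldstein criterion; return surviving rotamers."""
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Smallest possible energy gap between r and t over all
                    # surviving rotamer choices at the other positions.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(
                                pair_energy(pair_e, i, r, j, s)
                                - pair_energy(pair_e, i, t, j, s)
                                for s in alive[j]
                            )
                    if gap > 0.0:  # r is always worse than t: prune r
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(conf, self_e, pair_e):
    """Energy of a full conformation (one rotamer index per position)."""
    e = sum(self_e[i][r] for i, r in enumerate(conf))
    for i in range(len(conf)):
        for j in range(i + 1, len(conf)):
            e += pair_e[(i, conf[i], j, conf[j])]
    return e


# Toy system: 2 positions, 2 rigid rotamers each (arbitrary energy units).
self_e = [[0.0, 1.0], [0.0, 0.5]]
pair_e = {(0, 0, 1, 0): -1.0, (0, 0, 1, 1): -0.5,
          (0, 1, 1, 0): 2.0, (0, 1, 1, 1): 0.5}

alive = dee_prune(self_e, pair_e)
gmec = min(product(range(2), range(2)),
           key=lambda c: total_energy(c, self_e, pair_e))
print("surviving rotamers:", alive, "rigid GMEC:", gmec)
```

In this toy system the clashing rotamer at position 0 is pruned first, which in turn lets the criterion prune the second rotamer at position 1. The brute-force check confirms that the rigid GMEC survives. It says nothing about whether a pruned rotamer could have relaxed to a lower energy, which is the failure mode described in panel B.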
Figure 3. Rigid GMEC vs. minGMEC.
(A) Fraction of the redesigned residues that had different amino acids (AA) between the rigid GMEC and the minGMEC. In 66 out of the 69 cases the minGMEC and the rigid GMEC have different sequences. The three systems where the minGMEC has the same sequence as the rigid GMEC are marked with a bold line at zero (2QSK, 1M1Q, and 1JNI). (B) Energy of the minGMEC vs. energy of the rigidMin (the post hoc minimization of the rigid GMEC), relative to the energy of the rigid GMEC, which is set to zero for each system. In 68 of 69 cases the energy of the minGMEC is lower than that of the rigidMin. For 2QSK the rotamers of the rigid GMEC are the same as the rotamers of the minGMEC, and, therefore, the energy of the rigidMin is the same as the energy of the minGMEC. The energy of the minGMEC is shown in yellow + blue bars, while the yellow color by itself shows the energy of the rigidMin. The results of this figure are identical for iMinDEE and MinDEE since both algorithms provably find the minGMEC.
Figure 4. The minGMEC vs. rigid DEE with an expanded rotamer library.
Two expanded rotamer libraries were used, RL1 and RL2, and they were compared against the standard rotamer library (RL0). (A) Redesigns that failed for rigid DEE using rotamer library RL2 because of the library's large size. AA: The number of mutable amino acids. (B) Fraction of the amino acids that are different between the minGMEC of MinDEE and, respectively: the rigid GMEC of RL0 (light grey), the rigid GMEC of RL1 (grey), and the rigid GMEC of RL2 (dark grey). Those designs where the sequence of the minGMEC and the sequence of the rigid GMEC are the same are marked with a bold line at zero. (C) Energy of the rigid GMEC of RL0 (light grey + grey + dark grey) vs. the rigid GMEC of RL1 (grey + dark grey) vs. the rigid GMEC of RL2 (dark grey), relative to the energy of the minGMEC, which is set to zero for each system.
Figure 5. The minGMEC vs. rigid DEE with an expanded rotamer library for the systems that failed with rigid DEE using rotamer library RL2.
These results compare the standard rotamer library (RL0) against an expanded rotamer library, RL1. (A) Fraction of the amino acids that are different between the minGMEC of MinDEE and, respectively: the rigid GMEC of RL0 (light grey), and the rigid GMEC of RL1 (grey). (B) Energy of the rigid GMEC of RL0 (light grey + grey) vs. the rigid GMEC of RL1 (grey), relative to the energy of the minGMEC, which is set to zero for each system.
Figure 6. iMinDEE algorithm illustration.
The A* branch-and-bound algorithm completely searches the conformation space and enumerates conformations in order of their low-energy bound. Because the search is complete, a large conformational search space can be computationally infeasible for A*. Therefore, a pre-A* pruning of the conformational search space with the MinDEE algorithm or iMinDEE algorithm can make the A* search feasible. (A) The entire MinDEE conformation space in the order that the A* algorithm would enumerate the conformations. A* enumerates conformations until it can prove the minGMEC (denoted as formula image) has been found, but unpruned high energy conformations slow down the search. The first conformation enumerated by A*, corresponding to the conformation with the lowest energy bound, is denoted formula image, and the lower bound on its energy is formula image. The minGMEC, formula image, is marked by a green dot and its energy is formula image. (B) Instead of MinDEE, we can use iMinDEE to prune conformations with energy bounds that are higher than the lowest energy bound by more than the initial formula image value. We then select the lowest minimized energy found so far (i.e. as opposed to lowest energy bound) and use that to compute the formula image value. The conformation with the lowest minimized energy is denoted formula image with a blue dot and its energy is formula image. (C) The iMinDEE search is repeated if formula image. Since formula image, formula image meets the condition of Eq. (6), and the search will not need to be repeated again. By setting formula image, we can use the iMinDEE criterion (Eq. (7)) to prune rotamers, and the iMinDEE algorithm will provably find the minGMEC.
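The stopping rule described above (enumerate conformations in order of their lower energy bound until the lowest minimized energy is provably found) can be sketched as follows. This is a simplified illustration under stated assumptions: a plain sort replaces the A* search, the minimized energies are precomputed toy numbers rather than the result of an actual minimization, and the function name `find_min_gmec` is hypothetical. It does not implement the iMinDEE pruning criterion itself.

```python
def find_min_gmec(conformations):
    """Return (label, energy, number evaluated) of the lowest minimized energy.

    conformations: iterable of (label, lower_bound, minimized_energy) tuples,
    where lower_bound <= minimized_energy is assumed for every entry.
    """
    best_label, best_energy = None, float("inf")
    evaluated = 0
    # Visit conformations in increasing order of their energy lower bound.
    for label, bound, energy in sorted(conformations, key=lambda c: c[1]):
        if bound > best_energy:
            # Every remaining conformation has a bound, and therefore a
            # minimized energy, above the best one found: the search is done.
            break
        evaluated += 1
        if energy < best_energy:
            best_label, best_energy = label, energy
    return best_label, best_energy, evaluated


# Toy data (arbitrary units): label, lower bound, minimized energy.
toy = [
    ("c1", -10.0, -7.0),
    ("c2", -9.5, -9.0),
    ("c3", -8.0, -7.5),
    ("c4", -6.0, -5.0),
]
label, energy, evaluated = find_min_gmec(toy)
print(label, energy, evaluated)
```

Here the conformation with the lowest bound (c1) is not the one with the lowest minimized energy (c2), which is why enumeration has to continue past the first conformation. The loop stops at c3 because its bound already exceeds the best minimized energy found.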
Figure 7. Comparison of rotamer pruning with rigid DEE, MinDEE and iMinDEE.
For each tested protein, this chart shows what percentage of rotamers were pruned by each criterion. In all cases pruning with rigid DEE pruned at least as much as iMinDEE, and pruning with iMinDEE was significantly better than MinDEE.
Figure 8. Pruning vs value.
Most systems have small formula image values. Some outliers have larger formula image values, and in consequence, iMinDEE loses pruning efficiency in these systems.
Figure 9. iMinDEE predicts residues Trp3 (rotamer 3), Tyr11 (rotamer 1), Met40 (rotamer 8), and Arg44 (rotamer 15) in the structure of the PhtA histidine triad domain (PDB ID: 2CS7) to achieve a low-energy conformation.
iMinDEE precomputes low-energy bounds between all pairs of possible rotamers in structure 2CS7. This figure illustrates the lower bound between the pairs (A) Met40 and Arg44, (B) Trp3 and Arg44, and (C) Tyr11 and Arg44. Favorable vdW contacts are shown in green and blue dots, and a small steric overlap is shown in red in panel (C). All of these pairs have favorable, low energies, and iMinDEE predicts all conformations containing the four rotamers shown in this figure to be among the lowest-energy structures. (D) When all four are placed in the same conformation, however, the result is a biophysically impossible steric clash, shown by red and purple dots.
Figure 10. Summary of native sequence recovery results.
The recovery of the native amino acid sequence by rigid DEE (the rigid GMEC) and by iMinDEE (the minGMEC) is shown. (A) Summary of amino acid side chains that contain more than one flexible dihedral angle (Asp, Lys, Ile, Trp, Phe, Gln, Asn, Leu, Tyr, Glu, Arg, Met, and His) that were not recovered by the rigid GMEC (pie chart above) and the minGMEC (pie chart below). For comparison, the recovered amino acids with more than one flexible dihedral angle are shown in grey. Residues that were not recovered are colored by their amino acid type. (B) Percentage of residues not recovered by the rigid GMEC (yellow) and the minGMEC (orange), categorized by amino acid mass. The first group (All AA) shows the total percentage of non-recovered residue positions of all amino acid types. The second group (100–130 Da) shows the percentages of non-recovered residue positions of amino acid types with a mass between 100 Da and 130 Da, and the third group shows the percentages of non-recovered residue positions of amino acid types with a mass over 130 Da.
