BMC Bioinformatics. 2010 Jun 5;11:306.
doi: 10.1186/1471-2105-11-306.

Beyond rotamers: a generative, probabilistic model of side chains in proteins

Tim Harder et al. BMC Bioinformatics. 2010.

Abstract

Background: Accurately covering the conformational space of amino acid side chains is essential for important applications such as protein design, docking and high-resolution structure prediction. Today, the most common way to capture this conformational space is through rotamer libraries: discrete collections of side chain conformations derived from experimentally determined protein structures. The discretization can be exploited to search the conformational space efficiently. However, discretizing this naturally continuous space comes at the cost of losing detailed information that is crucial for certain applications. For example, rigorously combining rotamers with physical force fields raises numerous problems.

Results: In this work we present BASILISK: a generative, probabilistic model of the conformational space of side chains that makes it possible to sample in continuous space. In addition, sampling can be conditioned on the protein's detailed backbone conformation, again in continuous space and without involving discretization.

Conclusions: A careful analysis of the model and a comparison with various rotamer libraries indicate that BASILISK is an excellent, fully continuous model of side chain conformational space. We also illustrate how the model can be used for rigorous, unbiased sampling with a physical force field, and how it improves side chain prediction when used as a pseudo-energy term. In conclusion, BASILISK is an important step toward a rigorous probabilistic description of protein structure in continuous space and in atomic detail.
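The conclusions mention rigorous, unbiased sampling with a physical force field. A standard way to use a generative model as a proposal without biasing the result is a Metropolis-Hastings step that divides out the proposal density, so the chain targets the force field's Boltzmann distribution rather than the proposal. The sketch below illustrates that general technique and is not the paper's implementation: it assumes a single χ angle, uses one von Mises distribution as a stand-in for BASILISK, and `toy_energy` is a hypothetical placeholder for a force field.

```python
import math
import random

def von_mises_logpdf(x, mu, kappa):
    """Log-density of a von Mises distribution; I0(kappa) via its power series."""
    i0, term, k = 1.0, 1.0, 0
    while term > 1e-16 * i0:
        k += 1
        term *= (kappa / 2.0) ** 2 / (k * k)
        i0 += term
    return kappa * math.cos(x - mu) - math.log(2.0 * math.pi * i0)

def accept_probability(e_old, e_new, logq_old, logq_new, beta=1.0):
    """Metropolis-Hastings acceptance for an independence proposal q.

    The (logq_old - logq_new) term divides out the proposal density, so the
    chain samples exp(-beta * E) instead of being biased toward q.
    """
    log_ratio = -beta * (e_new - e_old) + (logq_old - logq_new)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def toy_energy(chi):
    # Hypothetical stand-in for a physical force field: a three-fold torsion term.
    return 1.5 * (1.0 + math.cos(3.0 * chi))

def sample_chain(n_steps, mu=math.pi, kappa=2.0, beta=1.0, seed=0):
    """Run an independence sampler; the von Mises proposal stands in for BASILISK."""
    rng = random.Random(seed)
    chi = rng.vonmisesvariate(mu, kappa)
    samples = []
    for _ in range(n_steps):
        proposal = rng.vonmisesvariate(mu, kappa)  # drawn independently of chi
        p = accept_probability(
            toy_energy(chi), toy_energy(proposal),
            von_mises_logpdf(chi, mu, kappa), von_mises_logpdf(proposal, mu, kappa),
            beta,
        )
        if rng.random() < p:
            chi = proposal
        samples.append(chi)
    return samples
```

Omitting the `logq_old - logq_new` term would make the chain sample a distribution proportional to the product of the Boltzmann factor and the proposal density, which is exactly the bias this correction removes.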

Figures

Figure 1
Dihedral angles in glutamate: Dihedral angles are the main degrees of freedom for the backbone (ϕ and ψ angles) and the side chain (χ angles) of an amino acid. The number of χ angles varies between zero and four for the 20 standard amino acids. The figure shows a ball-and-stick representation of glutamate, which has three χ angles. The fading conformations in the background illustrate a rotation around χ1. The figure was made using PyMOL (http://www.pymol.org).
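The χ angles in Figure 1 are ordinary dihedral angles, each defined by four consecutive atoms (for χ1 of glutamate: N, Cα, Cβ and Cγ). A minimal sketch of the standard computation from Cartesian coordinates, using the IUPAC sign convention; the helper names (`dihedral`, `_sub`, `_dot`, `_cross`) are our own and not part of BASILISK.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in [-180, 180], defined by four 3D points.

    Sign follows the IUPAC convention: looking along p1 -> p2, a clockwise
    rotation of the p0 bond onto the p3 bond is positive.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

The same function gives ϕ from the backbone atoms C(i−1), N, Cα, C and ψ from N, Cα, C, N(i+1).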
Figure 2
The BASILISK dynamic Bayesian network: The network shown represents an amino acid with two χ angles, such as histidine. In this case, the DBN consists of four slices: two slices for the ϕ and ψ angles, followed by two slices for the χ angles. A set of χ angles is sampled as follows. First, the input nodes (top row) are filled with bookkeeping indices that determine both the amino acid type (for example histidine) and the labels of the angles (for histidine, ϕ and ψ followed by χ1 and χ2). Next, the hidden node values (middle row, discrete nodes) are sampled conditioned on the observed nodes. These observed nodes always include the index nodes (top row, discrete nodes), and optionally also the ϕ and ψ nodes (first two nodes in the bottom row) if the sampling is conditioned on the backbone. Finally, a set of χ angles is drawn from the von Mises nodes (bottom row), whose parameters are specified by the sampled values of the hidden nodes.
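The sampling procedure described for Figure 2 can be sketched for a single amino acid. This is a minimal sketch under stated assumptions, not BASILISK itself: it assumes the hidden nodes form a first-order Markov chain across slices, `TOY_MODEL` holds arbitrary, hypothetical parameters with only two hidden values (the selected model uses 30), and the effect of each slice's input index node is represented by giving that slice its own transition matrix. When the backbone is given, the hidden value at the ψ slice is drawn from its filtered distribution given ϕ and ψ; because later slices depend on earlier ones only through that hidden value, the χ slices can then be sampled forward.

```python
import math
import random

# Hypothetical toy parameters for an amino acid with two chi angles.
# Slices: phi, psi, chi1, chi2. Two hidden values only; angles in radians.
TOY_MODEL = {
    "prior": [0.5, 0.5],  # P(h_0)
    # trans[t][i][j] = P(h_t = j | h_{t-1} = i); one matrix per slice stands in
    # for the dependence on that slice's input (index) node. trans[0] is unused.
    "trans": [None,
              [[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]],
              [[0.7, 0.3], [0.3, 0.7]]],
    "mu": [[-1.1, -2.1], [-0.7, 2.3], [-1.0, 3.1], [0.0, 1.5]],
    "kappa": [[8.0, 8.0], [8.0, 8.0], [10.0, 10.0], [4.0, 4.0]],
}

def vm_logpdf(x, mu, kappa):
    """Von Mises log-density; I0(kappa) via its power series."""
    i0, term, k = 1.0, 1.0, 0
    while term > 1e-16 * i0:
        k += 1
        term *= (kappa / 2.0) ** 2 / (k * k)
        i0 += term
    return kappa * math.cos(x - mu) - math.log(2.0 * math.pi * i0)

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def filter_backbone(model, phi, psi):
    """P(h_1 | phi, psi) by forward filtering over the two backbone slices."""
    n = len(model["prior"])
    alpha = normalize([
        model["prior"][j] * math.exp(vm_logpdf(phi, model["mu"][0][j], model["kappa"][0][j]))
        for j in range(n)
    ])
    return normalize([
        sum(alpha[i] * model["trans"][1][i][j] for i in range(n))
        * math.exp(vm_logpdf(psi, model["mu"][1][j], model["kappa"][1][j]))
        for j in range(n)
    ])

def sample_chi(model, rng, backbone=None):
    """Draw chi angles (radians, in [0, 2*pi)); optionally condition on (phi, psi)."""
    states = range(len(model["prior"]))
    if backbone is None:
        # Unconditioned: sample the hidden chain forward through the backbone slices.
        h = rng.choices(states, weights=model["prior"])[0]
        h = rng.choices(states, weights=model["trans"][1][h])[0]
    else:
        h = rng.choices(states, weights=filter_backbone(model, *backbone))[0]
    chis = []
    for t in range(2, len(model["mu"])):
        h = rng.choices(states, weights=model["trans"][t][h])[0]
        chis.append(rng.vonmisesvariate(model["mu"][t][h], model["kappa"][t][h]))
    return chis
```

Drawing the hidden value at the ψ slice from its filtered distribution is exact here only because no later slice is observed; conditioning on observed χ angles as well would require a backward pass (forward filtering, backward sampling).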
Figure 3
Univariate histograms for lysine and arginine: The histograms marked "Training" represent the training set. The histograms marked "BASILISK" represent BASILISK samples. For each amino acid, all histograms are plotted on the same scale.
Figure 4
χ1 versus χ2 histogram for isoleucine: The two-dimensional histogram of χ1 (x-axis) against χ2 (y-axis) illustrates the association between the two angles. Their univariate, marginal distributions are shown as well, attached to the respective axes. The histogram marked "Training" represents the training set, while the histogram marked "BASILISK" represents BASILISK samples.
Figure 5
Comparison between BASILISK and a standard rotamer library: We calculated the log-likelihood of every rotamer in the Dunbrack backbone-independent rotamer library according to the library's own Gaussian model (y-axis) and according to BASILISK (x-axis). The Pearson correlation coefficient is 0.91.
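The comparison in Figure 5 comes down to two per-rotamer log-likelihood vectors and their Pearson correlation. A minimal sketch of the correlation step; the inputs would be the log-likelihoods under the library's Gaussian model and under BASILISK, which are assumed to be computed elsewhere.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.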
Figure 6
Backbone dependency of the χ1 angle of aspartate: The χ1 histograms for different areas of the Ramachandran plot indicate a strong correlation between the backbone and the side chain conformations. For some regions certain peaks disappear almost entirely. Histograms marked "Training" represent the training set, and histograms marked "BASILISK" represent BASILISK samples. Panels A, B and C show the histograms for the areas indicated by the three boxes, while D shows the histogram over the entire space. Note that the indicated binning of the backbone space is for visualization only; BASILISK does not rely on discretization of the conformational space.
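Region-specific histograms like those in Figure 6 can be built from samples by keeping only those whose (ϕ, ψ) fall inside a chosen box. A minimal sketch, assuming samples are (ϕ, ψ, χ1) tuples in degrees in [-180, 180); `chi1_histogram`, the box limits and the bin count are hypothetical, caller-chosen names and parameters. As the caption notes, the binning serves display only.

```python
def chi1_histogram(samples, phi_range, psi_range, n_bins=36):
    """Normalized chi1 histogram for samples whose (phi, psi) lie in a box.

    samples: iterable of (phi, psi, chi1) tuples in degrees.
    phi_range, psi_range: half-open (low, high) intervals; use (-180, 180)
    for the whole Ramachandran plot. Bin 0 starts at chi1 = -180.
    """
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for phi, psi, chi1 in samples:
        if phi_range[0] <= phi < phi_range[1] and psi_range[0] <= psi < psi_range[1]:
            # Shift chi1 onto [0, 360) so that -180 and +180 share a bin.
            b = int(((chi1 + 180.0) % 360.0) // width)
            counts[min(b, n_bins - 1)] += 1  # guard against rounding at the edge
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```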
Figure 7
Selecting the optimal model: The Akaike information criterion (AIC) is used to determine the optimal number of hidden node values. The AIC score (y-axis) points towards a model with 30 hidden node values (x-axis). The optimal model is shown as a filled circle.
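The AIC trades goodness of fit against model size: AIC = 2k − 2 ln L, where k is the number of free parameters and L is the maximized likelihood, and the model with the lowest score is preferred. A minimal sketch of the selection step, assuming the caller supplies the log-likelihood and parameter count for each candidate number of hidden node values; how k follows from the network structure is not spelled out here, and `select_by_aic` is a hypothetical helper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood

def select_by_aic(candidates):
    """Return the key with the lowest AIC.

    candidates: dict mapping number of hidden node values -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda h: aic(*candidates[h]))
```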
