Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2003 Sep;12(9):2001-14.
doi: 10.1110/ps.03154503.

A graph-theory algorithm for rapid protein side-chain prediction

Affiliations

A graph-theory algorithm for rapid protein side-chain prediction

Adrian A Canutescu et al. Protein Sci. 2003 Sep.

Abstract

Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in the side-chain prediction problem. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are considered to have an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34342 side chains in <7 min of computer time. The total chi(1) and chi(1 + 2) dihedral angle accuracies are 82.6% and 73.7% using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow for use of SCWRL in more demanding applications such as sequence design and ab initio structure prediction, as well addition of a more complex energy function and conformational flexibility, leading to increased accuracy.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Splitting clusters into components. (A) Splitting a cluster into two components in SCWRL2.95 and earlier versions of SCWRL. (B) Splitting a cluster into biconnected components in SCWRL3.0. First-order articulation points are in light gray, and the second-order articulation point is in dark gray.
Figure 2.
Figure 2.
Solving a cluster by using biconnected components. The minimum energy configuration of the cluster shown in Figure 1 ▶ is identified by stepwise solution of biconnected components. Each biconnected component is solved as shown in the right margin, and the collapsed component is shown as superresidues in curly brackets.
Figure 3.
Figure 3.
Algorithm for splitting a graph into biconnected components. In the bottom part of the figure, the depth-first search numbers (DFN) are shown in the circles for each node, and the low numbers (L) are shown in shaded squares adjacent to each node. Inequalities between the DFNs and L numbers of adjacent residues that indicate the presence of articulation points (and also the root node) are shown.
Figure 4.
Figure 4.
Pseudocode for biconnected component algorithm.
Figure 5.
Figure 5.
Backtracking algorithm. (A) A tree representing a cluster of four interacting residues. Each level of the tree indicates an additional residue, and each branch a different rotamer. One combination of rotamers for all four residues is indicated, such that residue 1 is in rotamer 2, residue 2 is in rotamer 1, residue 3 is in rotamer 4, and residue 4 is in rotamer 2. (B) After sorting the residues in terms of the number of rotamers. The same full combination is indicated.
Figure 6.
Figure 6.
Complexity of combinatorial problem at various stages of calculation. Complexity is represented as log10 of the number of combinations that would have to be searched at each stage, if subsequent stages were not performed. Test set of 180 proteins was used. (A) Log10 of combinations of top 90% rotamers. (B) Log10 of combinations of post-DEE rotamers. (C) Log10 of sum of cluster combinations. (D) Log10 of sum of biconnected component combinations. (E) Log10 of total number of nodes in backtracking trees searched
Figure 7.
Figure 7.
Cluster sizes in 180 proteins. Size of graphs after DEE step is performed and edges among residue vertices are determined. Clusters of size 1 are not shown.
Figure 8.
Figure 8.
Biconnected component sizes in 180 proteins. The number of biconnected components of size 1 is 2325 (truncated on the graph to show the distribution at higher sizes).

Similar articles

Cited by

References

    1. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28 235–242. - PMC - PubMed
    1. Bower, M.J., Cohen, F.E., and Dunbrack Jr., R.L. 1997. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: A new homology modeling tool. J. Mol. Biol. 267 1268–1282. - PubMed
    1. Dahiyat, B.I. and Mayo, S.L. 1996. Protein design automation. Protein Sci. 5 895–903. - PMC - PubMed
    1. De Maeyer, M., Desmet, J., and Lasters, I. 1997. All in one: A highly detailed rotamer library improves both accuracy and speed in the modelling of side-chains by dead-end elimination. Fold Des. 2 53–66. - PubMed
    1. ———. 2000. The dead-end elimination theorem: Mathematical aspects, implementation, optimizations, evaluation, and performance. Methods Mol. Biol. 143 265–304. - PubMed

Publication types

LinkOut - more resources