Biophys J. 2005 Aug;89(2):867-75.
doi: 10.1529/biophysj.104.058768. Epub 2005 May 20.

Optimal clustering for detecting near-native conformations in protein docking

Dima Kozakov et al. Biophys J. 2005 Aug.

Abstract

Clustering is one of the most powerful tools in computational biology. The conventional wisdom is that events that occur in clusters are probably not random. In protein docking, the underlying principle is that clustering occurs because long-range electrostatic and/or desolvation forces steer the proteins to a low free-energy attractor at the binding region. Something similar occurs in the docking of small molecules, although in this case shorter-range van der Waals forces play a more critical role. On this basis, we have developed two clustering strategies that predict docked conformations from the clustering properties of a uniform sampling of low free-energy protein-protein and protein-small molecule complexes. We report significant improvements in the automated prediction and discrimination of docked conformations when cluster size and consensus are used as ranking criteria. We show that the success of clustering depends on identifying the appropriate clustering radius of the system. The clustering radius for protein-protein complexes is consistent with the range of the electrostatic and desolvation free energies (i.e., between 4 and 9 Å); for protein-small molecule docking, the radius is set by van der Waals interactions (i.e., approximately 2 Å). Without any a priori information, a simple analysis of the histogram of distance separations between the docked conformations can evaluate the clustering properties of the data set. Clustering is observed when the histogram is bimodal. Data clustering is optimal if the clustering radius is chosen at the minimum that follows the first peak of the bimodal distribution. We show that using this optimal radius further improves the discrimination of near-native complex structures.
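The radius-selection rule above (place the clustering radius at the histogram minimum that follows the first peak) and ranking by cluster size can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: conformations are represented as coordinate tuples compared by plain Euclidean distance rather than a superposition-based RMSD, the radius is taken as the center of the first valley-floor bin, and the greedy neighbor-count clustering (take the point with the most neighbors within the radius, remove its cluster, repeat) is an illustrative choice. All function names are hypothetical.

```python
import math
from itertools import combinations


def distance_histogram(points, bin_size=1.0):
    """Histogram of all pairwise Euclidean distances; bin k covers [k*bin_size, (k+1)*bin_size)."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    counts = [0] * (int(max(dists) // bin_size) + 1)
    for d in dists:
        counts[int(d // bin_size)] += 1
    return counts


def optimal_radius(points, bin_size=1.0):
    """Center of the histogram minimum that follows the first peak, or None if the
    histogram never rises again after that minimum (i.e., it is not bimodal)."""
    if len(points) < 2:
        return None
    counts = distance_histogram(points, bin_size)
    n = len(counts)
    peak = 0
    while peak + 1 < n and counts[peak + 1] >= counts[peak]:  # climb to the first peak
        peak += 1
    end = peak
    while end + 1 < n and counts[end + 1] <= counts[end]:  # descend to the valley floor
        end += 1
    if end == n - 1:
        return None  # no second peak after the valley
    # Use the first bin of the valley floor, so a flat empty gap maps to its smallest distance.
    valley = next(k for k in range(peak, end + 1) if counts[k] == counts[end])
    return (valley + 0.5) * bin_size


def greedy_clusters(points, radius):
    """Hypothetical greedy clustering: (center, members) pairs, largest cluster first."""
    remaining = set(range(len(points)))
    clusters = []
    while remaining:
        neighbors = {
            i: {j for j in remaining if math.dist(points[i], points[j]) <= radius}
            for i in remaining
        }
        # Most neighbors wins; ties go to the lowest index so the result is deterministic.
        center = max(remaining, key=lambda i: (len(neighbors[i]), -i))
        clusters.append((center, sorted(neighbors[center])))
        remaining -= neighbors[center]
    return clusters
```

For two well-separated groups, for example five points inside a unit square near the origin and three points near (20, 0), the empty histogram bins between the intra-cluster and inter-cluster peaks put the radius just past the intra-cluster distances, and the greedy pass returns the five-point cluster first. Because each later cluster is drawn from a shrinking set, cluster sizes come out in non-increasing order, which is what makes size usable as a ranking criterion.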

Figures

FIGURE 1
Sketch of a free-energy landscape of protein-protein association.
FIGURE 2
(A) Distribution of a random set of points forming clusters of size 5 (in any dimension) on a two-dimensional square surface. (B) Histogram of the pairwise RMSD between the points in A (bin size 1). The histogram is bimodal, and the minimum between the two peaks corresponds to the clustering radius of the data set. Also shown is the histogram for a random set of points (the points themselves are not shown).
FIGURE 3
Pairwise RMSD distribution of docked conformations for the 1ATN complex. The clustering parameter is Δ = 1 − f_min/f_max, where f_min is the height of the histogram at the minimum between the first and second peaks and f_max is the height of the first peak (see text).
FIGURE 4
Distribution of the ligand binding site RMSD of the best 200 (A) desolvation and (B) electrostatic receptor-ligand complexes as a function of cluster radius (in Å) for four unbound-unbound complexes and two CAPRI targets (bin size is 1 Å). The docked conformations were generated by the ClusPro server (see Ref. 18).
FIGURE 5
(A) Histograms of the pairwise RMSD of the top 1200 (900 best electrostatic and 300 best desolvation) conformations for different protein complexes. Only the relevant region, <15 Å, is shown. (B) Histograms of pairwise RMSD for different numbers of top conformations of the 1UDI complex. The data points are connected by cubic-spline interpolation.
FIGURE 6
Clustering of seven small-molecule probes on the surface of cytochrome P450-cam (1dz4). The active site lies directly above the heme, drawn in yellow. For each probe, we kept the 20 lowest free-energy structures.
FIGURE 7
Distribution of the RMSD between multiple small-molecule probes on the surface of (A) lysozyme (2lym) and (B) cytochrome P450-cam (1dz4) as a function of cluster radius (bin size is 0.25 Å). The number of structures retained for each of the seven probes varies from 5 to 50, so results are shown for clustering 35–350 small molecules in total. The characteristic intercluster peak is robust with respect to the number of structures retained.
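The clustering parameter defined in the Figure 3 caption, Δ = 1 − f_min/f_max, can be computed directly from histogram counts. A minimal sketch, assuming f_max is the count in the first-peak bin and f_min is the count at the lowest bin that follows it before the histogram rises again; the function name and the convention of returning 0 when no such minimum exists (a unimodal histogram) are assumptions for illustration, not taken from the paper.

```python
def clustering_parameter(counts):
    """Hypothetical helper: Delta = 1 - f_min / f_max for a pairwise-distance histogram.

    f_max: count at the first peak; f_min: count at the minimum that follows it.
    Returns 0.0 when the histogram never rises again after the first peak.
    """
    n = len(counts)
    peak = 0
    while peak + 1 < n and counts[peak + 1] >= counts[peak]:  # climb to the first peak
        peak += 1
    end = peak
    while end + 1 < n and counts[end + 1] <= counts[end]:  # descend to the valley floor
        end += 1
    if end == n - 1 or counts[peak] == 0:
        return 0.0  # no minimum separating two peaks
    return 1.0 - counts[end] / counts[peak]
```

For example, `clustering_parameter([2, 8, 4, 6, 1])` returns 0.5, because the valley count (4) is half the first-peak count (8). A value near 1 indicates an empty gap between the intra-cluster and intercluster peaks, while 0 means no minimum separates them.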

References

    1. Vriend, G., and C. Sander. 1991. Detection of common three-dimensional substructures in proteins. Proteins. 11:52–58.
    2. Shortle, D., K. T. Simons, and D. Baker. 1998. Clustering of low-energy conformations near the native structures of small proteins. Proc. Natl. Acad. Sci. USA. 95:11158–11162.
    3. Bystroff, C., V. Thorsson, and D. Baker. 2000. HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J. Mol. Biol. 301:173–190.
    4. Karplus, K., C. Barrett, and R. Hughey. 1998. Hidden Markov models for detecting remote protein homologies. Bioinformatics. 14:846–856.
    5. Prasad, J. C., S. Comeau, S. Vajda, and C. J. Camacho. 2003. Consensus alignment for reliable framework prediction in homology modeling. Bioinformatics. 19:1682–1691.