BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S9.
doi: 10.1186/1471-2105-8-S4-S9.

A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites

Lei Xie et al. BMC Bioinformatics. 2007.

Abstract

Background: An accurate description of protein shape derived from protein structure is necessary to establish an understanding of protein-ligand interactions, which in turn will lead to improved methods for protein-ligand docking and binding site analysis. Most current shape descriptors characterize only the local properties of protein structure using an all-atom representation and are slow to compute. We need new shape descriptors that capture both local and global structural information, are robust enough to apply to models and low-quality structures, and are computationally efficient enough to permit high-throughput analysis of protein structures.

Results: We introduce a new shape description that requires only the Cα atoms to represent the protein structure, making it both fast and suitable for use on models and low-quality structures. The notion of a geometric potential is introduced to quantitatively describe the shape of the structure. This geometric potential depends on both the global shape of the protein structure and the surrounding environment of each residue. When the geometric potential is applied to binding site prediction, approximately 85% of known binding sites can be accurately identified with above 50% residue coverage and 80% specificity. Moreover, the algorithm is fast enough for proteome-scale applications: proteins with fewer than 500 amino acids can be scanned in less than two seconds.

Conclusion: The reduced representation of the protein structure combined with the geometric potential provides a fast, quantitative description of protein-ligand binding sites with potential for use in large-scale predictions, comparisons and analysis.

Figures

Figure 1
Overview of the algorithm. The solid body and circles indicate an all-atom and Cα atom representation, respectively. Open circles are virtual atoms determined by the algorithm. (1) Step 1: the protein structure is represented as Cα atoms. (2) Step 2: Cα atoms are Delaunay tessellated. The convex hull is determined at the same time. (3) Step 3: the environmental boundary (red solid lines) is determined from the Delaunay tessellation by peeling off the tetrahedra (triangles labeled as a, b, and c) with edge lengths larger than 30.0 Å (black dashed lines) starting from the convex hull. (4) Step 4: the protein boundary (blue and purple solid lines). The purple lines are overlapped with the environmental boundary and determined from the Delaunay tessellation by removing tetrahedra with circumscribed sphere radius larger than 7.5 Å. (5) Steps 5–7: shape descriptors such as residue surface direction and geometric potential for each Cα atom position are computed, and ligand binding sites and virtual atoms (open circles) are predicted.
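The two size filters quoted in Steps 3 and 4 can be illustrated with a minimal sketch. It assumes each tetrahedron is given as four Cα coordinates in Å, and that "edge lengths larger than 30.0 Å" means the longest edge exceeds the cutoff. All function names are hypothetical. The Delaunay tessellation itself and the hull-inward peeling order are not shown, so this covers only the per-tetrahedron tests, not the authors' full procedure.

```python
import math
from itertools import combinations

ENV_EDGE_CUTOFF = 30.0       # Å, edge-length cutoff quoted for Step 3
PROTEIN_RADIUS_CUTOFF = 7.5  # Å, circumsphere-radius cutoff quoted for Step 4


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumradius(tet):
    """Radius of the sphere through the four vertices of a tetrahedron."""
    p0 = tet[0]
    # Work relative to p0: the center c satisfies r_i . c = |r_i|^2 / 2.
    rows = [[p[k] - p0[k] for k in range(3)] for p in tet[1:]]
    rhs = [0.5 * sum(x * x for x in r) for r in rows]
    d = _det3(rows)
    if abs(d) < 1e-12:
        return math.inf  # coplanar vertices: no finite circumsphere
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [r[:] for r in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        center.append(_det3(m) / d)
    return math.sqrt(sum(x * x for x in center))


def longest_edge(tet):
    """Length of the longest of the six edges."""
    return max(math.dist(a, b) for a, b in combinations(tet, 2))


def peeled_in_step3(tet, cutoff=ENV_EDGE_CUTOFF):
    """Assumed Step 3 test: the longest edge exceeds the cutoff."""
    return longest_edge(tet) > cutoff


def removed_in_step4(tet, cutoff=PROTEIN_RADIUS_CUTOFF):
    """Step 4 test: the circumscribed sphere radius exceeds the cutoff."""
    return circumradius(tet) > cutoff


def corner_tetrahedron(scale):
    """Hypothetical test shape: three mutually perpendicular legs of `scale` Å."""
    return [(0.0, 0.0, 0.0), (scale, 0.0, 0.0),
            (0.0, scale, 0.0), (0.0, 0.0, scale)]
```

With legs of 10 Å, the illustrative tetrahedron has a circumradius of about 8.7 Å and a longest edge of about 14.1 Å, so it would fail the Step 4 radius test while passing the Step 3 edge test.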
Figure 2
Radius distribution of solid and non-solid circumscribed spheres from the Cα atom Delaunay tessellation of the protein structure. A tetrahedron is defined as solid if its four edges are formed by amino acid residues considered to be in contact (see Methods).
Figure 3
Definition of true/false positives and true/false negatives for the predicted ligand binding site residues evaluated with respect to the referenced ligand binding site in a protein. True and false positives are the correctly and incorrectly predicted number of binding site residues in a protein, respectively. True and false negatives are the correctly and incorrectly predicted number of non-binding site residues, respectively. They are defined for each known ligand binding site on a protein by protein basis.
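The counts defined in this caption can be tallied from residue sets, as in the sketch below. The sensitivity and specificity formulas used here are the conventional ones (TP/(TP+FN) and TN/(TN+FP)). The paper's Methods section is not part of this excerpt, so treating these as the authors' exact definitions is an assumption. The function names and residue numbers are illustrative only.

```python
def confusion_counts(predicted, reference, all_residues):
    """Return (TP, FP, TN, FN) for one reference binding site.

    predicted    -- residues predicted to be in the binding site
    reference    -- residues of the known (reference) binding site
    all_residues -- every residue of the protein
    """
    predicted, reference = set(predicted), set(reference)
    all_residues = set(all_residues)
    tp = len(predicted & reference)                  # correctly predicted site residues
    fp = len(predicted - reference)                  # predicted, but not in the site
    fn = len(reference - predicted)                  # site residues that were missed
    tn = len(all_residues - predicted - reference)   # correctly left out
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Conventional sensitivity: fraction of site residues recovered."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    """Conventional specificity: fraction of non-site residues left out."""
    return tn / (tn + fp) if tn + fp else 0.0


# Illustrative 100-residue protein with a 10-residue reference site.
tp, fp, tn, fn = confusion_counts(
    predicted=range(12, 22),
    reference=range(10, 20),
    all_residues=range(1, 101),
)
```

In this made-up case the counts are TP = 8, FP = 2, TN = 88 and FN = 2, giving a sensitivity of 0.8 and a specificity of 88/90.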
Figure 4
Distribution of geometric potentials of residues that are involved and not involved in ligand binding from known protein-ligand complexes in the benchmark. (A) single residues; (B) residue clusters that correspond to the ligand binding site and those randomly generated from the protein structure (see Methods).
Figure 5
Distribution of the standard deviations of geometric potential and relative solvent accessible surface area in the binding sites, scaled between 0.0 and 100.0.
Figure 6
Reduced protein structure representation with Cα atoms and Delaunay tessellation showing the computed geometric potential. Each vertex in the figure corresponds to a Cα atom. The relative values of the geometric potential are color-coded, red (highest), blue (lowest). The structures shown in the left and the right columns are holo and apo proteins, respectively. The known ligand binding site is located in the white circle. The RMSD between the holo and apo protein across the whole structure is shown above the arrow. (A) Immunoglobulin 48g7 germline fab (PDB id: 1AJ7 and 2RCS); (B) Adenylate kinase (PDB id: 1AKE and 4AKE); (C) HIV-1 reverse transcriptase (PDB id: 1VRT and 1RTJ); (D) Maltodextrin binding protein (PDB id: 1ANF and 1OMP).
Figure 7
Distribution of the sensitivity for correctly predicted ligand binding sites ranked as first (blue bar) and in the top three (red bar), respectively, for all protein complexes in the benchmark.
Figure 8
Distribution of (A) specificity and (B) sensitivity of all predicted ligand binding sites with respect to the referenced ligand binding sites. See Methods and Figure 3 for the definition of the specificity and the sensitivity.
Figure 9
Sensitivity vs. specificity of all predicted ligand binding sites. Each point in the figure corresponds to a predicted ligand binding site. ~85% of predictions have sensitivity above 50% and specificity above 80%, as shown by the red circle.
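A summary figure such as the ~85% quoted here is a simple tally over per-site (sensitivity, specificity) pairs. The sketch below shows that tally on made-up values. The function name and the data are hypothetical and do not reproduce the benchmark.

```python
def fraction_in_region(points, min_sensitivity=0.5, min_specificity=0.8):
    """Fraction of (sensitivity, specificity) pairs strictly above both cutoffs."""
    if not points:
        return 0.0
    hits = sum(
        1 for sens, spec in points
        if sens > min_sensitivity and spec > min_specificity
    )
    return hits / len(points)


# Made-up predictions, one (sensitivity, specificity) pair per predicted site.
example_points = [(0.9, 0.95), (0.6, 0.85), (0.4, 0.9), (0.7, 0.7)]
example_fraction = fraction_in_region(example_points)
```

Two of the four illustrative points clear both cutoffs, so the tally here is 0.5.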
Figure 10
An example of ligand binding site prediction from CASTp [29]. (A) sensitivity vs. specificity for all atom and Cα only representations with different probe radii by CASTp predictions [29]. As a comparison, the prediction from the geometric potential is marked as a solid blue circle. (B) The predicted two largest binding sites from CASTp [29] with Cα only representation and probe radius 2.8 Å. The two largest pockets with similar volume and surface area are shown in the figure. The pockets colored green and red represent a known ligand binding site and a helical interface, respectively.
Figure 11
Time complexity of the algorithm. Times are run times on a single non-dedicated 3.0 GHz processor with 2.0 GB of RAM.

References

    1. Coleman RG, Sharp KA. Travel depth, a new shape descriptor for macromolecules: application to ligand binding. J Mol Biol. 2006;362:441–458. doi: 10.1016/j.jmb.2006.07.022.
    2. Nayal M, Honig B. On the nature of cavities on protein surfaces: application to the identification of drug binding sites. Proteins: Struct Funct Bioinform. 2006;63:892–906. doi: 10.1002/prot.20897.
    3. Coleman RG, Burr MA, Sourvaine DL, Cheng AC. An intuitive approach to measuring protein surface curvature. Proteins: Struct Funct Bioinform. 2005;61:1068–1074. doi: 10.1002/prot.20680.
    4. Agarwal PK, Edelsbrunner H, Harer J, Wang Y. Extreme elevation on a 2-manifold. Symp Comp Geo. 2004;20:357–365.
    5. Hendrix DK, Kuntz ID. Surface solid angle-based site points for molecular docking. Pac Symp Biocomput. 1998. pp. 317–326.
