BMC Struct Biol. 2008 Oct 27;8:45.
doi: 10.1186/1472-6807-8-45.

Protein functional surfaces: global shape matching and local spatial alignments of ligand binding sites



T Andrew Binkowski et al. BMC Struct Biol.

Abstract

Background: Protein surfaces comprise only a fraction of a protein's total residues, yet they are its most conserved functional features. Surfaces performing identical functions are found in proteins that lack any sequence or fold similarity. While biochemical activity can be attributed to a few key residues, the broader surrounding environment plays an equally important role.

Results: We describe a methodology that attempts to optimize two components, global shape and local physicochemical texture, for evaluating the similarity between a pair of surfaces. Surface shape similarity is assessed using a three-dimensional object recognition algorithm and physicochemical texture similarity is assessed through a spatial alignment of conserved residues between the surfaces. The comparisons are used in tandem to efficiently search the Global Protein Surface Survey (GPSS), a library of annotated surfaces derived from structures in the PDB, for studying evolutionary relationships and uncovering novel similarities between proteins.

Conclusion: We provide an assessment of our method using library retrieval experiments for identifying functionally homologous surfaces binding different ligands, functionally diverse surfaces binding the same ligand, and binding surfaces of ubiquitous and conformationally flexible ligands. Results using surface similarity to predict function for proteins of unknown function are reported. Additionally, an automated analysis of the ATP binding surface landscape is presented to provide insight into the correlation between surface similarity and function for structures in the PDB and for the subset of protein kinases.


Figures

Figure 1
Automated identification of protein binding surfaces and construction of SurfaceShapeSignatures (SSS). The nicotinamide adenine dinucleotide phosphate (NADP) binding surface from the human pathogen S. pyogenes (PDB:2ahr, a) is defined by measuring the change in solvent accessibility between the bound and apo structures (b, pink). The SSS of a binding surface is constructed by measuring the inter-atomic Euclidean distances between all unique pairs of surface atoms (c). Signatures of selected DNA, ligand and metal binding surfaces for proteins in the PDB are also shown.
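The signature construction in panel (c) can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch assuming each surface is given as a list of (x, y, z) atom coordinates in ångströms; the function names and the 1 Å bin width are illustrative choices, not parameters taken from the paper.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def surface_shape_signature(atoms: Sequence[Point]) -> List[float]:
    """Sorted Euclidean distances between all unique surface-atom pairs.

    The sorted list is the empirical distance distribution of the surface;
    n atoms give n * (n - 1) / 2 distances. (Hypothetical helper name.)
    """
    if len(atoms) < 2:
        raise ValueError("a signature needs at least two surface atoms")
    return sorted(math.dist(a, b) for a, b in itertools.combinations(atoms, 2))


def signature_histogram(signature: Sequence[float], bin_width: float = 1.0) -> List[float]:
    """Bin the distances and normalize so the bin fractions sum to 1.

    Normalizing lets surfaces with different atom counts be plotted on the
    same scale. The 1 Å default bin width is an illustrative assumption.
    """
    n_bins = int(max(signature) // bin_width) + 1
    counts = [0] * n_bins
    for d in signature:
        counts[int(d // bin_width)] += 1
    return [c / len(signature) for c in counts]


if __name__ == "__main__":
    # Three atoms forming a 3-4-5 right triangle.
    sig = surface_shape_signature([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)])
    print(sig)  # [3.0, 4.0, 5.0]
    print(signature_histogram(sig))
```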
Figure 2
Identification of a threshold for SurfaceShapeSignatures. SSS distances obtained by querying the ATP binding site of cAMP-dependent kinase (PDB:1atp) against the GPSS ligand surface library are plotted against the molecular weight of the ligand corresponding to each library surface (a). Ligands whose MW is within ± 100 Da of that of ATP are highlighted in yellow. The molecular shape similarity (Tanimoto score) between ATP and the ligand corresponding to each library surface is plotted in (b); Tanimoto scores greater than 0.7 (blue) are generally regarded as similar. The correlation coefficients for molecular weight and shape similarity are 0.46 and 0.45, respectively, and the corresponding regression lines are shown in red. The threshold distance of 0.3 (green) selected for our SurfaceScreen methodology eliminates fewer than 1% of true-positive surfaces in our benchmarking exercises.
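Figure 6 reports shape comparisons as KS distances, which suggests that two signatures are compared with the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between their empirical cumulative distributions. The sketch below rests on that reading, which the captions do not state outright. It also assumes that a library surface passes the 0.3 threshold when its distance is less than or equal to 0.3, since the caption does not say whether the bound is inclusive. The function names are hypothetical.

```python
import bisect
from typing import Sequence

SSS_THRESHOLD = 0.3  # threshold distance quoted in the Figure 2 caption


def ks_distance(sig_a: Sequence[float], sig_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic between two distance signatures.

    The supremum of |F_a(x) - F_b(x)| is attained at one of the sample
    values, so it suffices to evaluate both empirical CDFs there.
    """
    a, b = sorted(sig_a), sorted(sig_b)
    if not a or not b:
        raise ValueError("signatures must be non-empty")
    best = 0.0
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect.bisect_right(b, x) / len(b)
        best = max(best, abs(f_a - f_b))
    return best


def passes_shape_filter(sig_query: Sequence[float], sig_library: Sequence[float],
                        threshold: float = SSS_THRESHOLD) -> bool:
    """Keep a library surface for alignment if its shape distance is within the threshold."""
    return ks_distance(sig_query, sig_library) <= threshold


if __name__ == "__main__":
    print(ks_distance([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]))  # 0.5
```

Because the statistic compares normalized distributions rather than raw counts, surfaces with different numbers of atoms can be compared directly.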
Figure 3
The SurfaceAlign algorithm identifies the optimal alignment of spatially conserved residues. Aligning the 25 conserved residues of the heme binding pockets of myoglobin from P. catodon (a) and the structural genomics target hemoglobin alpha-1 from P. flavescens (c) requires evaluating 6,220,800 alignment combinations and permutations. One hundred alignment solutions are shown in stick representation (b). An alignment series shows the superposition of successive solutions converging toward the optimal alignment (d). The myoglobin query surface is shaded in grayscale to represent cRMSD values (black represents a large cRMSD and white a small cRMSD), and the hemoglobin surface is colored by the Shapely color scheme[77].
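SurfaceAlign scores candidate alignments by the cRMSD of the matched residues after superposition. The sketch below computes that quantity for one fixed correspondence only; it does not reproduce SurfaceAlign's search over alignment combinations and permutations. It assumes the i-th point of each list is a matched residue (for example one representative atom per residue, itself an assumption) and uses Horn's quaternion formulation of optimal rigid superposition, a standard technique swapped in here because the caption does not say how the superposition is computed. The largest eigenvalue of Horn's 4 × 4 key matrix is found by power iteration so the example needs no third-party libraries.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _centered(points: Sequence[Point]) -> List[List[float]]:
    """Translate a coordinate set so that its centroid sits at the origin."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in points]


def _largest_eigenvalue(n_mat: List[List[float]], iterations: int) -> float:
    """Largest eigenvalue of a symmetric 4x4 matrix via shifted power iteration."""
    # Shifting by the Frobenius norm (an upper bound on every |eigenvalue|)
    # makes all eigenvalues non-negative, so the dominant one is the largest.
    shift = math.sqrt(sum(v * v for row in n_mat for v in row))
    m = [[n_mat[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    best = -math.inf
    # Start from every basis vector: the top eigenvector cannot be orthogonal
    # to all four, and no Rayleigh quotient can exceed the true maximum.
    for start in range(4):
        v = [1.0 if i == start else 0.0 for i in range(4)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        nv = [sum(n_mat[i][j] * v[j] for j in range(4)) for i in range(4)]
        best = max(best, sum(a * b for a, b in zip(v, nv)))  # v has unit length
    return best


def crmsd(mobile: Sequence[Point], target: Sequence[Point], iterations: int = 2000) -> float:
    """Coordinate RMSD of matched points after optimal rigid superposition."""
    if len(mobile) != len(target) or not mobile:
        raise ValueError("need two non-empty coordinate lists of equal length")
    x, y = _centered(mobile), _centered(target)
    # Correlation matrix s[a][b] = sum over points of x[a] * y[b].
    s = [[sum(xk[a] * yk[b] for xk, yk in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric key matrix; its largest eigenvalue equals the maximum
    # of sum(y_k . R x_k) over all rotations R.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(key, iterations)
    g_x = sum(c * c for p in x for c in p)
    g_y = sum(c * c for p in y for c in p)
    # Minimum summed squared deviation is g_x + g_y - 2 * lam; clamp round-off.
    return math.sqrt(max(0.0, (g_x + g_y - 2.0 * lam) / len(x)))
```

In a search like the alignment series of panel (d), a function of this kind would be evaluated for each candidate correspondence and the lowest value kept.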
Figure 4
Calculating volume overlap between aligned surfaces. A surface on the F420-0:gamma-glutamyl ligase homolog from A. fulgidus (PDB:2g9i) (a) shares a well-conserved sub-surface (b, forest green) with the GDP binding surface in the GDP-binding protein from B. taurus (c). The surfaces superposed according to the alignment are shown in (d). When the volume overlap of the whole alignment is measured (e, purple), the large volume disparity between the surfaces masks the similarity, yielding a global surface volume overlap (gSVOT) score of only 0.37. Measuring the local surface volume overlap (lSVOT) using only the conserved residues of the alignment (f) reveals the similarity, with an lSVOT score of 0.71 (g, purple).
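The captions do not define gSVOT and lSVOT precisely. One plausible reading, assumed in the sketch below, is a Tanimoto-style ratio of shared to combined volume computed on already superposed coordinates: atoms are treated as spheres, the spheres are rasterized onto a cubic grid, and the score is |A ∩ B| / |A ∪ B| over occupied grid cells. Under this reading the global score uses every atom of both surfaces, while the local score uses only atoms of the conserved residues. The 1.7 Å sphere radius, 0.5 Å grid spacing and function names are illustrative assumptions.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def occupied_cells(atoms: Sequence[Point], radius: float = 1.7, spacing: float = 0.5) -> Set[Cell]:
    """Grid cells whose centers lie inside at least one atom sphere."""
    reach = int(math.ceil(radius / spacing)) + 1  # enough cells to cover a sphere
    cells: Set[Cell] = set()
    for atom in atoms:
        ci, cj, ck = (round(coord / spacing) for coord in atom)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    if math.dist((i * spacing, j * spacing, k * spacing), atom) <= radius:
                        cells.add((i, j, k))
    return cells


def volume_overlap(atoms_a: Sequence[Point], atoms_b: Sequence[Point],
                   radius: float = 1.7, spacing: float = 0.5) -> float:
    """Tanimoto-style overlap |A ∩ B| / |A ∪ B| of two superposed atom sets."""
    a = occupied_cells(atoms_a, radius, spacing)
    b = occupied_cells(atoms_b, radius, spacing)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical superposed surfaces: B has the conserved atom plus a distant extra atom.
    conserved_a = [(0.0, 0.0, 0.0)]
    surface_b = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(volume_overlap(conserved_a, surface_b))      # global-style score: 0.5
    print(volume_overlap(conserved_a, surface_b[:1]))  # local-style score: 1.0
```

Because the union in the denominator includes the unmatched parts of a larger surface, a small conserved patch inside a large pocket scores low globally but high locally, which is the pattern panels (e) and (g) illustrate.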
Figure 5
The SurfaceScreen methodology uses the SSS algorithm to rapidly pre-classify surfaces based on shape complementarity. Similarly shaped surfaces are then spatially aligned using the SurfaceAlign algorithm and scored. Although the GPSS library also contains DNA, metal and peptide binding surfaces, only ligand binding surfaces were considered in this study.
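The two-stage screen can be sketched as a cheap shape filter followed by the costly alignment step. In this minimal sketch the shape distance and alignment score are passed in as functions (hypothetical stand-ins for SSS and SurfaceAlign), higher alignment scores are assumed to mean greater similarity, and the 0.3 default comes from the Figure 2 caption.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

S = TypeVar("S")  # whatever representation a surface has


def surface_screen(
    query: S,
    library: Dict[str, S],
    shape_distance: Callable[[S, S], float],
    align_score: Callable[[S, S], float],
    threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Return (surface id, alignment score) pairs, best first.

    Stage 1 discards library surfaces whose shape distance to the query
    exceeds the threshold; only survivors reach the stage 2 alignment.
    """
    survivors = [sid for sid, surf in library.items() if shape_distance(query, surf) <= threshold]
    scored = [(sid, align_score(query, library[sid])) for sid in survivors]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy surfaces represented by a single number, purely for illustration.
    lib = {"a": 0.1, "b": 0.25, "c": 0.9}
    hits = surface_screen(0.0, lib, lambda q, s: abs(q - s), lambda q, s: 1.0 - abs(q - s))
    print([sid for sid, _ in hits])  # ['a', 'b']
```

Keeping the filter and scorer as parameters reflects the division of labor in the caption: the shape comparison is cheap enough to run against the whole library, so the alignment only has to be computed for the survivors.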
Figure 6
Retrieval of HIV-1 proteases from the GPSS library using surface similarity. The binding surface of HIV-1 protease (a, b) complexed with the inhibitor BEB (c) was queried against the GPSS library. The sorted KS distances are shown in (d), with other HIV-1 proteases highlighted in red. ROC curves for retrieval using SurfaceShapeSignature, SurfaceAlign and SurfaceScreen scoring are shown in (e). The highest-ranking non-protease surface was the DcmaT (h) binding surface of aclacinomycin methylesterase (RdmC) from S. purpurascens (f, g). The surfaces are shown superposed according to the SurfaceAlign alignment (i, j) and with their respective ligands (k).
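Retrieval ROC curves like those in panel (e) can be computed from a ranked hit list. The sketch assumes the library has already been sorted best-first by whichever score is being evaluated, that each entry is labeled as a true positive (here, another HIV-1 protease) or not, and that there are no tied scores; the area under the curve is obtained with the trapezoid rule. Function names are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_curve(ranked_is_positive: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) after each position of a best-first ranking."""
    n_pos = sum(1 for p in ranked_is_positive if p)
    n_neg = len(ranked_is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative surface")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for is_pos in ranked_is_positive:
        if is_pos:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    print(roc_auc(roc_curve([True, False, True, False])))  # 0.75
```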
Figure 7
Retrieval of functionally diverse heme binding proteins. Heme binding proteins myoglobin (a, CATH code = 1.10.490.10, PDB:1mbn), nitrophorin (c, CATH code = 1.40.128.20, PDB:1np4), and inducible nitric oxide synthase (iNOS) (e, CATH code = 3.90.1230.10, PDB:4nos)[46]. The structures are positioned so that the propionate groups all point in the same direction. The corresponding heme binding surfaces are shown alongside, rotated 90 degrees about the Y-axis. Shape signatures for each surface are shown in (g). ROC curves for the retrieval of heme binding surfaces when querying myoglobin from P. catodon (PDB:1mbn) against the GPSS library are shown in (h). The Ampcpr binding surface (i) from ADPRase is the best-scoring non-heme binding surface returned by the search. A superposition of the ligands suggests that ligand shape complementarity drives the binding surface similarity (j).
Figure 8
Identification of a convergent heme binding surface from surface similarity. Despite lacking sequence or structural homology to the heme-monooxygenase family, IsdG from S. aureus (a, yellow) contains a conserved surface that allows it to perform heme-monooxygenase activity. When compared with the heme binding surface of heme oxygenase (HmuO) from C. diphtheriae (b, green), 19 residues are conserved (c), with similar global shape characteristics (d). The superposition of the conserved residues is shown for the best-scoring cRMSD (e) and oRMSD (g) alignments. The alignments are colored by residue type (IsdG large radius, HmuO small radius) in (f, h). The superposition of the surfaces resulting in the maximum volume overlap (i, red) is shown with the bound heme from HmuO (j).
Figure 9
Binding surface-based classification of structural homologs. Putative binding surfaces of structural genomics targets with structural homology to IsdG (PDB:1xbw) and IsdI (PDB:1sqe) from S. aureus are clustered by SurfaceScreen score. The heme binding pocket is well conserved in protein TT1390 from T. thermophilus (PDB:1iuj) and protein BC2969 from B. cereus (PDB:1tz0). ActVA-Orf6 from S. coelicolor (PDB:1lq9) is known to bind 6-deoxydihydrokalafungin (6-DHHK)[51]. Cofactors are shown immediately below each protein.
Figure 10
Retrieval of ATP binding proteins from functionally and conformationally diverse classes. Binding surfaces representing different ATP conformational classes: cAMP-dependent kinase (PDB:1atp, a), protein kinase CK2 from Z. mays (PDB:1a6o, b), ATP:corrinoid adenosyltransferase from S. typhimurium (PDB:1g5t, c), and PurT-encoded glycinamide ribonucleotide transformylase from E. coli (PDB:1kj8, d). The ATP molecules from each class are superposed in (f). The retrieval rate for each binding surface against the GPSS library is shown as an ROC plot in (e); retrieval rates are calculated using the SurfaceScreen score.
Figure 11
Crystallographic validation of the GDP binding prediction in a structural genomics target. The strong similarity of the putative binding surface of the F420-0:gamma-glutamyl ligase homolog from A. fulgidus[56] (a) to the GDP binding surface in the GDP-binding protein from B. taurus (Figure 4c) allowed a GDP molecule (red, colored by element) to be posed into the surface based on the surface superposition. The structure was determined with bound GDP (green, colored by element), which deviates from the predicted position by an RMSD of 1.0 Å (b). Adding the ligand to the crystallization conditions improved the resolution of the structure from 2.5 Å (a, gray) to 1.35 Å (a, green) and allowed loop regions (magenta) to be modeled.
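The prediction check combines two simple operations: carrying the ligand along with the surface superposition, and measuring how far the posed atoms lie from the crystallographic ones. A minimal sketch follows, assuming the superposition is available as a rotation matrix R and translation t (hypothetical inputs here; in the workflow described above they would come from the surface alignment) and that the atoms of both ligand copies are listed in matching order. No re-fitting is done before the RMSD, since re-fitting would hide the placement error being measured.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_superposition(points: Sequence[Point], rotation: Matrix3, translation: Point) -> List[Point]:
    """Carry ligand atoms along with a surface superposition: x' = R x + t."""
    return [
        tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i] for i in range(3))
        for p in points
    ]


def pose_rmsd(predicted: Sequence[Point], observed: Sequence[Point]) -> float:
    """RMSD between matched atoms in a shared frame, with no re-fitting."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty atom lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(predicted, observed))
    return math.sqrt(total / len(predicted))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    ligand = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]  # toy two-atom "ligand"
    posed = apply_superposition(ligand, identity, (1.0, 0.0, 0.0))
    print(pose_rmsd(posed, ligand))  # 1.0
```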
Figure 12
Clustering of 116 non-redundant ATP binding sites based on their surface similarity. The dendrogram represents the results of complete-linkage clustering applied to the SurfaceScreen scores between all surfaces in our dataset (a). Each node is color-coded by biological function, as assessed through EC numbers or literature references. A second, grayscale-coded shape on each node edge indicates the ATP conformation class defined in Figure 10. A representative binding surface from each cluster is shown in (b).
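Complete linkage merges, at each step, the two clusters whose farthest pair of members is closest. A minimal sketch of that rule follows. It assumes the SurfaceScreen similarity scores have already been converted into a symmetric distance matrix (for example d = 1 - s for scores in [0, 1]; this conversion is an assumption, since the caption does not give one). The naive search recomputes every cluster distance at each merge, O(n³) overall, which is slow but adequate for about a hundred sites. Ties are broken by cluster creation order.

```python
from typing import Dict, List, Sequence, Tuple


def complete_linkage(dist: Sequence[Sequence[float]]) -> List[Tuple[Tuple[int, ...], float]]:
    """Agglomerative clustering with complete linkage.

    Returns one (sorted member indices, merge height) entry per merge, in
    merge order; the heights are the dendrogram branch heights.
    """
    clusters: Dict[int, Tuple[int, ...]] = {i: (i,) for i in range(len(dist))}
    next_id = len(dist)
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Complete linkage: cluster distance = largest member-to-member distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        merged = tuple(sorted(clusters.pop(a) + clusters.pop(b)))
        clusters[next_id] = merged
        next_id += 1
        merges.append((merged, height))
    return merges


if __name__ == "__main__":
    # Four toy sites at positions 0, 1, 5 and 7 on a line.
    pos = [0.0, 1.0, 5.0, 7.0]
    matrix = [[abs(p - q) for q in pos] for p in pos]
    for members, height in complete_linkage(matrix):
        print(members, height)  # (0, 1) 1.0, then (2, 3) 2.0, then (0, 1, 2, 3) 7.0
```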
Figure 13
Mapping ATP binding surface cluster membership against ATP conformation class. Observed frequencies for hydrolases (a), ligases (b), and transferases (c) are shown. Surface cluster numbers correspond to Figure 12(b), and ATP conformation class labels correspond to Figure 10. The sums for each row and column are shown on the edges of each plot.
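Frequency tables of this kind, with row and column sums, can be reproduced from per-site labels. This short sketch assumes each binding site has been labeled with its surface cluster number and its ATP conformation class, and that labels within each axis are mutually comparable so they can be sorted; the labels in the example are made up for illustration and are not data from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def cross_tabulate(pairs: Iterable[Tuple[Hashable, Hashable]]):
    """Count (surface cluster, conformation class) pairs and add margin sums.

    Returns (table, row_sums, col_sums), where table[row][col] is the number
    of binding sites with that cluster and conformation class.
    """
    pairs = list(pairs)  # allow a one-shot iterator to be traversed several times
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table: Dict = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    row_sums = {r: sum(table[r].values()) for r in rows}
    col_sums = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_sums, col_sums


if __name__ == "__main__":
    # Hypothetical labels: (surface cluster, ATP conformation class).
    sites = [(1, "A"), (1, "A"), (1, "B"), (2, "B")]
    table, row_sums, col_sums = cross_tabulate(sites)
    print(table)     # {1: {'A': 2, 'B': 1}, 2: {'A': 0, 'B': 1}}
    print(row_sums)  # {1: 3, 2: 1}
    print(col_sums)  # {'A': 2, 'B': 2}
```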
Figure 14
An all-against-all comparison of ATP binding surfaces in the PDB. The dendrogram represents the results of complete-linkage clustering applied to the SurfaceScreen scores between all surfaces in our dataset (a). The nodes of the dendrogram are color-coded by kinase family according to KinBase nomenclature. A branch of the dendrogram (gray box) is called out to highlight the unexpected similarity discovered between the STI-571 binding site in c-Abl kinase and that in the serine/threonine kinase p38 MAP kinase (b).
Figure 15
Unique conformation of p38 MAP kinase creates a binding surface similar to that of c-Abl kinase. The binding surface of the inhibitor STI-571 in c-Abl kinase (a, PDB:1opj) shows strong similarity to the binding surface of the inhibitor B96 in p38 MAP kinase (b, PDB:1kv2). p38 MAP kinase has a DFG motif configuration (stick representation) similar to that seen in c-Abl. SurfaceAlign superposition of the surfaces (c). STI-571 posed into the p38 MAP kinase binding surface based on the surface alignment (d).

References

    1. Ye Y, Li Z, Godzik A. Modeling and analyzing three-dimensional structures of human disease proteins. Pac Symp Biocomput. 2006:439–450.
    2. Artymiuk PJ, Poirrette AR, Grindley HM, Rice DW, Willett P. A graph-theoretic approach to the identification of three-dimensional patterns of amino acid side-chains in protein structures. J Mol Biol. 1994;243:327–344.
    3. Kinoshita K, Nakamura H. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Sci. 2003;12:1589–1595.
    4. Schmitt S, Kuhn D, Klebe G. A new method to detect related function among proteins independent of sequence and fold homology. J Mol Biol. 2002;323:387–406.
    5. Kuhn D, Weskamp N, Schmitt S, Hüllermeier E, Klebe G. From the similarity analysis of protein cavities to the functional classification of protein families using cavbase. J Mol Biol. 2006;359:1023–1044.
