J Mol Biol. 2006 Jun 16;359(4):1023-44.
doi: 10.1016/j.jmb.2006.04.024. Epub 2006 Apr 25.

From the similarity analysis of protein cavities to the functional classification of protein families using cavbase

Daniel Kuhn et al. J Mol Biol. 2006.

Abstract

In this contribution, the classification of protein binding sites using the physicochemical properties exposed to their pockets is presented. We recently introduced Cavbase, a method for describing and comparing protein binding pockets on the basis of the geometrical and physicochemical properties of their active sites. Here, we present algorithmic and methodological enhancements in the Cavbase property description and in the cavity comparison step. We give examples of the Cavbase similarity analysis detecting pronounced similarities in the binding sites of proteins unrelated in sequence. A similarity search using SARS Mpro protease subpockets as queries retrieved ligands and ligand fragments accommodated in a physicochemical environment similar to that of the query. This allowed the characterization of the protease recognition pockets and the identification of molecular building blocks that can be incorporated into novel antiviral compounds. A cluster analysis procedure for the functional classification of binding pockets was implemented and calibrated using a diverse set of enzyme binding sites. Two relevant protein families, the alpha-carbonic anhydrases and the protein kinases, are used to demonstrate the scope of our cluster approach. We propose a relevant classification of both protein families, on the basis of the binding motifs in their active sites. The classification provides a new perspective on functional properties across a protein family and is able to highlight features important for potency and selectivity. Furthermore, this information can be used to identify possible cross-reactivities among proteins due to similarities in their binding sites.

Figures

Figure 1
The expanded definition of pseudocenters in Cavbase. A donor pseudocenter (blue sphere) was introduced at the side-chain of cysteine to account for the hydrogen bond donor properties of cysteine. Pi pseudocenters (orange spheres) were introduced at terminal side-chains of asparagine, aspartate, glutamine, glutamate, and arginine, reflecting their ability to form π–π interactions with neighboring functional groups, including those of ligands. Hydrogen bond acceptor and aliphatic pseudocenters are shown as red and white spheres, respectively.
Figure 2
Validation of property surface patches in Cavbase by comparison with Drugscore maps. A comparison of the Drugscore hotspots with the Cavbase binding site description is shown for the binding pocket of dihydro-orotate-dehydrogenase (PDB code 1d3h). In (I) the Cavbase surface patches, annotated with respect to the five physicochemical properties of the neighboring pseudocenters, are shown as dotted surfaces (color scheme used: H bond donor (blue), H bond acceptor (red), ambivalent donor/acceptor (green), hydrophobic aliphatic (white) or aromatic/pi (orange)). (II) to (IV) display three types of Drugscore hotspots, together with the Cavbase surface. The color coding of the hotspots corresponds to that of the related Cavbase properties (Drugscore atom type N.3 (red, II), O.2 (blue, III) and C.ar (orange, IV)). Drugscore hotspots that describe directional interactions match very well with the corresponding Cavbase surface patches. The contour levels are calibrated for each atom type in such a way that 0.6% of the grid points are assigned to the most favorable interaction areas.
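The contour calibration described in the Figure 2 caption can be sketched as a simple quantile cut. This is a minimal illustration, not the Drugscore implementation: the function name `contour_level`, the synthetic grid values, and the assumption that lower scores are more favorable are all hypothetical.

```python
import random

def contour_level(scores, fraction=0.006):
    """Return the score threshold such that `fraction` of the grid points
    lie at or below it (assumption: lower score = more favorable)."""
    ordered = sorted(scores)
    # Number of grid points to keep as hotspots; always keep at least one.
    count = max(1, round(fraction * len(ordered)))
    return ordered[count - 1]

# Synthetic stand-in for one atom type's interaction grid (hypothetical data).
random.seed(0)
grid_scores = [random.gauss(0.0, 1.0) for _ in range(10_000)]

level = contour_level(grid_scores)
hotspots = [s for s in grid_scores if s <= level]
```

Because the threshold is derived separately from each atom type's own score distribution, every map shows the same share of grid points regardless of the absolute score range.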
Figure 3
Consideration of edge-to-face interactions of aromatic moieties in the Cavbase binding site description and similarity searches. The binding pocket of a hydrolase (PDB code 1tum) is used to demonstrate the influence of different parameter settings for pseudocenters describing hydrophobic aromatic interactions, with respect to their exposure onto the protein surface (color coding as in Figure 2). Two phenylalanine residues (carbon atoms colored orange) can potentially perform edge-to-face interactions towards the cavity surface. Using the original angular parameter settings (left), no interaction towards the cavity surface would be considered for either ring (violet areas). Recalibrating the angle between the standard vector r and the mean orientation vector v to 100° (right) allows the recognition of edge-to-face interactions, and both pi pseudocenters can now expose their property onto the cavity surface (orange areas).
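The angular criterion in the Figure 3 caption can be illustrated with a short sketch. It assumes that a pseudocenter exposes its property when the angle between r and v does not exceed the cutoff; the caption gives only the recalibrated value of 100°, so the direction of the comparison, the function names, and the example vectors are assumptions made for illustration.

```python
import math

def angle_deg(r, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(r, v))
    norm = math.sqrt(sum(a * a for a in r)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def is_exposed(r, v, cutoff_deg=100.0):
    """Assumed rule: the property is exposed if the angle is within the cutoff."""
    return angle_deg(r, v) <= cutoff_deg

# Hypothetical vectors: a perpendicular arrangement (90 degrees) passes the
# 100 degree cutoff, while an antiparallel one (180 degrees) does not.
edge_on = is_exposed((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
facing_away = is_exposed((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0))
```

Under this assumed rule, any cutoff below 90° would reject the perpendicular arrangement, which is one way to read why a wider setting is needed to recognize edge-to-face contacts.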
Figure 4
The nucleotide cofactor binding sites in UDP-galactose-4-epimerase (PDB code 1xel, grey carbon atoms) and acyl-CoA-dehydrogenase (PDB code 1e6w, yellow carbon atoms) detected as similar by Cavbase. NADH is bound by virtually identical amino acids in both proteins. In (I), the matching pseudocenters (color coding as in Figure 2) in both binding pockets are shown, together with NADH. In (II), the matching amino acids are additionally displayed, suggesting pronounced structural conservation between the two binding sites. The comparison of UDP-galactose-4-epimerase with a glucose oxidase (PDB code 1gal, yellow carbon atoms) reveals a case where the nucleotide cofactor is bound by entirely different amino acids, although their physicochemical interactions are similar; Cavbase detects this similarity. The matching pseudocenters and cofactors (NADH (1xel) and FAD (1gal)) in the two binding sites are shown in (III). In (IV), the amino acids superimposed on the matched pseudocenters are displayed.
Figure 5
The superposition of the matched protein side-chains, pseudocenters and bound inhibitors from the comparison of SARS Mpro (grey carbon atoms) and rhinovirus 3C protease (yellow carbon atoms). The SARS Mpro inhibitor peptide 1 and the rhinoviral inhibitor AG7088 2 superimpose convincingly. On the left, the surface patches found by Cavbase for the SARS protease are shown; on the right, those of the rhinovirus 3C protease are displayed. The surface patches are color-coded according to the corresponding physicochemical properties (see Figure 2).
Figure 6
Representation of the SARS CoV Mpro S1 (I) and S2 (II) subpockets, with the physicochemical properties of the adjacent pseudocenters mapped onto the surface (color coding as in Figure 2). The considered pseudocenters were selected in such a way as to fully characterize each subpocket. The S1 subpocket exhibits a stronger polar character, whereas the S2 subpocket exhibits an aliphatic-aromatic character.
Figure 7
Analysis of the ligands and ligand fragments found in the 250 best ranked binding pockets that could be superimposed with the peptidic SARS inhibitor using the SARS CoV Mpro S1 and S2 subpockets as query cavities in a Cavbase similarity search. The ranking was performed using the R1 scoring function. The binding pockets are classified according to seven generic chemotypes, by which the bound ligand is characterized (Table 3).
Figure 8
An optimal Cavbase clustering solution of the enzyme test dataset. Optimal clustering is achieved based on 13 predefined output clusters. The rb clustering algorithm and scoring function R1 were used. The mutual similarity of the binding cavities, computed by the scoring function R1, is indicated by the intensity of the red color (dark red: pronounced similarity; white: no similarity). Cavbase separates entries from the different protein families into distinct clusters.
Figure 9
Clustering results for the α-CA isozymes, using four different parameter settings. In all cases, the number of output clusters was set to eight. In (I), (II), and (III), the clustering algorithm rb was used, together with the scoring schemes R1, R2, and R3, respectively. Mutual similarities are expressed by the intensity of the red color (see Figure 8). The different scoring schemes produce consistent results and suggest a reasonable clustering. In (IV), the clustering based on the agglo algorithm, in combination with scoring scheme R1, is shown. It tends to produce several singletons early and seems to merge many entries into one large cluster. By predefining a larger number of clusters, this large cluster would be decomposed into several smaller clusters.
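How a predefined cluster count shapes an agglomerative result, as discussed in the Figure 9 caption, can be sketched with a generic average-linkage procedure on a similarity matrix. This is not the agglo implementation used in the study, and the scores below are invented rather than R1 values; the function `agglomerate` and the matrix `sim` are hypothetical.

```python
def agglomerate(sim, k):
    """Average-linkage agglomerative clustering on a symmetric similarity
    matrix, stopping when k clusters remain (k >= 1 is assumed)."""
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        # Merge the most similar pair and drop the absorbed cluster.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Invented similarity scores for four cavities (not taken from the paper).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
two_clusters = agglomerate(sim, 2)
three_clusters = agglomerate(sim, 3)
```

Requesting three clusters instead of two leaves cavities 2 and 3 unmerged, a toy analogue of how a larger predefined cluster count splits a large cluster into smaller ones.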
Figure 10
Areas matched between the binding sites of two CA-V wild-type entries (left) (PDB codes 1dmx and 1dmy) and between a wild-type and a mutant isozyme (right) (PDB codes 1dmy and 1keq). For reasons of clarity, only the corresponding pseudocenters, the bound zinc ions (violet spheres) and the sulfonamide inhibitor (1dmy), together with the three histidine residues involved in zinc binding, are shown (color coding as in Figure 2). Additionally, on the left, the phenylalanine and tyrosine residues that were mutated to alanine and cysteine, respectively, are displayed (carbon atoms in magenta). The pseudocenters of the mutated amino acids cannot be matched, but Cavbase still detects pronounced similarities in the binding site.
Figure 11
Cavbase clustering for a kinase dataset of 30 cavities (see Figure 8). The R1 scoring function and the rb clustering algorithm were used to generate six distinct clusters. Cavbase differentiates between the 30 kinases at the subfamily level. The clusters (along the principal diagonal from bottom-left to top-right) comprise cavities from the mitogen-activated protein kinases (MAP) of the (a) p38α and (b) Erk2 subfamilies, (c) the cyclin-dependent protein kinases (CDKs) and src kinase, (d) the fibroblast growth factor receptor kinases and tyrosine kinases, (e) the serine/threonine kinase subfamily, and (f) the cAMP-dependent protein kinase subfamily.
Figure 12
Superposition of MAP kinase binding sites. On the left, a superposition of two Erk2 kinases (PDB codes 1erk and 1gol) is shown. Large portions of the binding sites are recognized as being similar. On the right, a superposition of an Erk2 kinase (PDB code 3erk, carbon atoms colored gray) and a MAP p38α kinase (PDB code 1bl7, carbon atoms colored yellow) is displayed. In both pictures, the matching pseudocenters and the hinge backbone protein atoms are displayed (color coding as in Figure 2). Based on the similar hinge binding region, Cavbase superimposes both inhibitors convincingly. In both cases, the hinge coordination via the hydrogen bond from the pyrimidine nitrogen atom of the inhibitor to Asp104 (Erk2) and His107 (p38α) is detected.
Figure 13
Superposition of phosphorylase kinase (PDB code 1phk) and cAMP-dependent kinase (PDB code 1atp). The matching pseudocenters and bound ATP molecules are displayed (color coding as in Figure 2). The two cavities show extensive similarities in the entire ATP binding pocket, comprising areas next to the hinge region and the adenosine binding site, as well as the DFG motif and parts of the activation loop.


