Bioinformation. 2007 Dec 30;2(5):216-21.
doi: 10.6026/97320630002216.

Being a binding site: characterizing residue composition of binding sites on proteins

Gábor Iván et al. Bioinformation.

Abstract

The Protein Data Bank (PDB) today contains descriptions of more than 45,000 three-dimensional protein and nucleic-acid structures. Because the PDB began as a computer-readable depository of crystallographic data complementing printed articles, the proper interpretation of an individual PDB file still frequently requires detailed information found only in the accompanying publication. As a result, fully automatic processing of the whole PDB is a very hard task. We first cleaned and restructured the PDB data, then analyzed the residue composition of the binding sites in the whole PDB for frequency and for hidden association rules. The main results of the paper are: (i) the cleaning and repairing algorithm; (ii) redundancy elimination from the data; and (iii) the application of association rule mining to the cleaned, non-redundant data set. We found numerous significant relations in the residue composition of ligand binding sites on protein surfaces, summarized in two figures. Association rule mining, one of the classical data-mining methods for exploring implication rules, is capable of finding previously unknown residue-set preferences of bound ligands on protein surfaces. Since protein-ligand binding is a key step in enzymatic mechanisms and in drug discovery, the preferences uncovered in this study of more than 19,500 binding sites may help in identifying new binding protein-ligand pairs.
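The three standard measures behind an association rule X → Y (support, confidence, and lift) can be illustrated with a minimal sketch. It assumes each binding site is represented as a set of residue names; the function names and the five toy sites below are hypothetical and are not taken from the paper's data or code.

```python
from typing import Iterable, List, Set


def support(sites: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of binding sites that contain every residue in `items`."""
    wanted = frozenset(items)
    return sum(1 for site in sites if wanted <= site) / len(sites)


def confidence(sites: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Estimate of P(rhs in site | lhs in site) for the rule lhs -> rhs."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    lhs_support = support(sites, lhs_set)
    if lhs_support == 0:
        raise ValueError("the left-hand side never occurs in the data")
    return support(sites, lhs_set | rhs_set) / lhs_support


def lift(sites: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Confidence divided by the support of rhs; 1.0 means independence."""
    rhs_support = support(sites, rhs)
    if rhs_support == 0:
        raise ValueError("the right-hand side never occurs in the data")
    return confidence(sites, lhs, rhs) / rhs_support


# Hypothetical toy data: each set lists the residues lining one binding site.
toy_sites = [
    {"HIS", "ASP", "GLY"},
    {"HIS", "ASP", "SER"},
    {"HIS", "GLY"},
    {"ASP", "SER"},
    {"GLY", "SER"},
]

rule_support = support(toy_sites, {"HIS", "ASP"})
rule_confidence = confidence(toy_sites, {"HIS"}, {"ASP"})
rule_lift = lift(toy_sites, {"HIS"}, {"ASP"})
```

A lift above 1 indicates that the residues on both sides of the rule co-occur more often than independence would predict, and a lift below 1 indicates that they co-occur less often.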

Keywords: association rules; binding site; functions; protein; structural data.

Figures

Figure 1
Association rules-Set 1: Figure 1 was created by deleting all X → GLY association rules for clarity and including only those rules whose support is at least 7.15% and whose confidence is at least 0.5 and which, in addition, satisfy at least one of the following conditions: (a) the confidence is at least 0.8, (b) the lift is at least 1.8, (c) the lift is at most 0.97, or (d) the support is at least 24%. The color and width of the arrows correspond to the lift, and the color of the residue sets corresponds to the support, as shown in the figure legend. Four areas are identifiable in the figure: the lower half shows the rules with large lift; the upper left corner shows the rules with high confidence (with one exception); the upper middle part shows the rules with lift at most 0.97; and the upper right corner shows the rules with high support. Note that these rules form almost disjoint classes.
Figure 2
Association rules-Set 2: Figure 2 was created by deleting all X → GLY association rules for clarity and including only those rules whose support is at least 7.15%, whose confidence is at least 0.55, and whose lift is at least 1.7.
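The selection criteria stated in the two captions can be written as simple predicates. The sketch below is illustrative only: the `Rule` record and the function names are hypothetical, support is stored as a fraction (0.0715 for 7.15%), and "X → GLY" is assumed to mean a rule whose right-hand side is exactly GLY.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    """Hypothetical record for one mined association rule lhs -> rhs."""
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    support: float      # fraction of binding sites, e.g. 0.0715 for 7.15%
    confidence: float
    lift: float


def is_gly_rule(rule: Rule) -> bool:
    # Assumption: an "X -> GLY" rule has GLY as its only consequent.
    return rule.rhs == frozenset({"GLY"})


def in_figure_1(rule: Rule) -> bool:
    """Caption of Figure 1: base thresholds plus at least one of (a)-(d)."""
    if is_gly_rule(rule):
        return False
    if rule.support < 0.0715 or rule.confidence < 0.5:
        return False
    return (
        rule.confidence >= 0.8      # (a) high confidence
        or rule.lift >= 1.8         # (b) large lift
        or rule.lift <= 0.97        # (c) lift below independence
        or rule.support >= 0.24     # (d) high support
    )


def in_figure_2(rule: Rule) -> bool:
    """Caption of Figure 2: all three thresholds must hold."""
    if is_gly_rule(rule):
        return False
    return rule.support >= 0.0715 and rule.confidence >= 0.55 and rule.lift >= 1.7


# Hypothetical rules, used only to exercise the predicates.
example_rules = [
    Rule(frozenset({"HIS"}), frozenset({"ASP"}), 0.10, 0.60, 1.90),
    Rule(frozenset({"HIS"}), frozenset({"GLY"}), 0.30, 0.90, 2.00),
    Rule(frozenset({"SER"}), frozenset({"THR"}), 0.10, 0.60, 1.20),
    Rule(frozenset({"LEU"}), frozenset({"VAL"}), 0.25, 0.52, 1.10),
]
figure_1_rules = [r for r in example_rules if in_figure_1(r)]
figure_2_rules = [r for r in example_rules if in_figure_2(r)]
```

Written this way, conditions (a) to (d) of Figure 1 are alternatives on top of the shared support and confidence floor, which is why the four visual areas described in that caption can overlap only slightly.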
