Being a binding site: characterizing residue composition of binding sites on proteins
- PMID: 18305831
- PMCID: PMC2241929
- DOI: 10.6026/97320630002216
Abstract
The Protein Data Bank (PDB) today contains descriptions of more than 45,000 three-dimensional protein and nucleic-acid structures. Because the PDB began as a computer-readable depository of crystallographic data complementing printed articles, proper interpretation of the individual files still frequently requires detailed information found only in the accompanying publication. Fully automatic processing of the whole PDB is therefore a very hard task. We first cleaned and restructured the PDB data, then analyzed the residue composition of the binding sites in the whole PDB for frequency and for hidden association rules. The main results of the paper are: (i) the cleaning and repairing algorithm; (ii) redundancy elimination from the data; (iii) the application of association rule mining to the cleaned, non-redundant data set. We found numerous significant relations in the residue composition of ligand binding sites on protein surfaces, summarized in two figures. Association-rule mining, one of the classical data-mining methods for exploring implication rules, is capable of finding previously unknown residue-set preferences of bound ligands on protein surfaces. Since protein-ligand binding is a key step in enzymatic mechanisms and in drug discovery, the preferences uncovered in this study of more than 19,500 binding sites may help in identifying new binding protein-ligand pairs.
Keywords: association rules; binding site; functions; protein; structural data.
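To make the idea of association rules over binding-site residue composition concrete, the following is a minimal sketch and not the authors' implementation. It assumes each binding site is reduced to the set of residue types lining it, and it enumerates only single-residue rules A -> B by brute force. The toy data, the thresholds, and the names `SITES`, `support`, and `mine_rules` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each binding site is the set of residue types lining it.
SITES = [
    {"HIS", "CYS", "ASP"},
    {"HIS", "CYS", "GLU"},
    {"HIS", "CYS"},
    {"GLY", "SER", "LYS"},
    {"GLY", "SER"},
    {"HIS", "ASP"},
]


def support(itemset, sites):
    """Fraction of sites that contain every residue in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= site for site in sites) / len(sites)


def mine_rules(sites, min_support=0.3, min_confidence=0.7):
    """Enumerate single-residue rules A -> B that meet both thresholds.

    Returns tuples (lhs, rhs, support, confidence, lift).
    """
    residues = sorted(set().union(*sites))
    rules = []
    for a, b in combinations(residues, 2):
        pair_support = support({a, b}, sites)
        if pair_support < min_support:
            continue  # the pair co-occurs too rarely to be of interest
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: share of sites with lhs that also contain rhs
            confidence = pair_support / support({lhs}, sites)
            if confidence >= min_confidence:
                # lift > 1 means rhs is more common alongside lhs than overall
                lift = confidence / support({rhs}, sites)
                rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf, lift in mine_rules(SITES):
        print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On a data set the size of the PDB, a dedicated frequent-itemset algorithm such as Apriori would normally replace the brute-force pair enumeration used here, and rules with multi-residue left-hand sides would also be considered.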
Similar articles
- Cysteine and tryptophan anomalies found when scanning all the binding sites in the Protein Data Bank. Int J Bioinform Res Appl. 2010;6(6):594-608. doi: 10.1504/IJBRA.2010.03874. PMID: 21354965
- Domain-based small molecule binding site annotation. BMC Bioinformatics. 2006 Mar 17;7:152. doi: 10.1186/1471-2105-7-152. PMID: 16545112. Free PMC article.
- sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank. J Chem Inf Model. 2006 Mar-Apr;46(2):717-27. doi: 10.1021/ci050372x. PMID: 16563002
- Data Mining of Macromolecular Structures. Methods Mol Biol. 2016;1415:107-38. doi: 10.1007/978-1-4939-3572-7_6. PMID: 27115630
- The Art of Compiling Protein Binding Site Ensembles. Mol Inform. 2016 Dec;35(11-12):593-598. doi: 10.1002/minf.201600043. Epub 2016 May 30. PMID: 27870245. Review.
Cited by
- Discovering amino acid patterns on binding sites in protein complexes. Bioinformation. 2011 Mar 2;6(1):10-4. doi: 10.6026/97320630006010. PMID: 21464838. Free PMC article.
- PDB_Amyloid: an extended live amyloid structure list from the PDB. FEBS Open Bio. 2018 Nov 22;9(1):185-190. doi: 10.1002/2211-5463.12524. eCollection 2019 Jan. PMID: 30652085. Free PMC article.
- SCARF: a biomedical association rule finding webserver. J Integr Bioinform. 2022 Feb 4;19(1):20210035. doi: 10.1515/jib-2021-0035. PMID: 35119233. Free PMC article.
- Many InChIs and quite some feat. J Comput Aided Mol Des. 2015 Aug;29(8):681-94. doi: 10.1007/s10822-015-9854-3. Epub 2015 Jun 17. PMID: 26081259. No abstract available.