Physicochemical descriptors to discriminate protein-protein interactions in permanent and transient complexes selected by means of machine learning algorithms
- PMID: 16955490
- DOI: 10.1002/prot.21104
Abstract
Analyzing protein-protein interactions at the atomic level is critical for understanding the principles that govern protein-protein recognition. For this purpose, descriptors that capture the nature of different protein-protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification as a framework that handles the preparation, processing, and analysis of protein-protein complexes for classification with machine learning algorithms. We applied four machine learning algorithms (Support Vector Machines, C4.5 Decision Trees, K Nearest Neighbors, and Naïve Bayes) in combination with three feature selection methods (Filter (Relief F), Wrapper, and Genetic Algorithms) to extract discriminating features from the protein-protein complexes. To compare protein-protein complexes with each other, we represented the physicochemical characteristics of their interfaces in four different ways: two types of atomic contact vectors, DrugScore pair potential vectors, and SFCscore descriptor vectors. We classified two datasets: (A) 172 protein-protein complexes, comprising 96 monomers whose contacts are enforced by the crystallographic packing environment (crystal contacts) and 76 biologically functional homodimer complexes; (B) 345 protein-protein complexes, comprising 147 permanent and 198 transient complexes. We correctly classified up to 94.8% of the packing-enforced/functional complexes and up to 93.6% of the permanent/transient complexes. Furthermore, we extracted relevant features from the different protein-protein complexes and introduce an approach for scoring the importance of the extracted features.
(c) 2006 Wiley-Liss, Inc.
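The abstract names Relief F as the filter method for selecting discriminating interface features. As a rough illustration of how such a filter scores features, here is a minimal two-class ReliefF sketch in plain Python. It is an assumption-laden simplification, not the authors' implementation: the function name `relieff_weights`, the Manhattan distance, the min-max scaling, and the toy data and labels are all hypothetical. Each feature's weight rises when it differs between an instance and its nearest neighbours of the other class (misses) and falls when it differs from nearest neighbours of the same class (hits).

```python
from typing import List, Sequence


def relieff_weights(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3) -> List[float]:
    """Hypothetical simplified two-class ReliefF.

    Returns one weight per feature; a higher weight means the feature
    separates the two classes better. Every instance is used as a sample.
    """
    m = len(X)
    n_feat = len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    lo = [min(row[f] for row in X) for f in range(n_feat)]
    hi = [max(row[f] for row in X) for f in range(n_feat)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # avoid division by zero

    def diff(f: int, a: int, b: int) -> float:
        # Scaled absolute difference of feature f between instances a and b.
        return abs(X[a][f] - X[b][f]) / span[f]

    def dist(a: int, b: int) -> float:
        # Manhattan distance over all scaled features (an assumption).
        return sum(diff(f, a, b) for f in range(n_feat))

    weights = [0.0] * n_feat
    for i in range(m):
        # All other instances, nearest first.
        others = sorted((j for j in range(m) if j != i), key=lambda j: dist(i, j))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for f in range(n_feat):
            if hits:
                # Penalize features that vary within the same class.
                weights[f] -= sum(diff(f, i, h) for h in hits) / (m * len(hits))
            if misses:
                # Reward features that differ between the classes.
                weights[f] += sum(diff(f, i, miss) for miss in misses) / (m * len(misses))
    return weights


# Toy descriptor vectors (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.1], [0.2, 0.9], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]  # e.g. 0 = transient, 1 = permanent (toy labels)
w = relieff_weights(X, y, k=2)
print(w)
```

On this toy input the separating feature receives a positive weight and the noisy one a negative weight, so a filter would keep feature 0 and drop feature 1. In the setting described above, the rows would instead be interface descriptor vectors (for example atomic contact or SFCscore descriptors), and the top-weighted features would be passed on to a classifier such as an SVM or k-NN.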
