Proteins. 2006 Nov 15;65(3):607-22.
doi: 10.1002/prot.21104.

Physicochemical descriptors to discriminate protein-protein interactions in permanent and transient complexes selected by means of machine learning algorithms

Peter Block et al. Proteins.

Abstract

Analyzing protein-protein interactions at the atomic level is critical for understanding the principles that govern protein-protein recognition. For this purpose, descriptors that explain the nature of different protein-protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification as a framework that handles the preparation, processing, and analysis of protein-protein complexes for classification with machine learning algorithms. We applied four machine learning algorithms (Support Vector Machines, C4.5 Decision Trees, K Nearest Neighbors, and Naïve Bayes) in combination with three feature selection methods (Filter (Relief F), Wrapper, and Genetic Algorithms) to extract discriminating features from the protein-protein complexes. To compare protein-protein complexes with each other, we represented the physicochemical characteristics of their interfaces in four different ways: two different atomic contact vectors, DrugScore pair potential vectors, and SFCscore descriptor vectors. We classified two datasets: (A) 172 protein-protein complexes comprising 96 monomers that form contacts enforced by the crystallographic packing environment (crystal contacts) and 76 biologically functional homodimer complexes; (B) 345 protein-protein complexes comprising 147 permanent complexes and 198 transient complexes. We were able to classify up to 94.8% of the packing enforced/functional complexes and up to 93.6% of the permanent/transient complexes correctly. Furthermore, we were able to extract relevant features from the different protein-protein complexes, and we introduce an approach for scoring the importance of the extracted features.
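
The general idea of classifying interfaces from contact vectors can be illustrated with a minimal sketch. This is not the framework described in the abstract: the atom-type pairs in `PAIR_TYPES`, the toy vectors in `TOY_DATA`, and all function names are hypothetical and invented purely for illustration, and the abstract does not specify how the atomic contact vectors are actually defined. The sketch assumes an interface is summarized as counts of atom-type pairs in contact, then classified with a plain K Nearest Neighbors vote and scored by leave-one-out accuracy.

```python
import math
from collections import Counter

# Hypothetical atom-type pairs; the real descriptor definitions are not
# given in the abstract. Each pair is stored in sorted order.
PAIR_TYPES = [("C", "C"), ("C", "N"), ("C", "O"), ("N", "O")]


def contact_vector(contacts, pair_types=PAIR_TYPES):
    """Count how often each atom-type pair occurs among interface contacts."""
    counts = Counter(tuple(sorted(pair)) for pair in contacts)
    return [counts[p] for p in pair_types]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Fraction of samples predicted correctly when each is held out in turn."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, vec, k) == label:
            hits += 1
    return hits / len(data)


# Made-up contact vectors, NOT data from the paper.
TOY_DATA = [
    ([30, 5, 6, 2], "permanent"),
    ([28, 6, 5, 3], "permanent"),
    ([32, 4, 7, 2], "permanent"),
    ([12, 9, 10, 8], "transient"),
    ([10, 8, 12, 9], "transient"),
    ([11, 10, 9, 7], "transient"),
]

if __name__ == "__main__":
    example = contact_vector([("C", "C"), ("O", "C"), ("N", "C"), ("C", "O")])
    print("contact vector:", example)
    print("leave-one-out accuracy:", leave_one_out_accuracy(TOY_DATA))
```

The toy classes are deliberately well separated, so the accuracy here says nothing about real complexes; the sketch only shows the shape of the workflow (descriptor vector, classifier, cross-validated accuracy).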
