Sci Rep. 2021 Jan 19;11(1):1761.
doi: 10.1038/s41598-020-80900-2.

Classification and prediction of protein-protein interaction interface using machine learning algorithm

Subhrangshu Das et al. Sci Rep. 2021.

Abstract

Structural insight into the protein-protein interaction (PPI) interface can provide knowledge about the kinetics, thermodynamics and molecular functions of a complex, elucidate its role in disease, and help establish it as a potential therapeutic target. However, because experimental determination of protein-protein complex structures lags behind, three-dimensional (3D) knowledge of PPI interfaces is often gained through computational approaches such as molecular docking and post-docking analyses. Despite the development of numerous docking tools and techniques, identifying native-like interfaces from docking score functions has had limited success. We therefore carried out an in-depth investigation of the structural features of the interface that might successfully distinguish native complexes from non-native ones. We identify interface properties that differ significantly (statistically) between native and non-native interfaces of homodimeric and heterodimeric protein-protein complexes. Using these properties, we implemented a support vector machine (SVM) based classification scheme to differentiate native from non-native-like complexes generated as docking decoys. Benchmarking and comparative analyses against 10 other scoring functions suggest that our SVM classifiers perform very well. Further, protein interactions that have been demonstrated experimentally but not resolved structurally were subjected to this approach: 3D models of the complexes were generated and the most likely interfaces were predicted. A web server called Protein Complex Prediction by Interface Properties (PCPIP) has been developed to predict whether the interface of a given protein-protein dimer complex resembles known protein interfaces. The server is freely available at http://www.hpppi.iicb.res.in/pcpip/ .
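The abstract does not specify the SVM kernel, the exact feature set or how features were scaled, so the following is only a minimal sketch of a linear soft-margin SVM separating native (+1) from non-native (-1) complexes. It trains with the Pegasos stochastic sub-gradient method in plain Python; the function names, the regularization parameter `lam` and the toy two-feature vectors are hypothetical and are not taken from PCPIP.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 (native-like) or -1 (non-native). A constant 1.0 feature
    is appended to every sample, so the last weight acts as a (regularized) bias.
    """
    if any(label not in (-1, 1) for label in y):
        raise ValueError("labels must be +1 or -1")
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            x, label = data[rng.randrange(len(data))]
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink the current weights toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only when the sample violates the margin.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed distance-like score; positive means native-like."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if decision(w, x) >= 0 else -1
```

In a real setting each vector would hold standardized interface properties (for example BSA or hydrogen-bond counts); the authors may well have used a library SVM with a nonlinear kernel, and this sketch makes no claim about their configuration.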

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Comparison of protein–protein interaction interface properties. (A) Overlap among the interface properties that showed statistically significant (p ≤ 0.01) differences between native and non-native-like complexes, categorized by either the FNAT or the iRMSD criterion. HETERO_FNAT and HETERO_iRMSD give the numbers of interface properties that differ significantly between native and non-native-like heterodimer complexes, while HOMO_FNAT and HOMO_iRMSD give the corresponding numbers for homodimer complexes. FNAT, fraction of conserved native contacts; iRMSD, interface root mean square deviation. (B) Distribution of the common interface properties that showed statistically significant (p ≤ 0.01) differences between all native and non-native-like complexes. ASA, accessible surface area; BSA, buried surface area; H-bond, hydrogen bond. (C,D) Buried surface area (BSA) of the two amino acids whose BSA differed significantly at native interfaces compared with non-native ones, identified using the FNAT (C) and iRMSD (D) definitions, respectively. (E,F) Hydrogen-bond-forming amino acid pairs found significantly more often at native interfaces than at non-native ones, identified using the FNAT (E) and iRMSD (F) definitions, respectively. (G,H) Average binding energy (ΔG) of native and non-native interfaces identified using the FNAT (G) and iRMSD (H) definitions, respectively.
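FNAT is the fraction of residue-residue contacts in the native complex that are reproduced in a docked model. The caption does not state the contact cutoff, so this sketch assumes the common CAPRI-style convention of a 5 Å heavy-atom distance; the data layout (a dict mapping residue IDs to lists of atom coordinates) and the function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Coords = List[Tuple[float, float, float]]  # heavy-atom coordinates of one residue
Chain = Dict[str, Coords]                  # residue ID -> its heavy atoms
Contact = Tuple[str, str]                  # (residue on chain A, residue on chain B)


def interface_contacts(chain_a: Chain, chain_b: Chain,
                       cutoff: float = 5.0) -> Set[Contact]:
    """Residue pairs (one per chain) with any atom pair within `cutoff` Å (assumed 5 Å)."""
    contacts = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts


def fnat(native: Set[Contact], model: Set[Contact]) -> float:
    """Fraction of native contacts that are also present in the model."""
    if not native:
        raise ValueError("native complex has no interface contacts")
    return len(native & model) / len(native)
```

Brute-force distance checks keep the sketch dependency-free; for full-size proteins a spatial index would be the usual choice.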
Figure 2
Comparison of prediction performance. The SVM-based prediction models (PCPIP_FNAT and PCPIP_iRMSD) were compared against 10 different scoring functions in distinguishing native from non-native-like complexes in the Apo-Holo dataset. Receiver operating characteristic (ROC) curves were created by calculating the true positive rate (TPR; y axes) and false positive rate (FPR; x axes). PCPIP stands for Protein Complex Prediction by Interface Properties. Area under the curve (AUC) values are also provided for each method. Benchmarking was performed on the FNAT-based (A) and iRMSD-based (B) sub-datasets of the Apo-Holo validation set, in which native-like complexes were defined by FNAT > 0.8 and iRMSD < 5 Å, respectively, and non-native-like complexes by FNAT ≤ 0.8 and iRMSD ≥ 15 Å, respectively.
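The class definitions and ROC analysis in this caption can be expressed compactly. In the sketch below, `label_by_fnat` and `label_by_irmsd` apply the stated cutoffs (the caption does not assign decoys with 5 Å ≤ iRMSD < 15 Å to either class, so they return `None` here), `roc_points` computes (FPR, TPR) pairs by sweeping a threshold over classifier scores, and `roc_auc` uses the rank (Mann-Whitney) formulation of the AUC. All names are hypothetical; this is not the authors' benchmarking code.

```python
from typing import List, Optional, Sequence, Tuple


def label_by_fnat(f: float) -> int:
    """1 = native-like (FNAT > 0.8), 0 = non-native-like (FNAT <= 0.8)."""
    return 1 if f > 0.8 else 0


def label_by_irmsd(r: float) -> Optional[int]:
    """1 = native-like (< 5 Å), 0 = non-native-like (>= 15 Å), None = unassigned."""
    if r < 5.0:
        return 1
    if r >= 15.0:
        return 0
    return None


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) at every distinct score threshold, strictest first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of complexes, which is acceptable for a sketch; sorting-based rank sums scale better on large decoy sets.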
Figure 3
Verification of prediction accuracy. Percentage of non-native heterocomplexes from the Negatome dataset that were correctly predicted as non-native using the FNAT and iRMSD definitions. Accuracies are shown as bars for each probability threshold cutoff, marked by different colors.
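Because every complex taken from the Negatome set is treated as non-native, accuracy at a given cutoff reduces to the share of complexes whose predicted probability of being native-like falls below that cutoff. The sketch below assumes that a complex is called native-like when its probability is greater than or equal to the threshold; the function name and any threshold values passed to it are hypothetical, since the caption does not list them.

```python
from typing import Dict, Iterable, Sequence


def negative_set_accuracy(probabilities: Sequence[float],
                          thresholds: Iterable[float]) -> Dict[float, float]:
    """Percentage of known non-native complexes correctly rejected at each cutoff.

    Assumption: a complex is predicted native-like when its probability is
    >= the threshold, so a correct (non-native) call is probability < threshold.
    """
    if not probabilities:
        raise ValueError("no predictions supplied")
    return {t: 100.0 * sum(p < t for p in probabilities) / len(probabilities)
            for t in thresholds}
```

Raising the cutoff can only keep or increase this percentage, which is why such a plot is usually read alongside the true-positive behavior on native complexes.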
Figure 4
Prediction of probable interaction surfaces. (A,B) Frequency of probability threshold scores among the FNAT-based (A) and iRMSD-based (B) top-ranked solutions, compared with those derived from PatchDock top-ranked solutions. (C) Box plot of the binding energy (ΔG) of the protein–protein interaction interface for the 12 docked complexes predicted with the highest reliability (probability threshold ≥ 0.95) by both the FNAT and iRMSD models, together with the ΔG values obtained from known 3D structures of heterodimer complexes. ΔG values of three representative complexes, GAPDH-PGK, GAPDH-ENO1 and GAPDH-TIM, are also plotted. (D–F) 3D cartoon representations of these complexes, in which GAPDH is shown in cyan and PGK1, ENO1 and TIM are shown in purple, orange and blue, respectively.
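Panel C rests on a consensus step: keeping only docked complexes that both the FNAT and iRMSD models score at or above 0.95. A minimal sketch, assuming each model's output is a hypothetical mapping from a complex identifier to its predicted probability of being native-like:

```python
from typing import Dict, Set


def consensus_hits(fnat_probs: Dict[str, float], irmsd_probs: Dict[str, float],
                   cutoff: float = 0.95) -> Set[str]:
    """Complex IDs that both models predict as native-like with probability >= cutoff."""
    shared = fnat_probs.keys() & irmsd_probs.keys()  # complexes scored by both models
    return {cid for cid in shared
            if fnat_probs[cid] >= cutoff and irmsd_probs[cid] >= cutoff}
```

Requiring agreement between the two independently labeled models trades recall for reliability, which matches the caption's description of these complexes as the highest-reliability predictions.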

