BMC Bioinformatics. 2010 Jul 28;11:402.

doi: 10.1186/1471-2105-11-402.

Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information

Peng Chen¹, Jinyan Li


PMID: 20667087
PMCID: PMC2921408
DOI: 10.1186/1471-2105-11-402

Peng Chen et al. BMC Bioinformatics. 2010.


Affiliation

¹ Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, 639798 Singapore.


Abstract

Background: Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites; however, only a small proportion of protein complexes have been resolved experimentally because of the high cost. It is therefore important to improve the prediction of protein interaction sites from the primary sequence alone.

Results: We propose a new approach for constructing an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. We then develop a support vector machine (SVM) ensemble in which each SVM is trained on a different pair of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly equal sizes, are grouped in the order of the change in accessible surface area (ASA) before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors, which makes the identification of interface residues more accurate. An ensemble of ten SVMs improves MCC by around 8% and F1 by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profile outperforms models based on the sequence profile or the hydropathy scale alone. Because our method encodes the input vectors with a small number of features, our model is simpler and faster than existing methods, as well as more accurate.
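The abstract does not give the exact rule for combining the sequence profile with the hydropathy scale. As a minimal sketch, assuming the integrative profile weights each column of a PSSM-style sequence profile (one row per residue, one column per amino acid type, as in Figure 7) by the Kyte-Doolittle (KD) hydropathy value of that amino acid, the construction could look like this. The element-wise product, the column order `AA_ORDER`, and the function name `integrative_profile` are assumptions for illustration, not the authors' published formula.

```python
# Kyte-Doolittle (KD) hydropathy values, one per amino acid type.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Column order of the sequence profile; assumed to follow the PSI-BLAST PSSM layout.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"


def integrative_profile(seq_profile):
    """Weight each profile column by the KD value of its amino acid type.

    seq_profile: list of rows, one per residue, each with 20 values in AA_ORDER.
    Returns rows of the same shape. The element-wise product is a hypothetical
    combination rule used only for illustration.
    """
    kd_row = [KD[aa] for aa in AA_ORDER]
    combined = []
    for row in seq_profile:
        if len(row) != len(kd_row):
            raise ValueError("each profile row must have 20 columns")
        combined.append([p * h for p, h in zip(row, kd_row)])
    return combined


# Example: a two-residue profile (values are made up for illustration).
profile = [[0.05] * 20, [0.0] * 9 + [1.0] + [0.0] * 10]
result = integrative_profile(profile)
print(result[1][9])  # isoleucine (I) column of the second residue: 4.5
```

The output keeps the 20-column shape of the sequence profile, so it can feed the same downstream encoding (for example, a sliding window over neighboring residues) without adding features.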

Conclusions: The integrative profile, which combines hydrophobic and evolutionary information, contributes most to protein-protein interaction prediction. The results show that the evolutionary context of residues with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance.
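The figures report the ten-SVM ensemble at different thresholds (for example, the combined classifier "with threshold 5" in Figure 4) and score it with sensitivity, precision, MCC and F1. A minimal sketch, assuming the threshold is the minimum number of sub-classifiers that must vote "interface" for a residue to be labelled as an interface residue (an interpretation; the abstract does not define it). The helper names `combine_votes` and `evaluate` are hypothetical; the metric formulas are the standard definitions.

```python
import math


def combine_votes(votes, threshold):
    """Label a residue as interface (1) when at least `threshold` classifiers vote 1.

    votes: one list of 0/1 predictions per sub-classifier, all of the same length.
    """
    return [1 if sum(column) >= threshold else 0 for column in zip(*votes)]


def evaluate(pred, truth):
    """Sensitivity, precision, F1 and Matthews correlation coefficient (MCC)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "precision": prec, "f1": f1, "mcc": mcc}


# Three hypothetical sub-classifiers voting on four residues.
votes = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
truth = [1, 0, 1, 0]
pred = combine_votes(votes, threshold=2)
print(pred)  # [1, 0, 0, 0]
print(round(evaluate(pred, truth)["mcc"], 3))  # 0.577
```

Raising the threshold makes the ensemble predict fewer interface residues, so it typically trades sensitivity for precision; sweeping it traces curves of the kind shown in Figures 1 and 2.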

Availability: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.


Figures

**Figure 1**
**Performance of our model without SOM**. The figure shows sensitivity-precision and sensitivity-MCC curves after combining the ten SVMs. Numbers in the legend denote the combined SVMs at different thresholds; curves of the same color correspond to the model with the same threshold.

**Figure 2**
**Performance of our model using a 5 × 5 SOM**. The figure shows sensitivity-precision and sensitivity-MCC curves after combining the ten SVMs. Numbers in the legend denote the combined SVMs at different thresholds; curves of the same color correspond to the model with the same threshold.

**Figure 3**
**Comparison of the three profiles on the complex of Bacillus pasteurii urease with acetohydroxamate anion (PDB ID: 4UBP, chain A)**. (a) Prediction results for the hydropathy scale; (b) results for the sequence profile; (c) results for the integrative profile. Correctly predicted interface residues are shown in red, residues falsely predicted as non-interface in green, residues falsely predicted as interface in blue, and all other residues in white.

**Figure 4**
**Performance improvement by the classifier ensemble on the complex of Bacillus pasteurii urease with acetohydroxamate anion (PDB ID: 4UBP, chain A)**. (a) to (j) Prediction results of the ten sub-classifiers; (k) the combined classifier with threshold 5. Correctly predicted interface residues are shown in red, residues falsely predicted as non-interface in green, residues falsely predicted as interface in blue, and all other residues in white.

**Figure 5**
**Comparison with a method from the literature [31] and a random predictor**. The red line represents our model and the green line a random predictor, while the solid blue line and the broken blue line represent Sikic's method based on sequence alone and on both sequence and 3D structure, respectively.

**Figure 6**
**Visualization of the overall orientation and prediction results for the CCD-IBD complex (PDB ID: 2b4j)**. (a) The overall orientation of the CCD-IBD complex; (b) protein-protein interaction predictions for the CCD-IBD complex. The orientation of the complex is illustrated by a smooth spline between consecutive alpha-carbon positions. The left graph shows the natural orientation, while the right graph shows the protein-protein interaction predictions for the complex. In the right graph, blue spheres mark true positive (TP) residues, bluetint spheres mark false positive (FP) residues, and gold spheres mark false negative (FN) residues. All other residues (not shown as colored spheres) are true negatives (TN). The orientation of the complex in the right graph is slightly changed to show the predicted interface residues clearly. Each sphere represents the alpha-carbon atom of a residue. We used RasTop (http://www.geneinfinity.org/rastop/) to display the structure of this complex.

**Figure 7**
**Flowchart of generating residue profiles**. Each row of the sequence profile corresponds to a residue in the protein, while each column of the sequence profile and of the Kyte-Doolittle (KD) hydropathy scale corresponds to an amino acid type.

**Figure 8**
**SVM ensemble for identifying protein-protein interface residues**.
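The abstract states that positive (interface) and negative (non-interface) residues are split into subsets of roughly equal size, grouped in the order of ASA change upon complexation, and that each SVM in the ensemble trains on a different positive/negative pair. A minimal sketch of that partitioning step, assuming residues are sorted by ΔASA and cut into k contiguous chunks, with the i-th positive chunk paired with the i-th negative chunk. The pairing rule, the `(residue_id, delta_asa)` input format, and the names `split_sorted` and `asa_ordered_pairs` are hypothetical; training the individual SVMs is left out.

```python
from operator import itemgetter


def split_sorted(items, key, k):
    """Sort items by `key` and cut them into k contiguous chunks of near-equal size."""
    ordered = sorted(items, key=key)
    base, extra = divmod(len(ordered), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def asa_ordered_pairs(positives, negatives, k=10):
    """Pair the i-th positive chunk with the i-th negative chunk (hypothetical rule).

    positives, negatives: lists of (residue_id, delta_asa) tuples, where delta_asa
    is the change in accessible surface area upon complexation.
    Returns k (positive_subset, negative_subset) pairs, one per sub-classifier.
    """
    by_delta_asa = itemgetter(1)
    return list(zip(split_sorted(positives, by_delta_asa, k),
                    split_sorted(negatives, by_delta_asa, k)))


# Hypothetical data: 23 interface and 45 non-interface residues.
pos = [(f"p{i}", float(i)) for i in reversed(range(23))]
neg = [(f"n{i}", 0.1 * i) for i in range(45)]
pairs = asa_ordered_pairs(pos, neg, k=10)
print([len(p) for p, _ in pairs])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

Each pair would then train one SVM; with k=10 this yields a ten-member ensemble like the one compared against three SVMs in the abstract, and keeping each pair roughly balanced avoids training any single SVM on the full, heavily skewed class ratio.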


Cited by

Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System.
Jiang J, Wang N, Chen P, Zheng C, Wang B. Int J Mol Sci. 2017 Jul 18;18(7):1543. doi: 10.3390/ijms18071543. PMID: 28718782.
Prediction of heme binding residues from protein sequences with integrative sequence profiles.
Xiong Y, Liu J, Zhang W, Zeng T. Proteome Sci. 2012 Jun 21;10 Suppl 1(Suppl 1):S20. doi: 10.1186/1477-5956-10-S1-S20. PMID: 22759579.
HomPPI: a class of sequence homology based protein-protein interface prediction methods.
Xue LC, Dobbs D, Honavar V. BMC Bioinformatics. 2011 Jun 17;12:244. doi: 10.1186/1471-2105-12-244. PMID: 21682895.
Unravelling the human taste receptor interactome: machine learning and molecular modelling insights into protein-protein interactions.
Zaverdas H, Stojceski F, Romero-Zaliz R, Androutsos L, Makrygiannis P, Pallante L, Martos V, Grasso G, Deriu MA, Theofilatos K, Mavroudi S. NPJ Sci Food. 2025 Jul 1;9(1):113. doi: 10.1038/s41538-025-00478-9. PMID: 40595706.
DrugECs: An Ensemble System with Feature Subspaces for Accurate Drug-Target Interaction Prediction.
Jiang J, Wang N, Chen P, Zhang J, Wang B. Biomed Res Int. 2017;2017:6340316. doi: 10.1155/2017/6340316. Epub 2017 Jul 4. PMID: 28744468.


References

1. Alberts BD, Lewis J, Raff M, Roberts K, Watson JD. Molecular Biology of the Cell. 2nd ed. New York: Garland; 1989.
2. Bollenbach TJ, Nowak T. Kinetic Linked-Function Analysis of the Multiligand Interactions on Mg2+-Activated Yeast Pyruvate Kinase. Biochemistry. 2001;40(43):13097–13106. doi: 10.1021/bi010126o.
3. Chelliah V, Chen L, Blundell TL, Lovell SC. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol. 2004;342:1487–1504. doi: 10.1016/j.jmb.2004.08.022.
4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
5. UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 2008;36:D190–D195. doi: 10.1093/nar/gkn141.
