BMC Bioinformatics. 2010 Jul 28;11:402.
doi: 10.1186/1471-2105-11-402.

Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information

Peng Chen et al. BMC Bioinformatics.

Abstract

Background: Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites; however, only a small proportion of protein complexes have been structurally resolved, owing to the high cost of experimental determination. It is therefore important to improve the prediction of protein interaction sites from primary sequence alone.

Results: We propose constructing an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which each SVM is trained on a different pair of positive (interface site) and negative (non-interface site) subsets. The subsets, which have roughly equal sizes, are grouped in the order of the change in accessible surface area (ASA) before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors, which makes the identification of interface residues more accurate. An ensemble of ten SVMs improves the Matthews correlation coefficient (MCC) by around 8% and the F1 score by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profile outperforms models based on the sequence profile or the hydropathy scale alone. Because our method encodes the input vectors with a small number of features, our model is simpler and faster than existing methods, and it is also more accurate.
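The ensemble construction described in the Results can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' software: it assumes that both positive and negative residues are sorted by ASA change and cut into k contiguous subsets of near-equal size, that the i-th positive subset is paired with the i-th negative subset, and that a residue is labelled as interface when at least `threshold` members vote for it. The helper names (`asa_ordered_subsets`, `train_ensemble`, `ensemble_predict`) and the pluggable `fit` trainer are illustrative assumptions, and the SOM step is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# A sample is (feature vector, ASA change upon complexation).
Sample = Tuple[List[float], float]
# A trained member maps a feature vector to 1 (interface) or 0 (non-interface).
Predictor = Callable[[List[float]], int]
# Hypothetical trainer interface: takes features X and labels y, returns a Predictor.
Trainer = Callable[[List[List[float]], List[int]], Predictor]


def asa_ordered_subsets(samples: Sequence[Sample], k: int) -> List[List[Sample]]:
    """Sort samples by ASA change and cut them into k contiguous, near-equal subsets."""
    ordered = sorted(samples, key=lambda s: s[1])
    n = len(ordered)
    # Boundaries at i * n // k make subset sizes differ by at most one.
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def train_ensemble(positives: Sequence[Sample], negatives: Sequence[Sample],
                   k: int, fit: Trainer) -> List[Predictor]:
    """Train one member per (positive subset, negative subset) pair (assumed pairing)."""
    members = []
    for pos, neg in zip(asa_ordered_subsets(positives, k),
                        asa_ordered_subsets(negatives, k)):
        X = [x for x, _ in pos] + [x for x, _ in neg]
        y = [1] * len(pos) + [0] * len(neg)
        members.append(fit(X, y))
    return members


def ensemble_predict(members: Sequence[Predictor], x: List[float], threshold: int) -> int:
    """Label x as interface (1) when at least `threshold` members vote for it."""
    votes = sum(member(x) for member in members)
    return 1 if votes >= threshold else 0
```

With scikit-learn installed, `fit` could, for example, build `model = SVC().fit(X, y)` and return `lambda x: int(model.predict([x])[0])`. Keeping the trainer pluggable leaves the partitioning and voting logic independent of any particular SVM library.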

Conclusions: The integrative profile, which combines hydrophobic and evolutionary information, contributes most to protein-protein interaction prediction. The results show that the evolutionary context of residues with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance.
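The abstract does not specify how the two information sources are combined, so the following Python sketch is only one hypothetical reading of "evolutionary context with respect to hydrophobicity". It assumes that each row of the sequence profile holds 20 per-position amino acid frequencies in the column order `ARNDCQEGHILKMFPSTWYV`, collapses each row to a single score by weighting it with the Kyte-Doolittle (KD) hydropathy scale, and then encodes each residue by the scores in a sliding window centred on it. The function names, the column order, and the default window size of 9 are assumptions.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle (KD) hydropathy scale.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumed column order of the 20-column sequence profile.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def integrative_scores(profile: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical combination: hydropathy-weighted sum of each profile row."""
    kd_row = [KD[aa] for aa in AMINO_ACIDS]
    return [sum(f * h for f, h in zip(row, kd_row)) for row in profile]


def window_features(scores: Sequence[float], window: int = 9) -> List[List[float]]:
    """One feature vector per residue: the scores in a window centred on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    # Zero-pad both termini so every residue gets a full-length window.
    padded = [0.0] * half + list(scores) + [0.0] * half
    return [padded[i:i + window] for i in range(len(scores))]
```

Collapsing each 20-column row to one number keeps each feature vector as short as the window, which matches the abstract's emphasis on encoding inputs with a small number of features.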

Availability: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.

Figures

Figure 1
Performance of our model without SOM. The figure shows the sensitivity-precision and sensitivity-MCC curves after combining the ten SVMs. Numbers in the legend denote the SVM ensemble with different thresholds. Curves of the same color correspond to the model with the same threshold.
Figure 2
Performance of our model using a 5 × 5 SOM. The figure shows the sensitivity-precision and sensitivity-MCC curves after combining the ten SVMs. Numbers in the legend denote the SVM ensemble with different thresholds. Curves of the same color correspond to the model with the same threshold.
Figure 3
Comparison of the three profiles on the complex of Bacillus pasteurii urease with acetohydroxamate anion (PDB ID: 4UBP, chain A). (a) Prediction results for the hydropathy scale; (b) results for the sequence profile; (c) results for the integrative profile. Correctly predicted interface residues are shown in red, falsely predicted non-interface residues in green, falsely predicted interface residues in blue, and all other residues in white.
Figure 4
Performance improvement from the classifier ensemble on the complex of Bacillus pasteurii urease with acetohydroxamate anion (PDB ID: 4UBP, chain A). (a) to (j) Prediction results for the ten sub-classifiers; (k) the combined classifier with threshold 5. Correctly predicted interface residues are shown in red, falsely predicted non-interface residues in green, falsely predicted interface residues in blue, and all other residues in white.
Figure 5
Comparison with a method from the literature [31] and a random predictor. The red line shows our model and the green line shows a random predictor, while the solid and broken blue lines show Sikic's method based on sequence alone and on both sequence and 3D structure, respectively.
Figure 6
Visualization of the overall orientation and prediction results for the CCD-IBD complex (PDB ID: 2B4J). (a) The overall orientation of the CCD-IBD complex; (b) protein-protein interaction predictions for the CCD-IBD complex. The orientation of the complex is illustrated by a smooth spline through consecutive alpha-carbon positions. The left graph shows the natural orientation, and the right graph shows the protein-protein interaction predictions for the complex. In the right graph, blue spheres denote true positive (TP) residues, bluetint spheres denote false positive (FP) residues, and gold spheres denote false negative (FN) residues; all other residues (not shown as colored spheres) are true negatives (TN). The complex in the right graph is rotated slightly to show the predicted interface residues more clearly. Each sphere represents the alpha-carbon atom of a residue. We used RasTop (http://www.geneinfinity.org/rastop/) to display the structure of this complex.
Figure 7
Flowchart for generating residue profiles. Each row of the sequence profile corresponds to a residue in the protein, while each column of the sequence profile or of the Kyte-Doolittle (KD) hydropathy scale corresponds to an amino acid type.
Figure 8
SVM ensemble for identifying protein-protein interface residues.
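Figures 1, 2 and 5 plot sensitivity, precision and MCC, and the Results report MCC and F1 improvements. The following Python sketch computes these metrics from per-residue labels using their standard confusion-matrix definitions, with 1 marking an interface residue and 0 a non-interface residue. The function names are illustrative, and returning 0.0 when a denominator is zero is a convention chosen here, not one taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN and FN, where 1 means interface and 0 non-interface."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def interface_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, precision, F1 and MCC; each is 0.0 when its denominator is zero."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1, "mcc": mcc}
```

For example, with true labels `[1, 1, 1, 0, 0, 0, 0, 0]` and predictions `[1, 1, 0, 1, 0, 0, 0, 0]` (TP = 2, FP = 1, TN = 4, FN = 1), sensitivity, precision and F1 are all 2/3 and MCC is (2·4 − 1·1)/√(3·3·5·5) = 7/15.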

References

    1. Alberts BD, Lewis J, Raff M, Roberts K, Watson JD. Molecular Biology of the Cell. 2nd ed. New York: Garland; 1989.
    2. Bollenbach TJ, Nowak T. Kinetic linked-function analysis of the multiligand interactions on Mg2+-activated yeast pyruvate kinase. Biochemistry. 2001;40(43):13097–13106. doi: 10.1021/bi010126o.
    3. Chelliah V, Chen L, Blundell TL, Lovell SC. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol. 2004;342:1487–1504. doi: 10.1016/j.jmb.2004.08.022.
    4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
    5. UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 2008;36:D190–D195. doi: 10.1093/nar/gkn141.
