2010 Oct 29:11:537.
doi: 10.1186/1471-2105-11-537.

Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

Alvaro J González et al. BMC Bioinformatics.

Abstract

Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles.

Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs). Domains are represented as interaction profile hidden Markov models (ipHMMs), in which interacting residues are explicitly modeled according to the three-dimensional structural information available at the Protein Data Bank (PDB). Domain features are first extracted as Fisher scores derived from the ipHMMs and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors and are classified by an SVM trained on these vectors. The method is tested by leave-one-out cross-validation experiments on a set of interacting protein pairs taken from the 3DID database. Prediction accuracy improves significantly over InterPreTS (Interaction Prediction through Tertiary Structure), an existing PPI prediction method that also uses the sequences and complexes of known 3D structure.
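The SVD reduction and pair-concatenation steps described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' Matlab implementation: it uses a plain truncated SVD on random placeholder matrices standing in for Fisher score vectors, and the function name `svd_reduce`, the matrix sizes, and the choice of `k = 5` are all assumptions made for the example.

```python
import numpy as np

def svd_reduce(X, k):
    """Project each row of X onto the top-k right singular vectors of X.

    Plain truncated SVD; the paper's exact selection procedure may differ.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T          # shape: (n_features, k)
    return X @ basis          # shape: (n_samples, k)

# Placeholder data (assumption): 20 sequences per domain family, with
# 50- and 40-dimensional vectors standing in for Fisher scores.
rng = np.random.default_rng(0)
fisher_a = rng.normal(size=(20, 50))
fisher_b = rng.normal(size=(20, 40))

# Reduce each family separately, then concatenate row-wise so that
# row i describes the pair (sequence i of A, sequence i of B).
red_a = svd_reduce(fisher_a, k=5)
red_b = svd_reduce(fisher_b, k=5)
pairs = np.hstack([red_a, red_b])   # shape: (20, 10)
```

In a full pipeline, the rows of `pairs` (with positive and negative labels) would be the inputs to an SVM classifier; that training step is omitted here.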

Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles, via feature selection based on Fisher scores and singular value decomposition, and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. The software is implemented in Matlab and supported on Linux and MS Windows.


Figures

Figure 1
Flow chart of the method. Panel A shows a pair of interacting domain families, PfamA and PfamB, where for protein pairs across families, 3D structures are known that confirm interaction. The consensus sequence of each alignment as well as the set of interacting positions are used to produce random sequences from each family. Panel B shows the architecture of two ipHMMs, containing interacting states (interface residues) marked as Mi, and non-interacting states marked as Mni. Each protein sequence (positive or negative) is aligned to its corresponding ipHMM, and the sufficient statistics of this alignment are used to characterize the sequence by means of Fisher vectors (panel C). In panel D, feature selection is performed for the entire dataset using SVDPO; the panel shows how all the dimensionality-reduced vectors can be placed in the same vector space, which leaves them ready for training a support vector machine (SVM). The blue box shows a query sequence pair, where each of the proteins aligns to one of the domain families. Random negative examples are generated again, but now to be used in testing. Panels B, C and D work the same way as in training. In panel E, the SVM is used for classifying test examples. All distances to the hyperplane form a histogram (panel F), where the query sequence, if it is an actual interacting pair, is expected to have a large Z-score.
Figure 2
Architecture of the interaction profile hidden Markov model. The match states of the classical pHMM are split into non-interacting (Mni) and interacting (Mi) match states.
Figure 3
First two SVD components of unconstrained Fisher vectors. The domain-domain interaction family is AAA-Vps4_C, and the positive example being tested has pdbid 1xwi.
Figure 4
First two SVD components of constrained Fisher vectors. The domain-domain interaction family is AAA-Vps4_C, and the positive example being tested has pdbid 1xwi.
Figure 5
Complexes that form the FGF-ig family. Only the molecular structure of the domain interfaces is shown. The first three rows are the training complexes used to learn a model for the entire family. Green balls correspond to the α-carbons of the interacting amino acids in the FGF sequences. Red balls, likewise, correspond to α-carbons of interacting amino acids in the receptors. Yellow and pink ribbons show the secondary structure elements, β-sheets and α-helices respectively. The fourth row shows the secondary structure of one FHF sequence.
Figure 6
Multiple sequence alignments of proteins in the FGF family and in the receptor (ig) family. Part of the alignment for sequences in the FGF family is shown at the top (note that the whole alignment does not fit in the figure). Interacting positions are marked with green boxes. The homologous FHF1b sequence is also aligned. The bottom alignment shows the receptors, with red boxes marking the interacting residues.
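The Z-score mentioned in the caption of Figure 1 (panel F) can be illustrated with a short sketch. It assumes that the query pair's signed distance to the SVM hyperplane is standardized against the distances of the random negative examples, using the sample standard deviation; the source does not spell out these details, and the function name `query_z_score` and the numbers below are hypothetical.

```python
from statistics import mean, stdev

def query_z_score(query_distance, negative_distances):
    """Standardize the query's hyperplane distance against the negatives.

    Assumption: the background distribution is formed by the random
    negative examples only, and the sample standard deviation is used.
    """
    mu = mean(negative_distances)
    sigma = stdev(negative_distances)
    return (query_distance - mu) / sigma

# Made-up distances for illustration: negatives fall on the negative
# side of the hyperplane, the query on the positive side.
negatives = [-1.2, -0.8, -1.0, -0.9, -1.1]
z = query_z_score(1.5, negatives)
```

A query that truly interacts would be expected to lie far to the positive side of the negatives' histogram, which corresponds to a large value of `z`.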
