BMC Bioinformatics. 2010 Oct 29;11:537.

doi: 10.1186/1471-2105-11-537.

Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

Alvaro J González¹, Li Liao

PMID: 21034480
PMCID: PMC2989984
DOI: 10.1186/1471-2105-11-537

Alvaro J González et al. BMC Bioinformatics. 2010.

Affiliation

¹ Department of Computer and Information Sciences, University of Delaware, 421 Smith Hall, Newark, DE 19716, USA.

Abstract

Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles.

Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMMs), in which interacting residues in domains are explicitly modeled according to the three-dimensional structural information available in the Protein Data Bank (PDB). Domain features are first extracted as Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors and are classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross-validation experiments with a set of interacting protein pairs taken from the 3DID database. Prediction accuracy improves significantly over InterPreTS (Interaction Prediction through Tertiary Structure), an existing PPI prediction method that also uses sequences and complexes of known 3D structure.
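The SVD and concatenation steps described above can be illustrated with a minimal sketch. It assumes a plain truncated SVD (projection onto the top-k right singular vectors, with no centering), which is only one common reading of SVD-based feature reduction; the selection rule actually used in the paper may differ. The function names `svd_reduce` and `pair_features`, the matrix sizes, and the random stand-in data are all hypothetical, and the original implementation is in Matlab, not Python.

```python
import numpy as np

def svd_reduce(fisher_vectors, k):
    """Project row vectors onto their top-k right singular directions.

    Assumption: plain truncated SVD without centering. This is an
    illustrative choice, not necessarily the paper's selection rule.
    """
    X = np.asarray(fisher_vectors, dtype=float)
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]                 # shape (k, n_features)
    reduced = X @ basis.T          # shape (n_sequences, k)
    return reduced, basis

def pair_features(reduced_a, reduced_b):
    """Concatenate the reduced vectors of the two domains in each pair."""
    return np.hstack([reduced_a, reduced_b])

# Hypothetical stand-in data: 6 sequence pairs, with Fisher vectors of
# different lengths for the two domain families.
rng = np.random.default_rng(0)
fisher_a = rng.normal(size=(6, 40))
fisher_b = rng.normal(size=(6, 55))

red_a, basis_a = svd_reduce(fisher_a, k=3)
red_b, basis_b = svd_reduce(fisher_b, k=3)
pairs = pair_features(red_a, red_b)   # one row per domain pair
print(pairs.shape)
```

Each row of `pairs` would then serve as one training or test example for a classifier such as an SVM. A new query sequence can be placed in the same space by multiplying its Fisher vector by `basis_a.T` or `basis_b.T`.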

Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores and singular value decomposition, together with supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. The method is implemented in Matlab and supported on Linux and MS Windows.

Figures

**Figure 1**
**Flow chart of the method**. Panel A shows a pair of interacting domain families, PfamA and PfamB, for which 3D structures of cross-family protein pairs are known and confirm the interaction. The consensus sequence of each alignment and the set of interacting positions are used to produce random sequences from each family. Panel B shows the architecture of the two ipHMMs, with interacting match states (interface residues) marked *M_i* and non-interacting match states marked *M_ni*. Each protein sequence (positive or negative) is aligned to its corresponding ipHMM, and the sufficient statistics of this alignment are used to characterize the sequence by means of Fisher vectors (panel C). In panel D, feature selection is performed on the entire dataset using *SVD^PO*, which places all dimensionality-reduced vectors in the same vector space, ready for training a support vector machine (SVM). The blue box shows a query sequence pair, in which each protein aligns to one of the domain families. Random negative examples are generated again, this time for testing, and panels B, C and D work the same way as in training. In panel E, the SVM classifies the test examples. The distances of all test examples to the hyperplane form a histogram (panel F), in which the query pair, if it is an actual interacting pair, is expected to have a large Z-score.
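The Z-score of panel F can be sketched as follows. This assumes the query's signed distance to the SVM hyperplane is compared against the distances of the random negative examples only, and that the sample standard deviation is used; neither detail is stated in the caption. The function name `z_score` and the distance values are hypothetical.

```python
from statistics import mean, stdev

def z_score(query_distance, background_distances):
    """Standardize the query's hyperplane distance against a background.

    Assumption: the background is the set of random negative examples,
    and spread is measured with the sample standard deviation.
    """
    mu = mean(background_distances)
    sigma = stdev(background_distances)
    if sigma == 0:
        raise ValueError("background distances have zero spread")
    return (query_distance - mu) / sigma

# Hypothetical signed distances to the hyperplane.
negatives = [-1.2, -0.8, -1.0, -0.9, -1.1]
query = 0.5

z = z_score(query, negatives)
print(f"Z-score of query pair: {z:.2f}")
```

A query that lies far on the positive side of the hyperplane relative to the negatives yields a large positive Z-score, which is the behavior the caption describes for a truly interacting pair.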

**Figure 2**
**Architecture of the interaction profile hidden Markov model**. The match states of the classical pHMM are split into non-interacting (*M_ni*) and interacting (*M_i*) match states.

**Figure 3**
**First two SVD components of unconstrained Fisher vectors**. The domain-domain interaction family is AAA-Vps4_C, and the positive example being tested has pdbid 1xwi.

**Figure 4**
**First two SVD components of constrained Fisher vectors**. The domain-domain interaction family is AAA-Vps4_C, and the positive example being tested has pdbid 1xwi.

**Figure 5**
**Complexes that form the FGF-ig family**. Only the molecular structure of the domain interfaces is shown. The first three rows are the training complexes used to learn a model for the entire family. Green balls correspond to the α-carbons of the interacting amino acids in the FGF sequences. Red balls, likewise, correspond to α-carbons of interacting amino acids in the receptors. Yellow and pink ribbons show the secondary structure elements, β-sheets and α-helices respectively. The fourth row shows the secondary structure of one FHF sequence.

**Figure 6**
**Multiple sequence alignments of proteins in the FGF family and in the receptor (ig) family**. Part of the alignment for sequences in the FGF family is shown at the top (note that the whole alignment does not fit in the figure). Interacting positions are marked with green boxes. The homologous FHF1b sequence is also aligned. The bottom alignment shows the receptors, with red boxes marking the interacting residues.

Cited by

Prediction of contact matrix for protein-protein interaction.
González AJ, Liao L, Wu CH. Bioinformatics. 2013 Apr 15;29(8):1018-25. doi: 10.1093/bioinformatics/btt076. Epub 2013 Feb 15. PMID: 23418186. Free PMC article.
A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features.
Arif R, Kanwal S, Ahmed S, Kabir M. Interdiscip Sci. 2024 Jun;16(2):503-518. doi: 10.1007/s12539-024-00628-9. Epub 2024 May 11. PMID: 38733473.
Inference of protein-protein interaction networks from multiple heterogeneous data.
Huang L, Liao L, Wu CH. EURASIP J Bioinform Syst Biol. 2016 Feb 19;2016(1):8. doi: 10.1186/s13637-016-0040-2. eCollection 2016 Dec. PMID: 26941784. Free PMC article.
Completing sparse and disconnected protein-protein network by deep learning.
Huang L, Liao L, Wu CH. BMC Bioinformatics. 2018 Mar 22;19(1):103. doi: 10.1186/s12859-018-2112-7. PMID: 29566671. Free PMC article.
Enhancing interacting residue prediction with integrated contact matrix prediction in protein-protein interaction.
Du T, Liao L, Wu CH. EURASIP J Bioinform Syst Biol. 2016 Oct 22;2016(1):17. doi: 10.1186/s13637-016-0051-z. eCollection 2016 Dec. PMID: 27818677. Free PMC article.
