Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites
- PMID: 20529890
- DOI: 10.1093/bioinformatics/btq302
Abstract
Motivation: The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods to identify interaction sites from protein sequences alone are, therefore, required for unraveling the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used for training a Naïve Bayes classifier (NBC), and conditional probabilities of each sequence feature are estimated using a kernel density estimation method (KDE).
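The combination described above, a Naïve Bayes classifier whose per-feature class-conditional densities come from kernel density estimation, can be illustrated with a minimal sketch. This is not the PSIVER implementation: the Gaussian kernel, the fixed bandwidth of 0.5, the class name `KdeNaiveBayes`, and the two synthetic features (stand-ins for values such as a PSSM score and a predicted accessibility) are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a 1-D density estimate built from Gaussian kernels on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


class KdeNaiveBayes:
    """Naive Bayes with one independent KDE per (class, feature) pair."""

    def __init__(self, bandwidth=0.5):  # bandwidth is an assumed, fixed value
        self.bandwidth = bandwidth
        self.log_priors = {}
        self.densities = {}

    def fit(self, X, y):
        n_features = len(X[0])
        for label in sorted(set(y)):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.log_priors[label] = math.log(len(rows) / len(y))
            # The "naive" assumption: each feature is modeled separately.
            self.densities[label] = [
                gaussian_kde([r[j] for r in rows], self.bandwidth)
                for j in range(n_features)
            ]
        return self

    def log_scores(self, x):
        """Unnormalized log posterior: log prior plus summed log densities."""
        scores = {}
        for label, kdes in self.densities.items():
            score = self.log_priors[label]
            for value, kde in zip(x, kdes):
                score += math.log(max(kde(value), 1e-300))  # guard against log(0)
            scores[label] = score
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Synthetic, imbalanced toy data (NOT from the paper):
# label 0 = non-interface residue, label 1 = interface residue.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(170)]
X += [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(30)]
y = [0] * 170 + [1] * 30

model = KdeNaiveBayes(bandwidth=0.5).fit(X, y)
print(model.predict([3.0, 3.0]), model.predict([-0.5, -0.5]))
```

Working in log space avoids numerical underflow when many feature densities are multiplied, and the class prior lets the imbalance between interface and non-interface residues enter the decision explicitly.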
Results: The leave-one-out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions.
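For readers less familiar with the evaluation measures quoted above, the following sketch shows their standard definitions in terms of confusion-matrix counts. The function names are illustrative choices, and the abstract does not say whether the reported values were pooled over all residues or averaged per protein, so this is only the textbook formulation. As a check, the harmonic mean of the cross-validation precision and recall reproduces the reported F-measure.

```python
import math


def precision(tp, fp):
    """Fraction of predicted interface residues that are true interface residues."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true interface residues that were predicted as interface."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Cross-validation precision and recall as reported in the abstract.
reported_f = f_measure(0.306, 0.416)
print(f"{100 * reported_f:.1f}%")  # prints 35.3%
```

MCC is a useful companion to the F-measure on imbalanced data such as this, because it also accounts for true negatives and stays near zero for predictions that are no better than chance.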
Availability: Freely available on the web at http://tardis.nibio.go.jp/PSIVER/