Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples
- PMID: 28113437
- DOI: 10.1109/TCBB.2016.2570211
Abstract
Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs in which a protein is targeted by at least one compound, and it is a crucial step in new drug design. A number of machine learning based methods have been developed to predict new CPIs. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use randomly selected unknown interactions (CPIs that have not been validated) as the negative examples to train classifiers for predicting new CPIs. This is not well justified, because some of the unknown pairs may in fact interact, and it unavoidably degrades CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learn from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers (random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN)) and three types of existing CPI prediction models.
More information can be found at http://admis.fudan.edu.cn/projects/pucpi.html.
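The two technical steps in the abstract, tensor-product features and a biased SVM trained on positive and unlabeled pairs, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the toy 2-bit fingerprints, the cost values `c_pos` and `c_unl`, and the choice of a linear model fitted by stochastic subgradient descent (the abstract does not specify the kernel or solver). The idea it shows is that margin violations on known positives are penalized more heavily than violations on unlabeled pairs, so a true interaction hidden among the unlabeled examples can still receive a positive score.

```python
import random


def tensor_product(compound_bits, protein_bits):
    """Flattened outer product: one feature per (substructure, domain) pair."""
    return [c * p for c in compound_bits for p in protein_bits]


def decision(w, b, x):
    """Signed score of a linear classifier; positive means predicted interaction."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_biased_svm(X, y, c_pos=5.0, c_unl=1.0, lr=0.01, epochs=200, seed=0):
    """Linear biased SVM fitted by stochastic subgradient descent (illustrative).

    y[i] is +1 for a known positive and -1 for an unlabeled pair. Hinge-loss
    violations cost c_pos on positives and c_unl on unlabeled pairs.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            cost = c_pos if yi > 0 else c_unl
            violated = yi * decision(w, b, xi) < 1.0
            # Shrink w (this sample's share of the L2 regulariser) ...
            w = [wj - lr * wj / n for wj in w]
            # ... then step on the class-weighted hinge term if the margin is violated.
            if violated:
                w = [wj + lr * cost * yi * xj for wj, xj in zip(w, xi)]
                b += lr * cost * yi
    return w, b


# Toy data (hypothetical): 2 substructure bits per compound, 2 domain bits per protein.
binder, other_compound = [1, 0], [0, 1]
target, other_protein = [1, 0], [0, 1]

interacting = tensor_product(binder, target)
X = [interacting] * 3  # known positives
y = [1, 1, 1]
X += [
    interacting,  # a true interaction hidden among the unlabeled pairs
    tensor_product(binder, other_protein),
    tensor_product(other_compound, target),
    tensor_product(other_compound, other_protein),
]
y += [-1, -1, -1, -1]

w, b = train_biased_svm(X, y)
score_hidden = decision(w, b, interacting)
score_other = decision(w, b, tensor_product(other_compound, other_protein))
print("hidden positive:", score_hidden, "unrelated pair:", score_other)
```

With equal costs for both classes this reduces to an ordinary soft-margin SVM that treats every unlabeled pair as a negative, which is the setup the abstract argues against.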