DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks
- PMID: 32462178
- DOI: 10.1093/bioinformatics/btaa544
DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks
Abstract
Motivation: An essential part of drug discovery is the accurate prediction of the binding affinity of new compound-protein pairs. Most of the standard computational methods assume that compounds or proteins of the test data are observed during the training phase. However, in real-world situations, the test and training data are sampled from different domains with different distributions. To cope with this challenge, we propose a deep learning-based approach that consists of three steps. In the first step, the training encoder network learns a novel representation of compounds and proteins. To this end, we combine convolutional layers and long-short-term memory layers so that the occurrence patterns of local substructures through a protein and a compound sequence are learned. Also, to encode the interaction strength of the protein and compound substructures, we propose a two-sided attention mechanism. In the second phase, to deal with the different distributions of the training and test domains, a feature encoder network is learned for the test domain by utilizing an adversarial domain adaptation approach. In the third phase, the learned test encoder network is applied to new compound-protein pairs to predict their binding affinity.
Results: To evaluate the proposed approach, we applied it to KIBA, Davis and BindingDB datasets. The results show that the proposed method learns a more reliable model for the test domain in more challenging situations.
Availability and implementation: https://github.com/LBBSoft/DeepCDA.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
MeSH terms
Substances
LinkOut - more resources
Full Text Sources