J Cheminform. 2021 Apr 15;13(1):30.
doi: 10.1186/s13321-021-00510-6.

Multi-PLI: interpretable multi-task deep learning model for unifying protein-ligand interaction datasets

Fan Hu et al. J Cheminform. 2021.

Abstract

The assessment of protein-ligand interactions is critical at the early stage of drug discovery, and computational approaches that predict such interactions efficiently facilitate drug development. Recently, deep learning methods, including structure- and sequence-based models, have achieved impressive performance on several different datasets. However, their application still suffers from a generalizability issue caused by insufficient data, especially for structure-based models, as well as a heterogeneity problem caused by different label measurements and varying proteins across datasets. Here, we present an interpretable multi-task model to evaluate protein-ligand interaction (Multi-PLI). The model runs classification (binding or not) and regression (binding affinity) tasks concurrently by unifying different datasets. It outperforms traditional docking and machine learning on both binary classification and regression tasks and achieves competitive results compared with some structure-based deep learning methods, even with the same training set size. Furthermore, combined with the proposed occlusion algorithm, the model can predict the amino acids of proteins that are crucial for binding, thus providing a biological interpretation.
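
One way to picture how classification and regression labels from different datasets can be trained together is a masked multi-task loss, where each sample contributes only to the task it has a label for. The sketch below is illustrative only: the abstract does not give the loss used by Multi-PLI, so the function names, the dictionary keys and the equal task weights are all assumptions.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def multitask_loss(samples, w_cls=1.0, w_reg=1.0):
    """Combine classification and regression losses over a mixed batch.

    Each sample is a dict (hypothetical layout) with model outputs
    'p_bind' and 'pred_affinity', plus labels 'bind' (0/1) and
    'affinity' (e.g. pK), either of which may be None when the source
    dataset does not provide it.
    """
    cls_terms = [bce(s["p_bind"], s["bind"])
                 for s in samples if s.get("bind") is not None]
    reg_terms = [(s["pred_affinity"] - s["affinity"]) ** 2
                 for s in samples if s.get("affinity") is not None]
    # Average each task over the samples that actually carry its label.
    cls_loss = sum(cls_terms) / len(cls_terms) if cls_terms else 0.0
    reg_loss = sum(reg_terms) / len(reg_terms) if reg_terms else 0.0
    return w_cls * cls_loss + w_reg * reg_loss

# A mixed batch: one sample from a classification set, one from a regression set.
batch = [
    {"p_bind": 0.5, "pred_affinity": 0.0, "bind": 1, "affinity": None},
    {"p_bind": 0.5, "pred_affinity": 6.0, "bind": None, "affinity": 7.0},
]
loss = multitask_loss(batch)
```

Masking by label availability is what lets datasets with different label types share one batch; the task weights `w_cls` and `w_reg` would need tuning in practice.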

Keywords: Deep learning; Drug discovery; Interpretable; Multi‐task.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schematic overview of our method. The proposed model consists of two parts: protein/ligand feature extraction from sequence/SMILES and interaction prediction by shared and task-specific layers. The tasks are defined as: binary classification (protein-ligand binding or not) and regression (protein-ligand binding affinity). The main datasets consist of two regression sets and four classification sets, in which PDBbind and DUD-E have structural data. Four independent sets are used to test the generalizability of the model
Fig. 2
Model performance on PDBbind (regression). a Training set, RMSE = 0.75, R = 0.92; b validation set, RMSE = 1.34, R = 0.76; c test set (core2016), RMSE = 1.437, R = 0.75. Both axes are in pK(i,d) units (−logKi or −logKd). Histograms: affinity distributions of the real (x) and predicted (y) samples
Fig. 3
Model performance on DUD-E, Human and C. elegans (classification). Three-fold cross-validation and random-guess ROC curves plotted in different colors. a DUD-E, mean AUC = 0.959; b Human, mean AUC = 0.948; c C. elegans, mean AUC = 0.960
Fig. 4
PCA analysis of all datasets used in this study. a PC1 and PC2; b PC1, PC2 and PC3. Random samples from each dataset are compared after PCA reduction. Main datasets: DUD-E (red), PDBbind (blue), Human (green), C. elegans (cyan), KIBA (purple) and Davis (yellow). Independent test sets: MUV (peachpuff), CASF2013 (gray), Astex Diverse (peru)
Fig. 5
Alignment and visualization of the predicted and actual binding sites of protein sequences. Heat maps of the alignments between the predicted and actual binding sites: a 3rsx; b 2zc9 (the abscissa is the position along the protein sequence). Visualization: c 3rsx (the complex of Bace-1 (beta-secretase) and inhibitor 6-(thiophen-3-yl) quinolin-2-amine); d 2zc9 (the complex of thrombin and inhibitor d-phenylalanyl-N-(3-chlorobenzyl)-l-prolinamide). The basic protein structures are shown in green. The predicted important sites, which are highlighted in red, nearly overlap with the actual binding pockets (yellow) and cover the protein residues that interact with the ligands (light blue)
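
The idea behind occlusion-based interpretation can be sketched as follows: mask a short window of residues, re-score the masked sequence, and treat the drop in the predicted score as the importance of the masked positions. This is a generic sliding-window occlusion written under stated assumptions, not the authors' algorithm, whose details are not given here. The window size, the mask character `X` and the `toy_score` function (a stand-in for a trained model) are all hypothetical.

```python
def occlusion_importance(sequence, score_fn, window=3, mask_char="X"):
    """Per-residue importance as the mean score drop over the windows covering it."""
    base = score_fn(sequence)
    n = len(sequence)
    importance = [0.0] * n
    counts = [0] * n
    for start in range(0, n - window + 1):
        # Replace one window of residues with the mask character.
        occluded = sequence[:start] + mask_char * window + sequence[start + window:]
        drop = base - score_fn(occluded)
        for i in range(start, start + window):
            importance[i] += drop
            counts[i] += 1
    # Average over the number of windows that covered each position.
    return [imp / c if c else 0.0 for imp, c in zip(importance, counts)]

def toy_score(seq):
    """Hypothetical stand-in for a trained model: counts a fixed 'DTG' motif."""
    return float(seq.count("DTG"))

scores = occlusion_importance("AAADTGAAA", toy_score, window=3)
top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:3]
```

In this toy case the three highest-scoring positions are the motif residues themselves. With a real model, `score_fn` would wrap the predicted binding probability or affinity for a fixed ligand.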
