J Cheminform. 2021 Apr 15;13(1):30.

doi: 10.1186/s13321-021-00510-6.

Multi-PLI: interpretable multi-task deep learning model for unifying protein-ligand interaction datasets

Fan Hu^#¹, Jiaxin Jiang^#¹, Dongqi Wang¹, Muchun Zhu¹, Peng Yin²

Affiliations

¹ Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
² Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China. peng.yin@siat.ac.cn.

^# Contributed equally.

PMID: 33858485
PMCID: PMC8051026
DOI: 10.1186/s13321-021-00510-6

Fan Hu et al. J Cheminform. 2021.

Abstract

The assessment of protein-ligand interactions is critical at the early stage of drug discovery. Computational approaches that predict such interactions efficiently facilitate drug development. Recently, deep learning methods, including structure- and sequence-based models, have achieved impressive performance on several different datasets. However, their application still suffers from a generalizability issue caused by insufficient data, especially for structure-based models, and from a heterogeneity problem caused by different label measurements and varying proteins across datasets. Here, we present Multi-PLI, an interpretable multi-task model for evaluating protein-ligand interactions. By unifying different datasets, the model runs classification (binding or not) and regression (binding affinity) tasks concurrently. The model outperforms traditional docking and machine learning methods on both the binary classification and regression tasks, and achieves results competitive with some structure-based deep learning methods, even with the same training set size. Furthermore, combined with the proposed occlusion algorithm, the model can predict the amino acids of a protein that are crucial for binding, thus providing a biological interpretation.
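
The abstract does not spell out how the occlusion algorithm works. As an illustration only, the sketch below assumes a common form of occlusion analysis: slide a window along the protein sequence, mask the residues inside it, re-run the model, and credit the drop in the predicted score to the masked residues. The window size, the mask character `X`, the toy model, and the helper names `occlusion_scores` and `top_k_residues` are hypothetical and are not taken from Multi-PLI.

```python
from typing import Callable, List, Sequence


def occlusion_scores(
    sequence: str,
    predict: Callable[[str], float],
    window: int = 3,
    mask_char: str = "X",  # hypothetical mask token standing for "unknown residue"
) -> List[float]:
    """Per-residue importance from sliding-window occlusion.

    Each window of `window` residues is replaced by `mask_char`, the model is
    re-run, and the drop in predicted score (baseline minus occluded) is
    credited to every residue in the window. A residue's score is the mean
    drop over all windows that cover it.
    """
    n = len(sequence)
    baseline = predict(sequence)
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):
        occluded = sequence[:start] + mask_char * window + sequence[start + window:]
        drop = baseline - predict(occluded)
        for i in range(start, start + window):
            totals[i] += drop
            counts[i] += 1
    # Residues never covered by a window (only if window > n) score 0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def top_k_residues(scores: Sequence[float], k: int) -> List[int]:
    """0-based positions of the k residues with the largest occlusion score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def toy_model(seq: str) -> float:
    # Hypothetical stand-in for a trained model: "affinity" = number of tryptophans.
    return float(seq.count("W"))


scores = occlusion_scores("AAWAAAA", toy_model, window=1)
print(top_k_residues(scores, 1))  # [2]
```

Residues whose occlusion causes the largest score drop are the candidates for binding-site residues; with windows wider than one residue, neighbours of an important residue share its credit, which smooths the profile.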

Keywords: Deep learning; Drug discovery; Interpretable; Multi‐task.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Schematic overview of our method. The proposed model consists of two parts: protein/ligand feature extraction from sequence/SMILES and interaction prediction by shared and task-specific layers. The tasks are defined as binary classification (protein-ligand binding or not) and regression (protein-ligand binding affinity). The main datasets consist of two regression sets and four classification sets, of which PDBbind and DUD-E have structural data. Four independent sets are used to test the generalizability of the model.
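
The caption describes shared layers feeding task-specific heads, trained on datasets that each label only one task. A minimal sketch of how such a joint objective could be masked, assuming binary cross-entropy for the classification head, mean squared error for the regression head, and a hypothetical weight `reg_weight` between them (the paper's actual loss and weighting are not given here; the `Sample` record is also hypothetical):

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One prediction; only the label its source dataset provides is set."""
    prob: float                            # classification head: predicted P(binding)
    affinity: float                        # regression head: predicted pK
    binds: Optional[int] = None            # 0/1 label, e.g. from a classification set
    true_affinity: Optional[float] = None  # pK label, e.g. from a regression set


def multitask_loss(batch: List[Sample], reg_weight: float = 1.0, eps: float = 1e-7) -> float:
    """Masked joint loss: mean BCE over classification-labelled samples plus
    reg_weight * mean squared error over regression-labelled samples.
    A sample without a label for a task contributes nothing to that task."""
    bce_terms: List[float] = []
    mse_terms: List[float] = []
    for s in batch:
        if s.binds is not None:
            p = min(max(s.prob, eps), 1.0 - eps)  # clip to avoid log(0)
            bce_terms.append(-(s.binds * math.log(p) + (1 - s.binds) * math.log(1.0 - p)))
        if s.true_affinity is not None:
            mse_terms.append((s.affinity - s.true_affinity) ** 2)
    bce = sum(bce_terms) / len(bce_terms) if bce_terms else 0.0
    mse = sum(mse_terms) / len(mse_terms) if mse_terms else 0.0
    return bce + reg_weight * mse


batch = [
    Sample(prob=0.5, affinity=0.0, binds=1),            # classification-only sample
    Sample(prob=0.0, affinity=7.0, true_affinity=6.0),  # regression-only sample
]
print(multitask_loss(batch))  # ln 2 + 1, about 1.693
```

In a real network both terms would be backpropagated through the shared layers, so every dataset updates the common representation while each head is trained only on the labels it has.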

**Fig. 2**
Model performance on PDBbind (regression). a Training set, RMSE = 0.75, R = 0.92; b validation set, RMSE = 1.34, R = 0.76; c test set (core2016), RMSE = 1.437, R = 0.75. x and y axes: $pK_{i,d}$ ($-\log K_i$ or $-\log K_d$). Histograms: affinity distributions ($pK_{i,d}$) of real (x) and predicted (y) samples.
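
The two metrics reported for the regression task are the root-mean-square error (RMSE) between measured and predicted affinities (pKi or pKd) and the Pearson correlation coefficient R. For reference, a plain-Python sketch of the standard definitions; the function names and the example values are illustrative, not data from the study:

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient R (undefined if either input is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


measured = [6.0, 7.5, 8.0]   # illustrative pK values
predicted = [6.5, 7.0, 8.5]
print(rmse(measured, predicted), pearson_r(measured, predicted))
```

RMSE is in pK units and penalizes absolute errors, while R only measures how well the predictions track the ranking and trend of the measurements, which is why both are usually reported together.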

**Fig. 3**
Model performance on DUD-E, *Human* and *C. elegans* (classification). ROC curves from three-fold cross-validation and the random-guess baseline are plotted in different colors. a DUD-E, mean AUC = 0.959; b *Human*, mean AUC = 0.948; c *C. elegans*, mean AUC = 0.960.
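
The classification results are summarized as mean AUC over three-fold cross-validation. The sketch below, with hypothetical helper names, computes AUC through its rank (Mann-Whitney) interpretation, i.e. the probability that a randomly chosen binder is scored above a randomly chosen non-binder with ties counting one half, which equals the area under the empirical ROC curve. It also builds three shuffled folds; how the original study split its folds is not stated here, so the plain random split is an assumption.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score, ties counting 0.5. O(P*N) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 3, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once and return k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, sorted(folds[i])))
    return splits


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The mean AUC of a cross-validation run is then the average of `roc_auc` over the three held-out folds, each scored by a model trained on the corresponding train split. A random guesser scores about 0.5, which is what the diagonal baseline curve represents.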

**Fig. 4**
Principal component analysis (PCA) of all datasets used in this study. a PC1 and PC2; b PC1, PC2 and PC3. Random samples from each dataset are compared after PCA reduction. Main datasets: DUD-E (red), PDBbind (blue), Human (green), *C. elegans* (cyan), KIBA (purple) and Davis (yellow). Independent test sets: MUV (peachpuff), CASF2013 (gray), Astex Diverse (peru).
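
The caption does not say which features were projected. Assuming each sample has already been encoded as a fixed-length numeric vector, the principal components are the eigenvectors of the covariance matrix, ordered by eigenvalue. The dependency-free sketch below finds them by power iteration with deflation instead of a full eigendecomposition; the function names are hypothetical, and power iteration assumes the leading eigenvalues are well separated.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pca(x: Matrix, k: int, iters: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Top-k principal components of the rows of x (needs >= 2 rows).

    Returns (components, projections): components[c] is the unit vector of
    PC c+1 and projections[r][c] is the score of sample r on that PC."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    xc = [[row[j] - means[j] for j in range(d)] for row in x]  # center the data
    cov = [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [a / norm for a in w]
        eigval = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    projections = [[sum(a * b for a, b in zip(row, c)) for c in components] for row in xc]
    return components, projections


points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # all on the line y = 2x
components, projections = pca(points, k=1)
print(components[0])  # close to [1/sqrt(5), 2/sqrt(5)]
```

Plotting the first two or three projection columns, colored by source dataset, gives the kind of view shown in panels a and b.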

**Fig. 5**
Alignment and visualization of the predicted and actual binding sites on protein sequences. Heat maps of the alignments between the predicted and actual binding sites: a 3rsx; b 2zc9 (the horizontal axis spans the length of the protein sequence). Visualization: c 3rsx (the complex of BACE-1 (beta-secretase) with the inhibitor 6-(thiophen-3-yl)quinolin-2-amine); d 2zc9 (the complex of thrombin with the inhibitor D-phenylalanyl-N-(3-chlorobenzyl)-L-prolinamide). The protein structures are shown in green. The predicted important sites, highlighted in red, nearly overlap with the actual binding pockets (yellow) and cover the protein residues that interact with the ligands (light blue).
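
How the alignment heat maps were scored is not described in this record. One simple way to quantify agreement between predicted important residues and annotated binding-site residues, assuming both are given as 0-based sequence positions, is to compare them as binary per-residue tracks (like the heat-map rows) and as sets. The metric choice (precision, recall, Jaccard index) and the helper names are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def site_track(sites: Iterable[int], length: int) -> List[int]:
    """Binary per-residue track (1 = residue in the site), i.e. one heat-map row."""
    s = set(sites)
    return [1 if i in s else 0 for i in range(length)]


def site_overlap(predicted: Iterable[int], actual: Iterable[int]) -> Dict[str, float]:
    """Set overlap between predicted important residues and actual site residues."""
    p, a = set(predicted), set(actual)
    inter = len(p & a)
    union = len(p | a)
    return {
        "precision": inter / len(p) if p else 0.0,  # share of predictions inside the site
        "recall": inter / len(a) if a else 0.0,     # share of the site that is recovered
        "jaccard": inter / union if union else 0.0,
    }


print(site_track([1, 3], 5))                        # [0, 1, 0, 1, 0]
print(site_overlap([1, 2, 3, 4], [2, 3, 4, 5, 6]))  # {'precision': 0.75, 'recall': 0.6, 'jaccard': 0.5}
```

Exact-position overlap is strict; because occlusion credits whole windows, a tolerance of a few residues around each predicted position may be a fairer comparison.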

Grants and funding

National Natural Science Foundation of China (11801542)
