DEEPScreen: high performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations

Ahmet Sureyya Rifaioglu¹,²,³, Esra Nalbat³, Volkan Atalay¹,³, Maria Jesus Martin⁴, Rengul Cetin-Atalay³,⁵, Tunca Doğan⁶,⁷

Affiliations

¹ Department of Computer Engineering, METU, Ankara, 06800, Turkey. Email: vatalay@metu.edu.tr; Tel: +903122105576.
² Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey.
³ KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey.
⁴ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK.
⁵ Section of Pulmonary and Critical Care Medicine, The University of Chicago, Chicago, IL 60637, USA.
⁶ Department of Computer Engineering, Hacettepe University, Ankara, 06800, Turkey. Email: tuncadogan@hacettepe.edu.tr; Tel: +903122977193/117.
⁷ Institute of Informatics, Hacettepe University, Ankara, 06800, Turkey.

PMID: 33209251
PMCID: PMC7643205
DOI: 10.1039/c9sc03414e

Ahmet Sureyya Rifaioglu et al. Chem Sci. 2020 Jan 8;11(9):2531-2557. doi: 10.1039/c9sc03414e. eCollection 2020 Mar 7.

Abstract

The identification of physical interactions between drug candidate compounds and target biomolecules is an important process in drug discovery. Since conventional screening procedures are expensive and time-consuming, computational approaches are employed to assist by automatically predicting novel drug-target interactions (DTIs). In this study, we propose a large-scale DTI prediction system, DEEPScreen, for early-stage drug discovery, using deep convolutional neural networks. One of the main advantages of DEEPScreen is that it employs readily available 2-D structural representations of compounds at the input level instead of conventional descriptors that display limited performance. DEEPScreen learns complex features inherently from the 2-D representations, thus producing highly accurate predictions. The DEEPScreen system was trained for 704 target proteins (using curated bioactivity data) and finalized with rigorous hyper-parameter optimization tests. We compared the performance of DEEPScreen against the state-of-the-art on multiple benchmark datasets to indicate the effectiveness of the proposed approach and verified selected novel predictions through molecular docking analysis and literature-based validation. Finally, JAK proteins, which DEEPScreen predicted as new targets of the well-known drug cladribine, were experimentally validated in vitro on cancer cells through phosphorylation of STAT3, the downstream effector protein. The DEEPScreen system can be exploited in the fields of drug discovery and repurposing for in silico screening of the chemogenomic space, to provide novel DTIs which can be experimentally pursued. The source code, trained "ready-to-use" prediction models, all datasets and the results of this study are available at https://github.com/cansyl/DEEPscreen.

This journal is © The Royal Society of Chemistry 2020.

Figures

Fig. 1. Illustration of the deep convolutional neural network structure of DEEPScreen, where the sole input is the 2-D structural images of the drugs and drug candidate compounds (generated from the SMILES representations as a data pre-processing step). Each target protein has an individual prediction model with specifically optimized hyper-parameters (please refer to the Methods section). For each query compound, the model produces a binary output either as active or inactive, considering the interaction with the corresponding target.

Fig. 2. Data filtering and processing steps to create the training dataset of each target protein model. Predictive models were trained for 704 target proteins, each of which has at least 100 known active ligands in the ChEMBL database.
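
The target-selection criterion in this caption (at least 100 known active ligands per target) can be illustrated with a minimal sketch. The record layout `(target_id, compound_id, is_active)` and the function name `select_targets` are assumptions made for illustration; they are not taken from the DEEPScreen code base, and the actual pipeline applies further filtering steps not shown here.

```python
from collections import defaultdict

MIN_ACTIVES = 100  # threshold stated in the Fig. 2 caption


def select_targets(bioactivities, min_actives=MIN_ACTIVES):
    """Return {target_id: set of active compound ids} for well-populated targets.

    `bioactivities` is an iterable of (target_id, compound_id, is_active)
    tuples; this layout is a hypothetical stand-in for curated ChEMBL records.
    """
    actives = defaultdict(set)
    for target_id, compound_id, is_active in bioactivities:
        if is_active:
            # A set de-duplicates repeated measurements of the same compound.
            actives[target_id].add(compound_id)
    # Keep only targets that reach the minimum number of distinct actives.
    return {t: c for t, c in actives.items() if len(c) >= min_actives}
```

Counting distinct compounds rather than raw records is a design choice of this sketch, so that repeated assays of one ligand do not inflate a target's count.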

Fig. 3. (a) Overall predictive performance comparison of DEEPScreen *vs.* state-of-the-art classifiers. Each point on the horizontal axis represents a target protein model; the vertical axis represents performance in terms of MCC, accuracy and F1-score, respectively. For each classifier, targets are ranked in descending order of performance. Average performance values (mean and median) are given inside the plots. (b) Target-based maximum predictive performance (MCC-based) heatmap for DEEPScreen and conventional classifiers (columns) (LR: logistic regression, RF: random forest, SVM: support vector machine; ECFP: fingerprint-based models, and image: 2-D structural representation-based models). For each target protein (row), classifier performances are shown in shades of red (*i.e.*, high performance) and blue (*i.e.*, low performance) according to Z-scores (Z-scores are calculated individually for each target). Rows are arranged in blocks according to target families. The height of a block is proportional to the number of targets in its corresponding family (enzymes: 374, GPCRs: 212, ion channels: 33, nuclear receptors: 27, and others: 58). Within each block, targets are arranged from top to bottom in descending order of DEEPScreen performance. Grey signifies cases where learning was not possible. (c) MCC performance box plots in the 10-fold cross-validation experiment, comparing DEEPScreen with the state-of-the-art DTI predictors.
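
The three metrics named in this caption (MCC, accuracy and F1-score) and the per-target Z-score used for the heatmap can be written out as a short sketch using their standard definitions. The function names are hypothetical, and the use of the population standard deviation for the Z-score is an assumption; the caption does not state which variant the authors used.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, F1-score and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


def row_z_scores(values):
    """Z-scores of one target's classifier scores (one heatmap row).

    Assumes the population standard deviation; a constant row maps to zeros.
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std else 0.0 for v in values]
```

Because the Z-scores are computed per row, the colours in such a heatmap rank classifiers within a target and are not comparable across targets.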

Fig. 4. Predictive performance evaluation and comparison of DEEPScreen against the state-of-the-art DTI prediction approaches, on scaffold-split benchmarks: (a) bar plots of MCC values on representative targets dataset; (b) bar plots of MCC values on the MUV dataset.
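
A scaffold split, as referenced in this caption, keeps all compounds that share a molecular scaffold on the same side of the train/test boundary, so that test compounds are structurally distinct from training compounds. The sketch below shows one common greedy heuristic and is not necessarily the procedure used in the study. The scaffold keys are assumed to be precomputed strings (for example from a cheminformatics toolkit); the function name and the `test_fraction` parameter are hypothetical.

```python
from collections import defaultdict


def scaffold_split(compounds, test_fraction=0.2):
    """Split compound ids into (train, test) without sharing scaffolds.

    `compounds` is a list of (compound_id, scaffold_key) pairs, where
    scaffold_key is a hypothetical precomputed scaffold identifier.
    """
    groups = defaultdict(list)
    for compound_id, scaffold in compounds:
        groups[scaffold].append(compound_id)
    # Largest scaffold groups first; ties broken by key for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    train_budget = len(compounds) * (1 - test_fraction)
    train, test = [], []
    for _, members in ordered:
        # A whole group is assigned at once, so no scaffold is ever divided.
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test
```

Since whole groups are assigned, the realized test fraction only approximates `test_fraction`.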

Fig. 5. JAK downstream effector alteration in the presence of cladribine. (a) Live cell images of cladribine-treated cells before (0H) and after 72 hours of treatment (72H). (b) Flow cytometry histogram of the phosphorylated STAT3 protein complex in Mahlavu, Huh7 and HepG2 cells. (c) STAT3 protein complex levels in Mahlavu, Huh7 and HepG2 cells detected and assessed with Phospho-Tyr705 antibodies. (d) Cell cycle analysis. (e) Apoptotic cells characterized by annexin V assay. (f) Changes in protein expression levels of STAT3 related to cladribine treatment. Bar graphs represent normalized STAT3 and phospho-STAT3 compared to calnexin. DMSO was used as the vehicle control.

Fig. 6. A case study for the evaluation of DEEPScreen predictions. (a) 3-D structure of the human renin protein (obtained from PDB id: 2REN), together with the 2-D representations of selected active (connected by green arrows) and inactive (connected by red arrows) ligand predictions in the predictive performance tests (the true experimental screening assay activities – IC₅₀ – are shown under the corresponding images). Also, 2-D images of selected truly novel predicted inhibitors of renin (*i.e.*, cortivazol, lasofoxifene and sulprostone) are displayed (connected by blue arrows) together with the estimated docking K_d values. (b) Renin–aliskiren crystal structure (PDB id: 2V0Z; aliskiren is displayed in red color) and the best poses in the automated molecular docking of DEEPScreen predicted inhibitors of renin: cortivazol (blue), lasofoxifene (green) and sulprostone (violet), to the structurally known binding site of renin (gold color), displaying hydrogen bonds with light blue lines. The docking process produced sufficiently low binding free energies for the novel inhibitors, around the levels of the structurally characterized ligands of renin, aliskiren and remikiren, indicating high potency.

Cited by

Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities.
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Front Pharmacol. 2024 Feb 7;15:1331062. doi: 10.3389/fphar.2024.1331062. eCollection 2024. PMID: 38384298 Free PMC article. Review.
Discovery of novel dual adenosine A1/A2A receptor antagonists using deep learning, pharmacophore modeling and molecular docking.
Wang M, Hou S, Wei Y, Li D, Lin J. PLoS Comput Biol. 2021 Mar 19;17(3):e1008821. doi: 10.1371/journal.pcbi.1008821. eCollection 2021 Mar. PMID: 33739970 Free PMC article.
The changing scenario of drug discovery using AI to deep learning: Recent advancement, success stories, collaborations, and challenges.
Chakraborty C, Bhattacharya M, Lee SS, Wen ZH, Lo YH. Mol Ther Nucleic Acids. 2024 Aug 8;35(3):102295. doi: 10.1016/j.omtn.2024.102295. eCollection 2024 Sep 10. PMID: 39257717 Free PMC article. Review.
A review of deep learning methods for ligand based drug virtual screening.
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. Fundam Res. 2024 Mar 11;4(4):715-737. doi: 10.1016/j.fmre.2024.02.011. eCollection 2024 Jul. PMID: 39156568 Free PMC article. Review.
Graph regularized non-negative matrix factorization with prior knowledge consistency constraint for drug-target interactions prediction.
Zhang J, Xie M. BMC Bioinformatics. 2022 Dec 29;23(1):564. doi: 10.1186/s12859-022-05119-6. PMID: 36581822 Free PMC article.
