Chem Sci. 2020 Jan 8;11(9):2531-2557.
doi: 10.1039/c9sc03414e. eCollection 2020 Mar 7.

DEEPScreen: high performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations

Ahmet Sureyya Rifaioglu et al. Chem Sci.

Abstract

The identification of physical interactions between drug candidate compounds and target biomolecules is an important process in drug discovery. Since conventional screening procedures are expensive and time-consuming, computational approaches are employed to aid the process by automatically predicting novel drug-target interactions (DTIs). In this study, we propose a large-scale DTI prediction system, DEEPScreen, for early-stage drug discovery, using deep convolutional neural networks. One of the main advantages of DEEPScreen is that it employs readily available 2-D structural representations of compounds at the input level instead of conventional descriptors that display limited performance. DEEPScreen learns complex features inherently from the 2-D representations, thus producing highly accurate predictions. The DEEPScreen system was trained for 704 target proteins (using curated bioactivity data) and finalized with rigorous hyper-parameter optimization tests. We compared the performance of DEEPScreen against the state-of-the-art on multiple benchmark datasets to indicate the effectiveness of the proposed approach and verified selected novel predictions through molecular docking analysis and literature-based validation. Finally, JAK proteins, which DEEPScreen predicted as new targets of the well-known drug cladribine, were experimentally validated in vitro on cancer cells through the phosphorylation of STAT3, the downstream effector protein. The DEEPScreen system can be exploited in the fields of drug discovery and repurposing for in silico screening of the chemogenomic space, to provide novel DTIs which can be experimentally pursued. The source code, trained "ready-to-use" prediction models, all datasets and the results of this study are available at https://github.com/cansyl/DEEPscreen.

Figures

Fig. 1. Illustration of the deep convolutional neural network structure of DEEPScreen, where the sole input is the 2-D structural images of the drugs and drug candidate compounds (generated from the SMILES representations as a data pre-processing step). Each target protein has an individual prediction model with specifically optimized hyper-parameters (please refer to the Methods section). For each query compound, the model produces a binary output either as active or inactive, considering the interaction with the corresponding target.
Fig. 2. Data filtering and processing steps to create the training dataset of each target protein model. Predictive models were trained for 704 target proteins, each of which has at least 100 known active ligands in the ChEMBL database.
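
The target-selection criterion in the Fig. 2 caption (at least 100 known active ligands per target) can be illustrated with a minimal sketch. The record layout `(target_id, compound_id, is_active)` and the function name `select_targets` are hypothetical, and the activity labels are assumed to have been assigned already by the upstream filtering steps, which this page does not detail.

```python
from collections import defaultdict

MIN_ACTIVES = 100  # threshold stated in the Fig. 2 caption


def select_targets(records, min_actives=MIN_ACTIVES):
    """Return target ids with at least `min_actives` distinct active compounds.

    `records` is an iterable of (target_id, compound_id, is_active) tuples.
    This layout is an assumption made for illustration only.
    """
    actives = defaultdict(set)
    for target_id, compound_id, is_active in records:
        if is_active:
            # A set removes repeated measurements of the same compound.
            actives[target_id].add(compound_id)
    return sorted(t for t, cmpds in actives.items() if len(cmpds) >= min_actives)
```

Counting distinct compounds rather than raw data points keeps a target from qualifying merely because a few ligands were measured many times.
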
Fig. 3. (a) Overall predictive performance comparison of DEEPScreen vs. state-of-the-art classifiers. Each point on the horizontal axis represents a target protein model; the vertical axis represents performance in terms of MCC, accuracy and F1-score, respectively. For each classifier, targets are ranked in descending order of performance. Average performance values (mean and median) are given inside the plots. (b) Target-based maximum predictive performance (MCC-based) heatmap for DEEPScreen and conventional classifiers (columns) (LR: logistic regression, RF: random forest, SVM: support vector machine; ECFP: fingerprint-based models, and image: 2-D structural representation-based models). For each target protein (row), classifier performances are shown in shades of red (i.e., high performance) and blue (i.e., low performance) according to Z-scores (Z-scores are calculated individually for each target). Rows are arranged in blocks according to target families. The height of a block is proportional to the number of targets in its corresponding family (enzymes: 374, GPCRs: 212, ion channels: 33, nuclear receptors: 27, and others: 58). Within each block, targets are arranged from top to bottom in descending order of DEEPScreen performance. Grey signifies cases where learning was not possible. (c) MCC performance box plots in the 10-fold cross-validation experiment, to compare DEEPScreen with the state-of-the-art DTI predictors.
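
For readers unfamiliar with the three measures named in the Fig. 3 caption, the sketch below computes MCC, accuracy and F1-score from binary active/inactive labels using their standard definitions. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the source.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Return MCC, accuracy and F1-score; undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"mcc": mcc, "accuracy": accuracy, "f1": f1}
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when active and inactive compounds are imbalanced.
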
Fig. 4. Predictive performance evaluation and comparison of DEEPScreen against the state-of-the-art DTI prediction approaches, on scaffold-split benchmarks: (a) bar plots of MCC values on the representative targets dataset; (b) bar plots of MCC values on the MUV dataset.
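
The idea behind a scaffold split can be sketched as follows: compounds that share a core scaffold are kept together, so that no scaffold appears in both the training and the test set. This is a generic illustration, not the procedure used to build the benchmarks in Fig. 4, which this page does not describe. The scaffold keys are assumed to be precomputed strings (for example by a cheminformatics toolkit), and `scaffold_split` and its greedy largest-group-first assignment are hypothetical choices.

```python
from collections import defaultdict


def scaffold_split(compounds, test_fraction=0.2):
    """Split (compound_id, scaffold_key) pairs so that no scaffold is shared.

    Whole scaffold groups are assigned greedily, largest first, to the
    training set until its capacity is reached; the rest go to the test set.
    """
    groups = defaultdict(list)
    for compound_id, scaffold_key in compounds:
        groups[scaffold_key].append(compound_id)

    # Sort by descending group size; ties are broken by key for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    train_capacity = len(compounds) - round(test_fraction * len(compounds))

    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_capacity:
            train.extend(members)
        else:
            test.extend(members)
    return train, test
```

Because groups are never divided, the realised test fraction can deviate from `test_fraction` when scaffold groups are large.
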
Fig. 5. JAK downstream effector alteration in the presence of cladribine. (a) Live cell images for cladribine-treated cells before (0H) and after 72 hours of treatment (72H). (b) Flow cytometry histogram of the phosphorylated STAT3 protein complex in Mahlavu, Huh7 and HepG2 cells. (c) STAT3 protein complex levels in Mahlavu, Huh7 and HepG2 cells detected and assessed with Phospho-Tyr705 antibodies. (d) Cell cycle analysis. (e) Apoptotic cells characterized by annexin V assay. (f) Changes in protein expression levels of STAT3 related to cladribine treatment. Bar graphs represent normalized STAT3 and phospho-STAT3 compared to calnexin. DMSO was used as the vehicle control.
Fig. 6. A case study for the evaluation of DEEPScreen predictions. (a) 3-D structure of the human renin protein (obtained from PDB id: 2REN), together with the 2-D representations of selected active (connected by green arrows) and inactive (connected by red arrows) ligand predictions in the predictive performance tests (the true experimental screening assay activities – IC50 – are shown under the corresponding images). Also, 2-D images of selected truly novel predicted inhibitors of renin (i.e., cortivazol, lasofoxifene and sulprostone) are displayed (connected by blue arrows) together with the estimated docking Kd values. (b) Renin–aliskiren crystal structure (PDB id: 2V0Z; aliskiren is displayed in red) and the best poses in the automated molecular docking of DEEPScreen predicted inhibitors of renin: cortivazol (blue), lasofoxifene (green) and sulprostone (violet), to the structurally known binding site of renin (gold), displaying hydrogen bonds with light blue lines. The docking process produced sufficiently low binding free energies for the novel inhibitors, around the levels of the structurally characterized ligands of renin, aliskiren and remikiren, indicating high potency.
