Validation of Deep Learning-Based DFCNN in Extremely Large-Scale Virtual Screening and Application in Trypsin I Protease Inhibitor Discovery

Haiping Zhang¹, Xiao Lin², Yanjie Wei¹, Huiling Zhang¹, Linbu Liao³, Hao Wu¹, Yi Pan¹, Xuli Wu²

Affiliations

¹ Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
² School of Medicine, Shenzhen University, Shenzhen, China.
³ College of Software Technology, Zhejiang University, Hangzhou, China.

PMID: 35720125
PMCID: PMC9200220
DOI: 10.3389/fmolb.2022.872086

Front Mol Biosci. 2022 Jun 1;9:872086. doi: 10.3389/fmolb.2022.872086. eCollection 2022.

Abstract

Computational methods that can identify active drug leads among millions of compounds with affordable computing resources are highly desirable. This requires a model that is both highly efficient and reasonably accurate, a combination that most current methods do not achieve. In real virtual screening (VS) scenarios, a useful method should select active compounds far more often than random chance would. Here, we systematically evaluate the performance of our previously developed DFCNN model in large-scale virtual screening. On average, with a score cutoff of 0.99, the method achieves approximately 22 times the success rate of random selection. Of the 102 test cases, 10 have more than 98 times the success rate of a random guess, and in three cases the prediction success rate is 99 times that of a random guess at a score cutoff of 0.99. This indicates that, in most situations, our extremely large-scale VS can reduce the dataset 20- to 100-fold before the next step of virtual screening based on docking or MD simulation. Furthermore, we verified the computational method experimentally by finding several active inhibitors of Trypsin I Protease. We also show a proof-of-concept application in de novo drug screening. The results indicate the massive potential of this method as the first step of a real drug development workflow. Moreover, DFCNN takes only about 0.0000225 s per protein-compound prediction on average with 80 Intel CPU cores (2.00 GHz) and 60 GB RAM, which is at least tens of thousands of times faster than AutoDock Vina or Schrödinger high-throughput virtual screening. An online webserver based on DFCNN for large-scale screening is available at http://cbblab.siat.ac.cn/DFCNN/index.php.
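The "times the success rate of random chance" figures in the abstract can be read as an enrichment ratio: the fraction of actives among compounds scoring above the cutoff, divided by the fraction of actives in the whole library. The Python sketch below illustrates that reading along with the stated per-prediction time. It is not the authors' code. The function names, the `>=` comparison at the cutoff, and the toy scores are assumptions made for illustration, and the paper may define its success rate differently.

```python
from typing import Sequence


def enrichment_over_random(
    scores: Sequence[float],
    is_active: Sequence[bool],
    cutoff: float = 0.99,  # score cutoff quoted in the abstract
) -> float:
    """Hit rate among compounds with score >= cutoff, divided by the base rate.

    Hypothetical helper: treats "success rate relative to random" as
    (actives selected / compounds selected) / (actives / library size).
    Returns NaN when the ratio is undefined.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")

    total = len(scores)
    n_active = sum(is_active)
    # Activity labels of the compounds that pass the score cutoff.
    selected = [active for s, active in zip(scores, is_active) if s >= cutoff]

    if total == 0 or n_active == 0 or not selected:
        return float("nan")

    hit_rate = sum(selected) / len(selected)
    base_rate = n_active / total
    return hit_rate / base_rate


def screening_seconds(n_pairs: int, seconds_per_pair: float = 2.25e-5) -> float:
    """Estimated wall time for n_pairs predictions.

    The default is the average per-prediction time reported in the abstract
    for 80 CPU cores; other hardware will differ.
    """
    return n_pairs * seconds_per_pair


if __name__ == "__main__":
    # Toy library (made-up scores): 10 compounds, 2 known actives.
    toy_scores = [0.995, 0.999, 0.50, 0.20, 0.10, 0.30, 0.40, 0.60, 0.70, 0.991]
    toy_labels = [True, True] + [False] * 8
    print(enrichment_over_random(toy_scores, toy_labels))
    print(screening_seconds(1_000_000))
```

In the toy library, three compounds pass the cutoff and two of them are active, so the hit rate is 2/3 against a base rate of 2/10. At the reported speed, one million protein-compound pairs would take roughly 22.5 s on comparable hardware.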

Keywords: DFCNN; Trypsin I Protease; de novo drug screening; deep learning; extremely large-scale virtual screening.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Schematic diagram of the systematic estimation of DFCNN performance in extremely large-scale virtual screening. **(A)** Collecting target protein-related data, **(B)** large-scale virtual screening against an extensive compound database (ZINC compounds plus known active compounds) for each protein target, **(C)** performing various analyses based on the predictions, and **(D)** taking a well-performing case (here, Trypsin I Protease) as a test example to find novel active compounds by combining the computational method with experimental validation.

**FIGURE 2**
The 10 top-performing proteins and their pocket regions with the known ligands.

**FIGURE 3**
The top-performing proteins and their corresponding potential inhibitors.

**FIGURE 4**
Proteins on which DFCNN performed poorly. The gene names are annotated below, with the corresponding PDB ID shown in parentheses. The proteins within the red box have multiple ligands in one pocket, and the proteins within the green box are membrane proteins.

**FIGURE 5**
Fluorescence emission spectra of trypsin–S763-0509 **(A)**, trypsin–PB90939671 **(B)**, trypsin–STK573808 **(C)**, trypsin–STK260654 **(D)**, and trypsin–Z25746562 **(E)** as well as double-log plots of the quenching effect of PPGs on trypsin fluorescence **(F)**. (a–k) The trypsin concentration was 1.0 × 10^–5 M, and the compound concentrations were 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, and 10.0 (×10^–4 M).

**FIGURE 6**
Analysis of the MD simulation result of Trypsin I with Z25746562 **(A)**, STK260654 **(B)**, STK573808 **(C)**, PB90939671 **(D)**, and S763-0509 **(E)**. The left panel shows the RMSD of Trypsin I and ligand (dark green and magenta) and indicates the number of hydrogen bonds between the protein and ligand. The middle panel shows the protein–ligand conformation of the last frame from the 100 ns MD simulation. The right panel shows the 2D diagram of the protein–ligand interaction from the last frame of MD simulation.

**FIGURE 7**
Representative structures of *de novo* candidates and their predicted interaction with the Trypsin I Protease.

Cited by

Zhang H, Fan H, Wang J, Hou T, Saravanan KM, Xia W, Kan HW, Li J, Zhang JZH, Liang X, Chen Y. Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery. Brief Bioinform. 2024 May 23;25(4):bbae281. doi: 10.1093/bib/bbae281. PMID: 38864340.
Zhang H, Saravanan KM, Zhang JZH. DeepBindGCN: Integrating Molecular Vector Representation with Graph Convolutional Neural Networks for Protein-Ligand Interaction Prediction. Molecules. 2023 Jun 10;28(12):4691. doi: 10.3390/molecules28124691. PMID: 37375246.
