Front Mol Biosci. 2022 Jun 1;9:872086.
doi: 10.3389/fmolb.2022.872086. eCollection 2022.

Validation of Deep Learning-Based DFCNN in Extremely Large-Scale Virtual Screening and Application in Trypsin I Protease Inhibitor Discovery

Haiping Zhang et al. Front Mol Biosci.

Abstract

Computational methods that run on affordable computational resources are highly desirable for identifying active drug leads among millions of compounds. This requires a model that is both highly efficient and reasonably accurate, a combination that most current methods cannot achieve. In real virtual screening (VS) scenarios, a useful method should select active compounds far more often than random chance would. Here, we systematically evaluate the performance of our previously developed DFCNN model in large-scale virtual screening. The results show that, at a score cutoff of 0.99, our method achieves on average approximately 22 times the success rate of random chance. Of the 102 test cases, 10 have more than 98 times the success rate of a random guess. Interestingly, in three cases, the prediction success rate is 99 times that of a random guess at a score cutoff of 0.99. This indicates that, in most situations, our extremely large-scale VS can reduce the dataset 20- to 100-fold before the next step of virtual screening based on docking or MD simulation. Furthermore, we verified our computational method experimentally by finding several active inhibitors of Trypsin I Protease. We also show a proof-of-concept application in de novo drug screening. The results indicate the massive potential of this method as the first step of a real drug development workflow. Moreover, DFCNN takes only about 0.0000225 s on average for one protein-compound prediction with 80 Intel CPU cores (2.00 GHz) and 60 GB RAM, which is at least tens of thousands of times faster than AutoDock Vina or Schrödinger high-throughput virtual screening. Additionally, an online webserver based on DFCNN for large-scale screening is available at http://cbblab.siat.ac.cn/DFCNN/index.php for the convenience of users.

Keywords: DFCNN; Trypsin I Protease; de novo drug screening; deep learning; extremely large-scale virtual screening.
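
The "times the success rate of random chance" figures in the abstract can be read as an enrichment ratio. The following is a minimal sketch under that assumption, not the authors' evaluation code: the function names, the use of `>=` at the cutoff, and the toy library are all hypothetical. It divides the hit rate among compounds scoring at or above the cutoff by the hit rate of the whole library, and separately estimates wall-clock time from the reported 0.0000225 s per prediction.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      cutoff: float = 0.99) -> float:
    """Hit rate among compounds with score >= cutoff, divided by the
    hit rate of the whole library (assumed reading of "times random")."""
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    total = len(scores)
    total_actives = sum(is_active)
    # Activity labels of the compounds that pass the score cutoff.
    selected = [active for s, active in zip(scores, is_active) if s >= cutoff]
    if total_actives == 0 or not selected:
        return float("nan")  # ratio is undefined without actives or selections
    # (selected_actives / n_selected) / (total_actives / total),
    # rearranged so the products are formed from integers first.
    return (sum(selected) * total) / (len(selected) * total_actives)


def screening_seconds(n_pairs: int, seconds_per_pair: float = 0.0000225) -> float:
    """Estimated run time, using the per-prediction time quoted in the abstract."""
    return n_pairs * seconds_per_pair


# Hypothetical toy library: 1000 compounds, 10 actives in total.
# 20 compounds score above the cutoff, and 5 of those are active.
scores = [0.995] * 20 + [0.5] * 980
labels = [True] * 5 + [False] * 15 + [True] * 5 + [False] * 975

print(enrichment_factor(scores, labels))  # 25.0
print(screening_seconds(1_000_000_000))   # roughly 22500 s for 1e9 pairs
```

In the toy data, 25% of the selected compounds are active against a 1% base rate, so the selection is 25 times better than a random pick. The run-time estimate assumes the same hardware as reported (80 CPU cores, 60 GB RAM) and ignores I/O.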

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Schematic diagram of the systematic estimation of DFCNN performance in extremely large-scale virtual screening. (A) Collecting target protein-related data, (B) large-scale virtual screening against an extensive compound database (ZINC compounds plus known active compound) for each protein target, (C) doing various analyses based on the prediction, and (D) considering a good performance case (here, we use Trypsin I Protease) as a test example to find novel active compounds by combining the computational method with experimental validation.
FIGURE 2
The 10 top-performing proteins and their pocket regions with the known ligand.
FIGURE 3
The top-performing proteins and their corresponding potential inhibitors.
FIGURE 4
Proteins on which DFCNN performed poorly. The gene names are annotated below, with the corresponding PDB ID shown in parentheses. The proteins within the red box have multiple ligands in one pocket, and the proteins within the green box are membrane proteins.
FIGURE 5
Fluorescence emission spectra of trypsin–S763-0509 (A), trypsin–PB90939671 (B), trypsin–STK573808 (C), trypsin–STK260654 (D), and trypsin–Z25746562 (E) as well as double-log plots of the quenching effect of PPGs on trypsin fluorescence (F). (a–k) The trypsin concentration was 1.0 × 10⁻⁵ M, and the compound concentrations were 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, and 10.0 (× 10⁻⁴ M).
FIGURE 6
Analysis of the MD simulation result of Trypsin I with Z25746562 (A), STK260654 (B), STK573808 (C), PB90939671 (D), and S763-0509 (E). The left panel shows the RMSD of Trypsin I and ligand (dark green and magenta) and indicates the number of hydrogen bonds between the protein and ligand. The middle panel shows the protein–ligand conformation of the last frame from the 100 ns MD simulation. The right panel shows the 2D diagram of the protein–ligand interaction from the last frame of MD simulation.
FIGURE 7
Representative structures of de novo candidates and their predicted interaction with the Trypsin I Protease.
