J Adv Res. 2023 Apr;46:135-147.
doi: 10.1016/j.jare.2022.07.001. Epub 2022 Jul 25.

SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation

Miles McGibbon et al. J Adv Res. 2023 Apr.

Abstract

Introduction: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystallised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions.

Objectives: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening.

Methods: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data were augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias was addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of three different machine learning approaches was also used to improve performance.
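The RMSD-based labelling and the consensus step can be illustrated with a minimal Python sketch. The 2 Å and 4.5 Å thresholds are those given in the Fig. 2 caption; the function names are hypothetical, only the RMSD part of the labelling is shown, and combining the three models by a plain average is an assumption made here for illustration, not a detail stated in the abstract.

```python
from statistics import mean
from typing import Optional, Sequence

# Thresholds from the Fig. 2 caption; the constant names are hypothetical.
NEAR_NATIVE_MAX_RMSD = 2.0  # poses below this are labelled 1
FAR_MIN_RMSD = 4.5          # poses above this are labelled 0


def label_pose(rmsd: float) -> Optional[int]:
    """Return 1, 0, or None (excluded) for a docked pose's RMSD in angstroms."""
    if rmsd < NEAR_NATIVE_MAX_RMSD:
        return 1
    if rmsd > FAR_MIN_RMSD:
        return 0
    return None  # poses in the 2 to 4.5 angstrom band are dropped


def consensus_score(model_scores: Sequence[float]) -> float:
    """Combine per-model scores; simple averaging is an assumption."""
    return mean(model_scores)


# Three docked poses of one ligand: near-native, ambiguous, and far.
labels = [label_pose(rmsd) for rmsd in (0.8, 3.1, 6.2)]
print(labels)  # [1, None, 0]
```

Excluding the intermediate band keeps ambiguous poses out of both classes, so the classifier only sees poses that are clearly near-native or clearly wrong.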

Results: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with an enrichment factor (EF) at 1 % of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets.
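For readers unfamiliar with the metric, the enrichment factor at a given fraction is commonly defined as the hit rate among the top-scored compounds divided by the hit rate in the whole library. The sketch below follows that common definition; the function name and the rounding of the top-fraction size are assumptions, and the paper's exact conventions may differ.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float], is_active: Sequence[bool], fraction: float
) -> float:
    """Hit rate in the top `fraction` of ranked compounds over the overall hit rate."""
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top slice; rounding with a floor of one compound is an assumption.
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(active for _, active in ranked[:n_top])
    return (top_actives / n_top) / (total_actives / len(scores))


# Toy library: 100 compounds, the 10 highest-scored ones are the actives.
toy_scores = list(range(100))
toy_actives = [score >= 90 for score in toy_scores]
ef_1_percent = enrichment_factor(toy_scores, toy_actives, 0.01)
```

In this toy library a perfect ranking gives an EF at 1 % of about 10, which is the ceiling when 10 % of the compounds are active; a random ranking would give an EF near 1.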

Conclusion: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.

Keywords: Docking; Drug discovery; Machine learning; Neural networks; Scoring; Virtual screening.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Production workflow of single-pose and multiple-pose machine learning models. The single-pose dataset consisted of one pose for each protein-ligand pair, and poses were labelled solely based on binding affinity. The multiple-pose dataset consisted of multiple poses for each protein-ligand pair, and poses were labelled based on binding affinity and RMSD from the native crystal pose. On each dataset, a gradient-boosted decision tree (GBDT, using XGBoost), a feedforward neural network (FF NN), and a wide-and-deep neural network (W&D NN) were trained, and a consensus of the three model types was considered. Single-pose models were developed to evaluate whether using multiple poses with labelling stratified by both RMSD and binding affinity improves performance. SCORCH is a consensus of the three models trained on multiple-pose data. Additional details on data preparation (white boxes) are indicated in Figure S1.
Fig. 2
RMSD-based pose labelling example for PDB structure 1a0q. a) Docked poses (green) <2 Å from the crystal pose (thick blue lines) are given a label of 1. b) Docked poses (olive) between 2 Å and 4.5 Å from the crystal pose (thick blue lines) are excluded from the dataset. c) Docked poses (red) over 4.5 Å from the crystal pose (thick blue lines) are given a label of 0.
Fig. 3
Precision-Recall curves for produced machine-learning models and third-party scoring functions on 5,621 test set complexes.
Fig. 4
SCORCH is the best performing scoring function on a subset of the DEKOIS 2.0 independent benchmarking dataset. a) Precision-Recall curves for produced machine-learning models and third-party scoring functions on 18 DEKOIS 2.0 targets. b) Enrichment factors at 0.5 %, 1 %, 2 % and 5 % for produced machine-learning models and third-party scoring functions across 18 DEKOIS 2.0 targets. White diamonds indicate the mean EF.
Fig. 5
The SCORCH certainty metric is a robust a priori indicator of enrichment success. a) Relationship between SCORCH certainty and residuals across all 22,319 DEKOIS 2.0 actives and decoys. The fitted regression line is shown in red, with the line equation displayed on the top left (p = 2e-16, Pearson r = −0.45); 95 % confidence intervals of the regression line are shown in grey. b) Relationship between enrichment factor and mean SCORCH certainty across the top 0.5 % of ligands for 18 DEKOIS 2.0 receptors. The fitted regression line is shown in red (p = 0.013, Pearson r = 0.57); 95 % confidence intervals of the regression line are shown in grey.
Fig. 6
RMSD-based pose labelling used in SCORCH improves the docking power compared to other MLSFs. a) Ranks of the lowest-RMSD docked ligand pose across 510 test set complexes for produced machine-learning models and third-party scoring functions. White diamonds indicate the mean rank, with the value displayed above each point. b) Ranks of the near-native pose across the CSAR 2014 pose prediction benchmark for produced machine-learning models and third-party scoring functions. White diamonds indicate the mean rank, with the value displayed above each point.
Fig. 7
The ten features with the highest influence on performance for each of the six individual machine-learning models. Highly influential features for each model are listed on the y-axis. The x-axis represents the decrease in model validation set AUCPR when a given feature is replaced with random noise; a greater decrease indicates greater reliance on a given input feature for making predictions.
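The noise-replacement analysis described in the Fig. 7 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: AUCPR is approximated by step-wise average precision, the noise is drawn uniformly from [0, 1), and `predict`, `average_precision` and `noise_importance` are hypothetical names, not part of the SCORCH code.

```python
import random
from typing import Callable, List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve."""
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_positive


def noise_importance(
    predict: Callable[[List[float]], float],
    rows: Sequence[List[float]],
    labels: Sequence[int],
    feature: int,
    seed: int = 0,
) -> float:
    """Drop in average precision after replacing one feature with random noise."""
    rng = random.Random(seed)
    baseline = average_precision([predict(row) for row in rows], labels)
    noisy_rows = []
    for row in rows:
        noisy = list(row)
        noisy[feature] = rng.random()  # uniform noise is an assumption
        noisy_rows.append(noisy)
    degraded = average_precision([predict(row) for row in noisy_rows], labels)
    return baseline - degraded


# Toy model that only looks at feature 0, so feature 1 should not matter.
rows = [[0.9, 5.0], [0.8, 1.0], [0.2, 3.0], [0.1, 7.0]]
labels = [1, 1, 0, 0]
drop_unused = noise_importance(lambda row: row[0], rows, labels, feature=1)
drop_used = noise_importance(lambda row: row[0], rows, labels, feature=0)
```

Because the toy model ignores feature 1, replacing it with noise leaves the predictions unchanged and the drop is zero, whereas replacing feature 0 can only keep or lower the score since the baseline ranking is already perfect.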
