J Comput Chem. 2017 Jan 30;38(3):169-177.
doi: 10.1002/jcc.24667. Epub 2016 Nov 17.

Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest

Cheng Wang et al. J Comput Chem. 2017.

Abstract

The development of new protein-ligand scoring functions using machine learning algorithms, such as random forest, has attracted significant interest. By efficiently utilizing expanded feature sets and large sets of experimental data, random forest based scoring functions (RFbScore) can achieve better correlations with experimental protein-ligand binding data for complexes with known crystal structures; however, more extensive tests indicate that this enhancement in scoring power comes with significant under-performance in docking and screening power tests compared with traditional scoring functions. In this work, to simultaneously improve the scoring, docking, and screening powers of protein-ligand scoring functions, we have introduced ΔvinaRF, a parameterization and feature selection framework based on random forest. Our scoring function ΔvinaRF20, which employs 20 descriptors in addition to the AutoDock Vina score, can achieve superior performance in all power tests of both the CASF-2013 and CASF-2007 benchmarks compared with classical scoring functions. The ΔvinaRF20 scoring function and its code are freely available on the web at https://www.nyu.edu/projects/yzhang/DeltaVina. © 2016 Wiley Periodicals, Inc.
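The Δ-learning idea behind the ΔvinaRF parameterization can be illustrated with a short sketch: instead of predicting binding affinity from scratch, a correction model is fitted to the residual between the experimental affinity and the AutoDock Vina score, and the final prediction is the Vina score plus that correction. The sketch is hedged in several ways: the Vina scores are assumed to be already expressed in pK units, the descriptor vectors and toy data are hypothetical, and a k-nearest-neighbour regressor (a plainly different technique) stands in for the random forest so the example needs only the Python standard library. It is not the authors' code.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def knn_regressor(X: List[Vector], y: List[float], k: int = 3) -> Callable[[Vector], float]:
    """Stand-in correction model (NOT a random forest): mean target of the k nearest points."""
    def predict(x: Vector) -> float:
        # Rank training points by Euclidean distance to the query descriptor vector.
        ranked = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))
        nearest = ranked[:k]
        return sum(target for _, target in nearest) / len(nearest)
    return predict


def fit_delta_model(vina_pk: List[float], features: List[Vector],
                    exp_pk: List[float], k: int = 3) -> Callable[[float, Vector], float]:
    """Delta learning: fit only the residual between experiment and the base (Vina) score."""
    # The correction model is trained on (experimental - Vina), not on raw affinities.
    residuals = [e - v for e, v in zip(exp_pk, vina_pk)]
    correction = knn_regressor(features, residuals, k)

    def predict(vina_score: float, x: Vector) -> float:
        # Final prediction = Vina-derived score + learned correction for this descriptor vector.
        return vina_score + correction(x)

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: 4 complexes with 2 descriptors each (not from the paper).
    vina = [5.0, 6.0, 7.0, 8.0]
    feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    exp = [6.0, 7.0, 8.0, 9.0]  # every complex sits 1.0 pK unit above its Vina estimate
    model = fit_delta_model(vina, feats, exp, k=2)
    print(model(6.5, [0.2, 0.8]))  # → 7.5
```

Replacing `knn_regressor` with any other regressor, such as a random forest, leaves the Δ structure unchanged: only the residual model changes, while the Vina score remains the baseline of every prediction.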

Keywords: docking; machine learning; protein-ligand binding affinity; random forest; scoring function.

Figures

Figure 1
Performance of 22 scoring functions on the CASF-2013 benchmark: (A) scoring power, measured by Pearson's R; (B) ranking power, measured by the high-level success rate; and (C) docking power, measured by the success rate at which the best-scored pose is considered to match the native pose. ΔvinaRF20 is shown in red and AutoDock Vina in green; all results shown in blue are taken from reference [13].
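The metrics named in panels (A) and (B) can be computed as in the sketch below. Pearson's R is standard; for the high-level ranking success rate, it is assumed here (following the usual CASF convention, which this caption does not spell out) that complexes are grouped into per-target clusters and a cluster counts as a success only when the predicted ordering of all its complexes matches the experimental ordering. The function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Scoring power: Pearson's R between predicted and experimental affinities."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_e = sum(exp) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, exp))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in exp))
    return cov / (sd_p * sd_e)


def high_level_ranking_success(clusters: Dict[str, List[Tuple[float, float]]]) -> float:
    """Ranking power under an ASSUMED high-level criterion: a cluster succeeds only if
    ordering its complexes by predicted score reproduces the full experimental order."""
    successes = 0
    for complexes in clusters.values():
        # Each complex is (predicted_score, experimental_affinity); higher means stronger binding.
        by_pred = sorted(complexes, key=lambda c: c[0], reverse=True)
        exp_in_pred_order = [c[1] for c in by_pred]
        if exp_in_pred_order == sorted(exp_in_pred_order, reverse=True):
            successes += 1
    return successes / len(clusters)


if __name__ == "__main__":
    # Hypothetical toy numbers, not benchmark data.
    print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    clusters = {
        "target_a": [(9.0, 8.0), (7.0, 6.0), (5.0, 4.0)],  # experimental order reproduced
        "target_b": [(5.0, 8.0), (7.0, 6.0), (9.0, 4.0)],  # experimental order reversed
    }
    print(high_level_ranking_success(clusters))  # → 0.5
```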
Figure 2
Performance of 18 scoring functions on the CASF-2007 benchmark: (A) scoring power, measured by Pearson's R; (B) ranking power, measured by the high-level success rate; and (C) docking power, measured by the success rate at which the best-scored pose is considered to match the native pose. ΔvinaRF20 is shown in red and AutoDock Vina in green; all results shown in blue are taken from reference [12].
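Docking power, as described in panel (C), can be sketched as the fraction of complexes whose best-scored decoy pose lies close to the native pose. The 2.0 Å RMSD cutoff used as the default is an assumption (a common CASF choice), as are the function name and toy data. The `higher_is_better` flag reflects that a raw Vina score in kcal/mol is better when lower, while a pK-style score is better when higher.

```python
from typing import Dict, List, Tuple

# (score, RMSD of the pose from the native ligand pose, in angstroms)
Pose = Tuple[float, float]


def docking_success_rate(decoy_sets: Dict[str, List[Pose]], rmsd_cutoff: float = 2.0,
                         higher_is_better: bool = True) -> float:
    """Docking power: fraction of complexes whose best-scored pose lies within
    rmsd_cutoff of the native pose (the 2.0 A default is an assumption)."""
    hits = 0
    for poses in decoy_sets.values():
        # Pick the top-ranked pose under the scoring function's sign convention.
        pick = max if higher_is_better else min
        _, rmsd = pick(poses, key=lambda pose: pose[0])
        if rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(decoy_sets)


if __name__ == "__main__":
    # Hypothetical decoy sets for two complexes.
    decoys = {
        "complex_a": [(8.0, 0.5), (7.0, 4.0)],
        "complex_b": [(6.0, 1.0), (9.0, 5.0)],
    }
    print(docking_success_rate(decoys))  # → 0.5
```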
Figure 3
Performance of 22 scoring functions on the CASF-2013 benchmark in screening power, measured by (A) the enrichment factor and (B) the success rate, both at the top 1% level. ΔvinaRF20 is shown in red and AutoDock Vina in green; all results shown in blue are taken from reference [13].
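The two screening metrics can be sketched per target as follows. The enrichment factor compares the fraction of true binders among the top 1% of ranked ligands with their fraction in the whole library; the success rate counts the targets whose highest-affinity binder lands in the top 1%. Both definitions are stated as assumptions based on common usage rather than taken from this caption, and the rounding of the top-1% subset to at least one ligand, the function names, and the toy data are hypothetical choices.

```python
from typing import Dict, List, Set, Tuple


def top_count(n_total: int, fraction: float) -> int:
    # Number of ligands in the top x% of the ranked list (at least one).
    return max(1, round(n_total * fraction))


def enrichment_factor(scores: Dict[str, float], binders: Set[str], fraction: float = 0.01) -> float:
    """EF for one target: (binders in top x% / size of top x%) / (all binders / all ligands)."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # higher score = predicted stronger binder
    n_top = top_count(len(ranked), fraction)
    found = sum(1 for lig in ranked[:n_top] if lig in binders)
    return found * len(ranked) / (n_top * len(binders))


def screening_success_rate(targets: List[Tuple[Dict[str, float], str]],
                           fraction: float = 0.01) -> float:
    """Fraction of targets whose highest-affinity binder is ranked within the top x%."""
    hits = 0
    for scores, best_binder in targets:
        ranked = sorted(scores, key=scores.get, reverse=True)
        if best_binder in ranked[:top_count(len(ranked), fraction)]:
            hits += 1
    return hits / len(targets)


if __name__ == "__main__":
    # Hypothetical library of 100 ligands scored 0..99; the top 1% is a single ligand.
    scores = {f"lig{i}": float(i) for i in range(100)}
    print(enrichment_factor(scores, {"lig99", "lig0"}))  # → 50.0
    print(screening_success_rate([(scores, "lig99"), (scores, "lig0")]))  # → 0.5
```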
Figure 4
CASF-2013 benchmark performance of AutoDock Vina (green) and of scoring functions developed with the plain RF approach (blue), fitted to the experimental data alone, and with the ΔvinaRF approach (red); both the blue and red models use the same twenty features and the same training set that were used to develop the ΔvinaRF20 scoring function. (A) Scoring power; (B) Docking power; (C) Screening power. Except for AutoDock Vina, each model was trained 10 times with different random seeds for the random forest, and the reported performance is the average over the 10 runs. AutoDock Vina's performance is also indicated by a dashed line.
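The averaging protocol in this caption (10 runs with different random-forest seeds, reported as the mean) can be sketched generically. `dummy_run` is a hypothetical placeholder that only returns noise around a constant; in practice it would train a model with the given seed and return one benchmark metric. The standard deviation is computed as an extra and is not something the caption reports.

```python
import random
import statistics
from typing import Callable, List, Tuple


def average_over_seeds(train_and_evaluate: Callable[[int], float],
                       n_runs: int = 10) -> Tuple[float, float, List[float]]:
    """Repeat a stochastic train/evaluate cycle with seeds 0..n_runs-1 and summarize the metric."""
    metrics = [train_and_evaluate(seed) for seed in range(n_runs)]
    return statistics.mean(metrics), statistics.stdev(metrics), metrics


def dummy_run(seed: int) -> float:
    # Hypothetical stand-in for "fit a random forest with this seed and return its metric";
    # it only returns 0.5 plus small seed-dependent noise.
    rng = random.Random(seed)
    return 0.5 + rng.uniform(-0.01, 0.01)


if __name__ == "__main__":
    mean, sd, runs = average_over_seeds(dummy_run)
    print(f"mean={mean:.4f} sd={sd:.4f} over {len(runs)} runs")
```

Seeding each run explicitly makes the whole 10-run average reproducible while still sampling the variability introduced by the random forest's bootstrap and feature sampling.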

References

1. Huang SY, Grinter SZ, Zou XQ. Phys Chem Chem Phys. 2010;12:12899.
2. Lyne PD. Drug Discovery Today. 2002;7:1047.
3. Shoichet BK. Nature. 2004;432:862.
4. McInnes C. Curr Opin Chem Biol. 2007;11:494.
5. Guido RVC, Oliva G, Andricopulo AD. Curr Med Chem. 2008;15:37.
