[Preprint]. 2024 Jul 11:2024.07.08.602536.
doi: 10.1101/2024.07.08.602536.

The impact of Library Size and Scale of Testing on Virtual Screening


Fangyu Liu et al. bioRxiv.

Update in

  • The impact of library size and scale of testing on virtual screening.
    Liu F, Mailhot O, Glenn IS, Vigneron SF, Bassim V, Xu X, Fonseca-Valencia K, Smith MS, Radchenko DS, Fraser JS, Moroz YS, Irwin JJ, Shoichet BK. Nat Chem Biol. 2025 Jul;21(7):1039-1045. doi: 10.1038/s41589-024-01797-w. Epub 2025 Jan 3. PMID: 39753705

Abstract

Virtual libraries for ligand discovery have recently increased 10,000-fold, and this is thought to have improved hit rates and potencies from library docking. This idea has not, however, been experimentally tested in direct comparisons of larger-vs-smaller libraries. Meanwhile, though libraries have exploded, the scale of experimental testing has changed little, with often only dozens of high-ranked molecules investigated, making interpretation of hit rates and affinities uncertain. Accordingly, we docked a 1.7 billion-molecule virtual library against the model enzyme AmpC β-lactamase, testing 1,521 new molecules and comparing the results to the same screen with a library of 99 million molecules, where only 44 molecules were tested. Encouragingly, the larger screen outperformed the smaller one: hit rates improved two-fold, more new scaffolds were discovered, and potency improved. Overall, 50-fold more inhibitors were found, supporting the idea that there are many more compounds to be discovered than are being tested. With so many compounds evaluated, we could ask how the results vary with the number tested, sampling smaller sets at random from the 1,521. Hit rates and affinities were highly variable when we sampled only dozens of molecules, and it was only when we included several hundred molecules that results converged. As docking scores improved, so too did the likelihood of a molecule binding; hit rates improved steadily with docking score, as did affinities. This also appeared true on reanalysis of large-scale results against the σ2 and dopamine D4 receptors. It may be that as the scale of both the virtual libraries and their testing grows, not only are better ligands found, but our ability to rank them also improves.


Conflict of interest statement

BKS is a founder of Epiodyne, Inc, BlueDolphin, LLC, and Deep Apple Therapeutics, Inc., serves on the SAB of Schrodinger LLC and of Vilya Therapeutics, on the SRB of Genentech, and consults for Hyku Therapeutics. JJI co-founded Deep Apple Therapeutics Inc. and BlueDolphin LLC. JSF is a consultant for, has equity in, and receives research support from Relay Therapeutics.

Figures

Fig. 1. Superposition of the crystallographic and docking poses of the new AmpC inhibitors.
Crystal structures (carbons in cyan) and docked poses (carbons in magenta) of the inhibitors. AmpC carbon atoms are in grey, oxygens in red, nitrogens in blue, sulfurs in yellow, chlorides in green, and fluorides in light blue. Hydrogen bonds are shown as black dashed lines. a-c, AmpC in complex with Z6615020275 (r.m.s.d. to crystal structure = 0.79 Å, Ki = 2 μM), Z6615017782 (r.m.s.d. = 0.97 Å, 1.5 μM) and Z6615017509 (r.m.s.d. = 3.14 Å, 0.86 nM). The overlay of the crystal and docked poses is shown. d-e, AmpC in complex with Z8427841182 (r.m.s.d. = 4.73 Å, 36 μM) and Z4462773688 (r.m.s.d. = 5.61 Å, 325 μM). The docked poses (left panel), the crystal poses (middle panel) and the overlay of the docked and crystal poses (right panel) are shown.
Fig. 2. Larger-scale docking and testing increases hit rates and reduces uncertainty.
a, The hit rates (number of actives/total tested) of the 1.7 billion screen (blue bar) versus the 99 million screen (orange bar). b, Hit rates by affinity bin in the ’22 screen and the ’19 screen. c, Number of hits (number of actives) of the 1.7 billion screen (blue bar) versus the 99 million screen (orange bar). d, The impact on hit rates of randomly purchasing 44, 139 or 439 molecules out of 1,296 molecules for testing. Each sample size was drawn at random 30 times and the resulting hit rates were plotted. The error bars represent SDs of the hit rates. The hit rates are 22.42 ± 6.08% (N = 44), 23.67 ± 3.54% (N = 139) and 22.80 ± 1.65% (N = 439). e, The impact on hit rates, at different affinity cutoffs, of randomly purchasing 44, 139 or 439 molecules out of 1,296 molecules for testing. Each sample size was drawn at random 30 times and the resulting hit rates were plotted. The error bars represent SDs of the hit rates.
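The sub-sampling in panel d can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the count of 298 actives is an assumption chosen to give a hit rate near 23% among 1,296 tested molecules, and the function names are hypothetical. The sketch draws each sample size 30 times without replacement and compares the observed SD with the textbook SD for sampling from a finite population.

```python
import math
import random
import statistics

def analytic_sd(p, n, population):
    """SD of the sample hit rate for n draws without replacement."""
    fpc = (population - n) / (population - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)

def simulated_hit_rates(outcomes, n, draws, rng):
    """Hit rates of `draws` random subsets of size n (no replacement)."""
    return [sum(rng.sample(outcomes, n)) / n for _ in range(draws)]

POPULATION = 1296    # molecules available for sub-sampling (Fig. 2d)
ASSUMED_HITS = 298   # hypothetical count, about 23% of 1,296
outcomes = [1] * ASSUMED_HITS + [0] * (POPULATION - ASSUMED_HITS)
rng = random.Random(0)  # fixed seed so the sketch is repeatable

for n in (44, 139, 439):
    rates = simulated_hit_rates(outcomes, n, 30, rng)
    print(
        f"N={n}: mean={statistics.mean(rates):.3f} "
        f"sd={statistics.stdev(rates):.3f} "
        f"expected_sd={analytic_sd(ASSUMED_HITS / POPULATION, n, POPULATION):.3f}"
    )
```

Under this assumption the expected SDs come out near 6%, 3% and 2% for N = 44, 139 and 439, in line with the spreads quoted in the caption; the simulated SDs fluctuate around those values because only 30 draws are made per sample size.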
Fig. 3. Several hundred compounds should be tested in ultra-large docking campaigns.
a, For the top-ranking 1% of the docked molecules, the relationship between hit affinity and hit rate can be fit with an exponential plateau model, y = b(1 − e^(−cx)), where y is the hit rate, x is the minimum affinity required to be classified as a hit (in micromolar for AmpC and in nanomolar for σ2 and D4) and b is the maximal hit rate. The fit maximal hit rates are 34.5% for AmpC with an R² of 0.998, 43% for the σ2 receptor with an R² of 0.998, and 20.8% for D4 with an R² of 0.985. b, The impact of sub-sampling on the R² of the fit. From among the top-ranking 1% of the docked molecules, 1,295 (AmpC, blue), 327 (σ2, orange) and 371 (D4, pink), each subsample is bootstrapped 1,000 times and fit with the parameters derived from the entire dataset. The R² values are plotted with the symbols indicating the average and the error bars indicating the standard deviations of the R². A dashed line marks R² = 0.95. The sample sizes at which the average R² value reaches 0.95 are labeled: 135 for σ2, 215 for AmpC and 495 for D4. c, Mean and 95% confidence interval for hit rate in relation to sample size for AmpC, σ2 and D4. The dashed lines show the mean hit rate from the compounds in the top 1% by docking score, and the solid line shows the boundary of a single-sided 95% confidence interval from 100,000 bootstrap iterations. Hits are defined as 400 μM affinity or better for AmpC, 677.5 nM or better for σ2 and 10 μM or better for D4.
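The exponential plateau model in panel a can be illustrated with a small, dependency-free sketch. The value of b is the AmpC maximum reported in the caption, and the cutoffs are the AmpC affinity cutoffs listed for Fig. 4; the rate constant c, the grid of candidate c values and the function names are assumptions made for illustration, and the data points are generated from the model itself rather than taken from the study. The fitting routine shown here (closed-form b for each candidate c) is one simple option and is not claimed to be the procedure the authors used.

```python
import math

def plateau(x, b, c):
    """Exponential plateau model: y = b * (1 - exp(-c * x))."""
    return b * (1.0 - math.exp(-c * x))

def r_squared(xs, ys, b, c):
    """Coefficient of determination of the model against (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - plateau(x, b, c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def fit_plateau(xs, ys, c_grid):
    """Least-squares fit: scan c over a grid, solve b in closed form."""
    best = None
    for c in c_grid:
        f = [1.0 - math.exp(-c * x) for x in xs]
        b = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - b * fi) ** 2 for y, fi in zip(ys, f))
        if best is None or sse < best[0]:
            best = (sse, b, c)
    return best[1], best[2]

B_TRUE = 34.5   # maximal AmpC hit rate (%) from the caption
C_TRUE = 0.01   # hypothetical rate constant, per micromolar
cutoffs = [13, 40, 137, 400]  # AmpC affinity cutoffs in micromolar
rates = [plateau(x, B_TRUE, C_TRUE) for x in cutoffs]  # synthetic points

c_grid = [i / 1000 for i in range(1, 101)]  # candidate c values (assumed)
b_fit, c_fit = fit_plateau(cutoffs, rates, c_grid)
print(b_fit, c_fit, r_squared(cutoffs, rates, b_fit, c_fit))
```

Because the synthetic points lie exactly on the curve, the fit recovers the generating parameters; with real hit-rate data the same routine would return the best grid value of c and an R² below 1.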
Fig. 4. Hit rate of experimentally tested compounds plotted against DOCK scores with different affinity cutoffs.
a, The AmpC hit rates of 1,292 auto-picked compounds using four different affinity cutoffs, < 400, 137, 40 and 13 μM, are plotted against DOCK scores. b, σ2 receptor hit rates of 484 compounds plotted against DOCK scores with three different affinity cutoffs: < 667.5, 241.2 and 67.8 nM. c, Dopamine D4 hit rates of 549 compounds plotted against DOCK scores with two different affinity cutoffs: < 10 and < 1 μM. d, Rescaling the hit rate curves of the three targets by the log10 of fractional rank in the library. For each target, the most permissive hit definition is used.
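The rescaling in panel d can be pictured with a short sketch. It assumes that fractional rank means a molecule's docking rank divided by the library size, with rank 1 as the best-scoring molecule; the caption does not spell this out, so both that definition and the function name are assumptions. The library size is the 1.7 billion molecules given in the abstract, and the example ranks are hypothetical.

```python
import math

LIBRARY_SIZE = 1_700_000_000  # 1.7 billion docked molecules (abstract)

def log10_fractional_rank(rank, library_size):
    """log10 of rank / library_size, with rank 1 as the best score."""
    if not 1 <= rank <= library_size:
        raise ValueError("rank must lie between 1 and library_size")
    return math.log10(rank / library_size)

# Hypothetical ranks: top one-millionth, top 0.1% and top 1% of the library.
for rank in (1_700, 1_700_000, 17_000_000):
    print(rank, round(log10_fractional_rank(rank, LIBRARY_SIZE), 2))
```

Expressing rank this way puts libraries of different sizes on a common axis, which is presumably why the three targets can be overlaid in one panel.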

