AI is a viable alternative to high throughput screening: a 318-target study

Atomwise AIMS Program

PMID: 38565852
PMCID: PMC10987645
DOI: 10.1038/s41598-024-54655-z

Atomwise AIMS Program. Sci Rep. 2024 Apr 2;14(1):7526. doi: 10.1038/s41598-024-54655-z.

Erratum in

Author Correction: AI is a viable alternative to high throughput screening: a 318-target study.
Atomwise AIMS Program. Sci Rep. 2024 Sep 16;14(1):21579. doi: 10.1038/s41598-024-70321-w. PMID: 39284864.

Abstract

High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders, high-quality X-ray crystal structures, or manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.

Conflict of interest statement

The authors affiliated with Atomwise declare the existence of a financial competing interest.

Figures

**Figure 1**
Pairs of representative compounds extracted from AI patents (right) and corresponding prior patents (left) for clinical-stage programs (CDK7, A2Ar-antagonist, MALT1, QPCTL, USP1, and 3CLpro). The identical atoms between the chemical structures are highlighted in red.

**Figure 2**
The distributions of 296 AIMS projects across assay types used in the primary screen, research areas, target classes, and further breakdown to enzyme classes when applicable.

**Figure 3**
(A) An illustration of the hit rate versus the number of training examples available to our model. Each point represents a project, with the x-axis denoting the number of active molecules in our training for the target protein or homologs and the y-axis denoting the hit rate of the project (the percentage of molecules tested in the project that were active). The model shows no dependence on the availability of on-target training examples. For 70% of the targets, the AtomNet model training data lacked any active molecules for that target or any similar targets with greater than 70% sequence identity, yet the model achieved a hit rate of 5.3% compared to 6.1% when on-target data was available. (B) The distribution of similarities between hits and their most-similar bioactive compounds in our training data. Our screening protocol ensures that the compounds subjected to physical testing are not similar to known active compounds or close homologs (< 0.5 Tanimoto similarity using ECFP4, 1024 bits). Because 70% of the AIMS targets had no annotated bioactivities in our training dataset, hits identified in these projects have a similarity value of zero.
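The novelty criterion in panel (B) can be illustrated with a minimal sketch, assuming each fingerprint is represented as the set of its on-bit positions in a 1024-bit vector. The function names and the toy bit sets below are hypothetical and are not taken from the study; real ECFP4 fingerprints would come from a cheminformatics toolkit, which is not shown here.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(candidate: Set[int], known_actives: Iterable[Set[int]]) -> float:
    """Similarity to the most-similar known active.

    Returns 0.0 when there are no known actives, mirroring the caption's
    convention for targets without annotated bioactivities.
    """
    return max((tanimoto(candidate, k) for k in known_actives), default=0.0)


def is_novel(candidate: Set[int], known_actives: Iterable[Set[int]],
             threshold: float = 0.5) -> bool:
    """True when the candidate falls below the similarity threshold (< 0.5)."""
    return max_similarity(candidate, known_actives) < threshold


# Hypothetical toy fingerprints (on-bit positions), for illustration only.
known_actives = [{1, 5, 9, 200, 733}, {2, 5, 64, 512}]
close_analog = {1, 5, 9, 200, 900}  # shares 4 of 6 distinct bits with the first active
new_scaffold = {3, 77, 410, 1001}   # shares no bits with any known active

if __name__ == "__main__":
    for name, fp in [("close_analog", close_analog), ("new_scaffold", new_scaffold)]:
        print(name, round(max_similarity(fp, known_actives), 3),
              is_novel(fp, known_actives))
```

Under these assumptions, a compound is kept for testing only when its maximum similarity to every known active stays below 0.5, and a target with no known actives yields a similarity of zero for every hit.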

**Figure 4**
Hit rates obtained for the 296 AIMS projects. (A) A comparison of hit rates when the protein structure came from X-ray crystallography, NMR, cryo-EM, or homology modeling. Each point represents a project, with the x-axis denoting the hit rate of the project (the percentage of molecules tested in the project that were active). The number of projects of each type is given in parentheses. We observed no substantial difference in success rate between the physical and the computationally inferred models. We achieved average hit rates of 5.6%, 5.5%, and 5.1% for crystal structures, cryo-EM, and homology modeling, respectively. The number of projects using NMR structures is too small to support statistically robust claims. (B) A comparison of hit rates observed for traditionally challenging target classes such as protein–protein interactions (PPI) and allosteric binding. Of the 296 projects, 72 targeted PPIs and 58 targeted allosteric binding sites. The average hit rates were 6.4% and 5.8% for PPIs and allosteric binding, respectively. (C) Comparison of hit rates observed for different target classes and (D) enzyme classes. No protein or enzyme class falls outside the domain of applicability of the algorithm.
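The hit rate used throughout these captions (the percentage of tested molecules that were active) and its per-group average can be sketched as follows. This is an illustration only: the project records are invented, the field names are hypothetical, and the unweighted mean over projects is an assumption, since the caption does not state how the averages were computed.

```python
from collections import defaultdict
from statistics import mean


def hit_rate(n_active: int, n_tested: int) -> float:
    """Percentage of molecules tested in a project that were active."""
    if n_tested <= 0:
        raise ValueError("a project must have at least one tested molecule")
    return 100.0 * n_active / n_tested


def mean_hit_rate_by(projects, key):
    """Unweighted mean of per-project hit rates, grouped by a project field."""
    groups = defaultdict(list)
    for project in projects:
        groups[project[key]].append(hit_rate(project["active"], project["tested"]))
    return {group: mean(rates) for group, rates in groups.items()}


# Invented example records; these are NOT data from the study.
projects = [
    {"structure": "X-ray", "tested": 80, "active": 4},
    {"structure": "X-ray", "tested": 50, "active": 5},
    {"structure": "homology", "tested": 100, "active": 6},
    {"structure": "cryo-EM", "tested": 60, "active": 3},
]

if __name__ == "__main__":
    for structure, rate in mean_hit_rate_by(projects, "structure").items():
        print(f"{structure}: {rate:.1f}%")
```

Averaging per-project rates gives every project equal weight regardless of how many molecules it tested; pooling all tested molecules before dividing would be a different, equally plausible convention.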
