Review

Drug Discov Today. 2022 Nov;27(11):103351.

doi: 10.1016/j.drudis.2022.103351. Epub 2022 Sep 9.

Combining DELs and machine learning for toxicology prediction

Vincent Blay¹, Xiaoyu Li², Jacob Gerlach³, Fabio Urbina³, Sean Ekins⁴

Affiliations

¹ Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA. Electronic address: vroger@ucsc.edu.
² Department of Chemistry and State Key Laboratory of Synthetic Chemistry, The University of Hong Kong, Hong Kong Special Administrative Region.
³ Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA.
⁴ Collaborations Pharmaceuticals, Inc, 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA. Electronic address: sean@collaborationspharma.com.

PMID: 36096360
PMCID: PMC9995617
DOI: 10.1016/j.drudis.2022.103351

Vincent Blay et al. Drug Discov Today. 2022 Nov.

Abstract

DNA-encoded libraries (DELs) allow starting chemical matter to be identified in drug discovery. The volume of experimental data generated also makes DELs an attractive resource for machine learning (ML). ML allows modeling complex relationships between compounds and numerical endpoints, such as the binding to a target measured by DELs. DELs could also empower other areas of drug discovery. Here, we propose that DELs and ML could be combined to model binding to off-targets, enabling better predictive toxicology. With enough data, ML models can make accurate predictions across a vast chemical space, and they can be reused and expanded across projects. Although there are limitations, more general toxicology models could be applied earlier during drug discovery, illuminating safety liabilities at a lower cost.

Keywords: Cheminformatics; DNA-encoded libraries; Deep learning; Toxicology; Safety pharmacology; Machine learning.

Conflict of interest statement

Declaration of Competing Interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: S.E. is the owner, and J.G. and F.U. are employees, of Collaborations Pharmaceuticals, Inc.

Figures

**Figure 1.**
Multi-task ML is particularly well-suited to model multiple toxicology endpoints. A ConvLSTM model was built to predict 42 toxicity endpoints (IC₅₀ values) based on *in vitro* data. (A) Estimated time for fingerprint calculation for one billion molecules using RDKit's GetMorganFingerprintAsBitVect function compared with the custom SMILES tokenization pre-processing used in our ConvLSTM model. (B) Parity plot for 42 mixed targets, showing predicted vs. actual -log(Molar) values for a test set of compounds, with at least 10 datapoints for each of the 42 toxicity targets (see Table S1). RMSE and R² are shown for the combined predictions. (C) RMSE vs. size of the training data for each target. The highest RMSE (CCK1, red) and lowest RMSE (Serotonin 5HT1B, blue) are highlighted, along with the toxicity target with the largest training set (hERG, pink). (D) t-SNE plot of chemistry space (input: Morgan fingerprints of radius 3, 1024 bits) showing overlap of the DOS-DEL-1 and combined training sets of the 42 tox targets used to build the model.
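The parity-plot metrics in panel B can be illustrated with a minimal sketch. It assumes that R² denotes the coefficient of determination (the caption does not say whether it is this or a squared correlation) and that the -log(Molar) scale is the negative base-10 logarithm of a molar IC₅₀. The function names `neg_log_molar`, `rmse`, and `r_squared` are hypothetical, and the numbers in the usage lines are made-up illustrations, not data from the paper.

```python
import math


def neg_log_molar(ic50_molar):
    """Convert a molar IC50 to the -log10(Molar) scale (e.g. 1 uM -> 6)."""
    return -math.log10(ic50_molar)


def rmse(actual, predicted):
    """Root-mean-square error between paired actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only (not from the paper).
    actual = [neg_log_molar(x) for x in (1e-5, 1e-6, 1e-7, 1e-8)]
    predicted = [5.2, 5.9, 6.8, 8.1]
    print(f"RMSE = {rmse(actual, predicted):.3f}")
    print(f"R^2  = {r_squared(actual, predicted):.3f}")
```

Pooling all 42 targets into one pair of lists, as the caption describes for the combined predictions, gives a single RMSE and R²; computing them per target instead would correspond to the per-target view in panel C.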

**Figure 2.**
Combining DELs and ML may provide novel endpoints for predictive toxicology. For instance, the compression of multiple targets in the same DEL could enable a cost-efficient screening for promiscuity. Native proteins can be extracted from human organs, tissues, or cells. These proteins can be used in-solution (A) or be immobilized (B) for DEL selections. (A) The protein extract may be incubated with the DEL. After incubation, a chemically reactive DNA probe (capture probe; a photo-crosslinker diazirine is shown as an example), which is complementary to the common primer-binding site of the library, is added. UV irradiation then triggers the covalent capture of the target and a primer extension step copies the DNA code. The protein-DNA conjugates may be purified by protein extraction or using a built-in biotin group in the capture probe, followed by PCR amplification and DNA sequencing. (B) The proteins are immobilized on beads, and the protein-coated beads are incubated with the DEL. After careful washing of non-binders, bound library members are eluted, PCR-amplified, and sequenced. In both formats, the DNA sequences confidently map to the chemical identity of the compounds. Given the diversity of proteins, the bound compounds identified from the DEL likely represent promiscuous binders for that specific protein mix (see Figure 4). The large amounts of data generated from DELs can be modeled using ML. Different endpoints (e.g., promiscuity across different targets, cell types or tissues) can be modeled simultaneously in a multi-task ML model. The model can then be used to inspect large chemical libraries and remove or filter compounds with potential safety liabilities early in the drug discovery pipeline.

**Figure 3.**
DELs might provide novel opportunities for generating data and modeling key ADMET endpoints, such as compound promiscuity and cell permeability. DELs could be incubated with liposomes. The DNA tag may be a significant cargo, and this may allow identifying cell-penetrating compounds from the liposomes (left). Alternatively, a hydrophobic linker might be used to reduce the impact of the DNA tag on the permeability of the small-molecule head (right). This could allow a better assessment of permeability by measuring what fraction of each library member is retained inside the liposomes or on the membrane.

**Figure 4.**
Relationships between compound recovery, target concentration ([P]_total), individual ligand concentration ([L]_total), and binding affinity (K_d) in DEL selections. [P]_total, [L]_total, and K_d have the same units and are displayed in logarithmic concentration units. The simulation considers the recovery achieved after a single ideal equilibration step, with a simple association-dissociation equilibrium (see Supporting Information for details). Since a DEL assay involves washing steps, we only expect compounds with high recoveries (>90%) to be identified as positives. The simulation indicates that the total protein concentration should be set considerably higher than that of the individual ligands to achieve a high recovery of tight binders (A, C). The results also indicate that, if multiple ligands can compete for the same active site, the total target concentration should be higher than the sum of all of them to enable a high recovery. In other words, the protein concentration affects the stringency of the recovery, such that the lower the protein concentration, the higher the binding affinity of the compound will have to be for it to be positively observed in the DEL (B). As a first approximation, only compounds with K_d ≪ [P]_total are expected to be recovered.
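The relationship described in this caption can be sketched with the standard 1:1 binding equilibrium, P + L ⇌ PL with K_d = [P][L]/[PL]. This is an assumption: the authors' actual simulation is given in their Supporting Information, which is not reproduced here. Under that assumption, the complex concentration is the smaller root of a quadratic in [PL], and recovery is taken as [PL]/[L]_total. The function name `fraction_recovered` and the concentrations in the usage loop are hypothetical.

```python
import math


def fraction_recovered(p_total, l_total, k_d):
    """Fraction of a ligand bound at equilibrium for simple 1:1 binding.

    Solves [PL]^2 - (P_t + L_t + K_d)[PL] + P_t * L_t = 0 for the
    physically meaningful (smaller) root. All inputs share one
    concentration unit, e.g. molar.
    """
    if p_total <= 0 or l_total <= 0:
        return 0.0
    b = p_total + l_total + k_d
    discriminant = b * b - 4.0 * p_total * l_total  # never negative here
    # Rearranged form of (b - sqrt(disc)) / 2 that avoids cancellation
    # when the complex concentration is much smaller than b.
    complex_conc = 2.0 * p_total * l_total / (b + math.sqrt(discriminant))
    return complex_conc / l_total


if __name__ == "__main__":
    # Illustrative concentrations only: 1 uM protein, 1 nM per library member.
    p_total, l_total = 1e-6, 1e-9
    for log_kd in range(-10, -3):
        k_d = 10.0 ** log_kd
        rec = fraction_recovered(p_total, l_total, k_d)
        print(f"log10(K_d) = {log_kd:4d}  recovery = {rec:.3f}")
```

With the protein in large excess over the ligand, the expression reduces to roughly [P]_total / ([P]_total + K_d), which is consistent with the caption's first approximation that only compounds with K_d ≪ [P]_total reach high recovery. Competition among multiple ligands for the same site, also mentioned in the caption, is not modeled in this sketch.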
