[Preprint]. 2025 Oct 2:2025.09.30.679600.
doi: 10.1101/2025.09.30.679600.

Expansion of DNA-Encoded Library Hits Using Generative Chemistry and Ultra-Large Compound Catalogs

Brandon Novy et al. bioRxiv.

Abstract

DNA-encoded libraries (DELs) are powerful tools for initial hit identification, yet the combinatorial chemistries and building blocks used to construct them can restrict chemical space coverage and hit drug-likeness, limiting efficient hit expansion. Generative artificial intelligence (AI), by contrast, can in principle explore drug-like chemical space around any given compound, but it often struggles with the synthesizability of generated molecules and requires a set of validated hits to initiate exploration. Here, we present a synergistic methodology that overcomes these complementary limitations by using experimentally validated DEL data to initialize and bias an AI-powered virtual screening pipeline, expanding initial DEL hits with both de novo and purchasable compounds from ultra-large chemical libraries. Using this approach, we identified novel, commercially available hits from the Enamine REAL Space for the chromatin reader protein 53BP1 and validated them in a time-resolved fluorescence resonance energy transfer (TR-FRET) displacement assay. Three compounds had TR-FRET IC50 values ≤50 μM, and 11 had IC50 values ≤100 μM. Critically, compared with compounds from the initial DEL selection, the AI-nominated hits were more chemically diverse, more drug-like, and readily purchasable off the shelf. This work demonstrates a streamlined platform that combines empirical DEL data with generative chemistry models to enable rapid hit expansion from initially screened libraries into diverse, commercially available chemical matter.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1. Workflow for the 53BP1 hit-finding campaign.
Screening data from UNCDEL003 was used to seed two iterative cycles of the HIDDEN GEM generative chemistry model, producing purchasable nominations from the Enamine REAL Space.
Figure 2. DEL-Based Compounds Exhibit Comparable Activity but Inferior Drug-Like Properties.
A) The previously reported UNC8531, identified through extensive mono- and disynthon aggregation from DEL screening, compared with the top active compound from enrichment-based nominations in the UNCDEL003 screen against 53BP1-TTD. B) Drug-like properties of closely related bioisosteres generated during Cycles 1 and 2 of HIDDEN GEM. Potency, drug-likeness (QED), Tanimoto similarity, and synthetic accessibility (SA) scores are compared for UNC8531, the top UNCDEL003 compound, and selected purchased Enamine compounds. C) Distributions of physicochemical properties for all nominations from each generative docking cycle, highlighting statistically significant shifts (two-tailed t-test) in molecular weight, rotatable bond count, and heteroatom count.
Figure 3. Predicted binding pose and key interactions of UNC10413788A, a top hit from the generative chemistry model.
A) Top-scoring docking pose of UNC10413788A in the 53BP1 structure, overlaid with the crystal pose of UNC8531. The predicted pose of UNC10413788A shows interactions similar to those observed for DEL-derived hits while introducing chemical diversity relative to previous nominations. The compound had a TR-FRET IC50 of 22 ± 3 μM, comparable to the top hits from both the DEL and Enamine libraries. B) Detailed view of UNC10413788A and UNC8531 with pocket residues D1521, Y1523, and M1584. C) Ligand interaction diagram of UNC10413788A generated with Schrödinger Maestro. The interaction map highlights a conserved methyl-piperazine forming a network of π-cation interactions. Green residues indicate hydrophobic contacts, red residues denote charged amino acids, and red lines represent electrostatic interactions.
Figure 4. Chemical space mapping reveals that HIDDEN GEM explores novel regions of chemical space.
t-SNE analysis was performed using Morgan fingerprints (radius=3, 2048 bits). Hits from each method were compared to the previously reported DEL003 reference compound, UNC8531. HIDDEN GEM Cycles 1 and 2 exhibited the greatest chemical diversity, highlighting the model’s ability to access distinct regions of chemical space (Figure 5, Figure S1).
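The caption gives the fingerprint settings but not the distance metric used for t-SNE. As a minimal sketch, assuming Tanimoto (Jaccard) distance and fingerprints represented as sets of on-bit indices (for example, exported from a radius-3, 2048-bit Morgan fingerprint), the pairwise distance matrix that such an embedding would start from can be built in plain Python. The function names and toy bit sets are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two on-bit sets."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / union if union else 1.0


def tanimoto_distance_matrix(fps: List[Set[int]]) -> List[List[float]]:
    """Symmetric matrix of 1 - Tanimoto similarity, with zeros on the diagonal."""
    n = len(fps)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - tanimoto(fps[i], fps[j])
            dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Toy on-bit sets (hypothetical), not fingerprints of the study compounds.
    toy_fps = [{1, 2, 3}, {2, 3, 4}, {10}]
    for row in tanimoto_distance_matrix(toy_fps):
        print(row)
```

A matrix like this could then be passed to a t-SNE implementation that accepts precomputed distances, such as scikit-learn's `TSNE(metric="precomputed", init="random")`; scikit-learn does not allow PCA initialization when distances are precomputed.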
Figure 5. Validation of top hits nominated by HIDDEN GEM.
A) Chemical structures of top hits from UNCDEL003 versus HIDDEN GEM Cycles 1 and 2. B) Tanimoto similarity analysis using Morgan fingerprints (radius=3, 2048 bits) shows that hits from the generative model are more chemically diverse relative to the reference UNCDEL003 compound. QED analysis indicates that Cycle 1 of HIDDEN GEM produces hits with better drug-like properties than the top enriched UNCDEL003 compounds, and that Cycle 2 further improves drug-likeness without compromising activity.
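Panel B compares how similar each method's hits are to the reference compound. A minimal sketch of that comparison is shown below, assuming fingerprints are given as sets of on-bit indices and that each hit set is summarized by its mean Tanimoto similarity to the reference (the caption does not specify the summary statistic). The function names and fingerprints are hypothetical toy values; a lower mean similarity indicates hits that sit farther from the reference in fingerprint space.

```python
from statistics import mean
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two on-bit sets."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / union if union else 1.0


def mean_similarity_to_reference(
    reference: Set[int], hit_sets: Dict[str, List[Set[int]]]
) -> Dict[str, float]:
    """Mean Tanimoto similarity of each named hit set to a single reference fingerprint."""
    return {
        name: mean(tanimoto(reference, fp) for fp in fps)
        for name, fps in hit_sets.items()
    }


if __name__ == "__main__":
    # Toy on-bit sets (hypothetical), not fingerprints of the study compounds.
    reference = {1, 2, 3, 4}
    hits = {
        "DEL": [{1, 2, 3, 4}, {1, 2}],
        "generative": [{5, 6}, {1, 5}],
    }
    print(mean_similarity_to_reference(reference, hits))
    # → {'DEL': 0.75, 'generative': 0.1}
```

With RDKit, fingerprints matching the caption's settings would typically come from `rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=2048)`, and `DataStructs.BulkTanimotoSimilarity` computes the similarities of many fingerprints to one reference in a single call.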
