Sci Adv. 2023 Jun 23;9(25):eadg7865.
doi: 10.1126/sciadv.adg7865. Epub 2023 Jun 21.

Accelerating drug target inhibitor discovery with a deep generative foundation model

Vijil Chenthamarakshan et al. Sci Adv.

Abstract

Inhibitor discovery for emerging drug-target proteins is challenging, especially when target structure or active molecules are unknown. Here, we experimentally validate the broad utility of a deep generative framework trained at scale on protein sequences, small molecules, and their mutual interactions, unbiased toward any specific target. We performed protein sequence-conditioned sampling on the generative foundation model to design small-molecule inhibitors for two dissimilar targets: the spike protein receptor-binding domain (RBD) and the main protease from SARS-CoV-2. Despite using only the target sequence information during model inference, micromolar-level inhibition was observed in vitro for two candidates out of four synthesized for each target. The most potent spike RBD inhibitor exhibited activity against several variants in live virus neutralization assays. These results establish that a single, broadly deployable generative foundation model for accelerated inhibitor discovery is effective and efficient, even in the absence of target structure or binder information.

Figures

Fig. 1. Overview of our inhibitor discovery workflow driven by CogMol, a sequence-guided deep generative foundation model.
(A and B) Molecular Variational AutoEncoder (VAE) training on large-scale chemical SMILES (x) data and mapping of existing protein-ligand affinity relations on the VAE latent space (z) by training a binding predictor, respectively. For the latter, we leverage pretrained neural network (NN) embeddings of a large volume of protein sequences. (C) Schematic representation of Controllable Latent (attribute) Space Sampling (CLaSS), which samples from the model of VAE latent vectors by using the guidance from a set of molecular property predictors (e.g., protein binding) such that, for a given target protein sequence, sampled z vectors corresponding to strong target binding affinity are accepted, while vectors corresponding to weak target binding affinity are rejected. The accepted z vectors are then decoded into molecular SMILES. (D) Candidates are then ranked and filtered according to chemical properties, docking score to target structure, and predicted retrosynthetic feasibility and toxicity. (E) A small set of prioritized molecules is synthesized, followed by wet laboratory testing in specific in vitro assays to confirm target inhibition. (F) In the present case, for each target, two of the four molecules tested showed promising levels of inhibition. The hit discovery rate reported is therefore the fraction of synthesized and experimentally tested AI-designed compounds that showed inhibition in target-specific assays. We also report approximate sample sizes and timeline for each stage of our discovery workflow. Note that the timeline does not include the training and testing of the generative and predictive machine learning models.
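The accept/reject step described for panel (C) can be illustrated with a minimal rejection-sampling sketch. Everything in it is an assumption made for illustration: the latent prior is taken to be a standard Gaussian, `toy_affinity` is a hypothetical stand-in for the trained binding predictor, and the threshold is arbitrary. It is not the CogMol implementation, and decoding accepted vectors into SMILES is omitted.

```python
import random

THRESHOLD = -3.0  # arbitrary cutoff, assumed for illustration only


def toy_affinity(z):
    """Hypothetical stand-in for a binding predictor (higher = stronger).

    The score peaks at the arbitrary point where every coordinate is 1.0.
    """
    return -sum((zi - 1.0) ** 2 for zi in z)


def sample_latents(n_accept, dim=4, threshold=THRESHOLD, seed=0,
                   max_tries=100_000):
    """Draw z ~ N(0, I); keep vectors whose predicted affinity meets the
    threshold (accept) and discard the rest (reject)."""
    rng = random.Random(seed)
    accepted = []
    tries = 0
    while len(accepted) < n_accept and tries < max_tries:
        tries += 1
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if toy_affinity(z) >= threshold:
            accepted.append(z)
    return accepted, tries


accepted, tries = sample_latents(n_accept=5)
print(f"accepted {len(accepted)} of {tries} sampled vectors")
```

In a real pipeline the accepted vectors would be passed to the VAE decoder, and further predictors could be combined in the acceptance test.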
Fig. 2. De novo designed and commercially sourced molecules.
(A) Molecules with the prefix “Z” are molecules from the Enamine Advanced Collection catalog targeting Mpro. Molecules with the prefix “GEN” are generated candidates targeting the spike RBD (B), while those with the prefix “GXA” are generated candidates targeting Mpro (C).
Fig. 3. SARS-CoV-2 spike neutralization assays.
Neutralization assay against SARS-CoV-2 pseudotyped lentivirus (A) and Victoria live virus (B) for four CogMol-generated compounds with DMSO as a control. (C) The most effective compound, GEN727, was selected for a pseudoviral neutralization assay against Victoria, Alpha, Beta, Gamma, Delta, and Omicron variants of concern (VOCs), as well as (D) the live virus neutralization assay. Error bars show the standard error of each measurement over two trials.
Fig. 4. Docked structure of SARS-CoV-2 spike protein RBD in complex with GEN727.
(A) Ribbon representation with transparent surface of the spike trimer. Wheat, gray, and light pink are used to delineate each protomer. GEN727 (shown in stick representation) docked to a spike monomer structure is superimposed for reference. (B) Surface representation depicting the overall docking pose of GEN727 at the lipid binding site of the spike RBD. (C) Schematic of GEN727 interacting with the RBD. (D) Docked GEN727 (cyan) in reference to stearic acid lipid (magenta) bound to the spike RBD. (E) Stearic acid binding pocket. Stearic acid (shown as sticks, almost completely buried) is distant from the binding sites of most neutralizing antibodies, which attach much higher up the molecule, overlapping the site of attachment of ACE2 (the green surface) and thereby blocking attachment to the host cell.
Fig. 5. Model of GEN727 in the lipid binding pocket of SARS-CoV-2 RBD.
(A) Snapshot from MD simulation at the end of 1 μs. (B) Plot of protein-ligand distance [between the center of mass of GEN727 (shown in cyan/blue) and the center of mass of the lipid binding pocket, heavy atom only, in nm], as a function of simulation time (in ps). The lipid binding pocket is defined by five Phe residues, Phe338, Phe342, Phe374, Phe377, and Phe392.
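The distance plotted in panel (B) is a separation between two centers of mass. A minimal sketch of that calculation for a single frame follows; the coordinates and masses are made-up toy values, and the function names are hypothetical rather than taken from any MD analysis package.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms (x, y, z in nm)."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def com_distance(coords_a, masses_a, coords_b, masses_b):
    """Euclidean distance between the centers of mass of two atom groups."""
    return math.dist(center_of_mass(coords_a, masses_a),
                     center_of_mass(coords_b, masses_b))


# Toy frame: two "ligand" heavy atoms and two "pocket" heavy atoms (carbon).
ligand_xyz = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
pocket_xyz = [(0.1, 0.3, 0.0), (0.1, 0.3, 0.8)]
carbon = [12.011, 12.011]

distance_nm = com_distance(ligand_xyz, carbon, pocket_xyz, carbon)
print(f"COM distance: {distance_nm:.3f} nm")
```

Repeating the calculation over every saved frame of a trajectory would give a distance-versus-time curve of the kind the caption describes.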
Fig. 6. Inhibition of SARS-CoV-2 Mpro by machine-designed de novo and commercially sourced compounds.
(A) Half-maximal inhibitory concentration (IC50) from RapidFire MS experiments for de novo and commercial Mpro inhibitor candidates. Symbol “—” indicates that no inhibition was detected. Candidates marked with had successful crystal structures determined. (B to D) Crystal structure of the SARS-CoV-2 Mpro in complex with Z68337194. (B) Ribbon representation with transparent surface of the Mpro dimer colored in wheat and light pink to delineate each protomer. The active site of each protomer is shown with Z68337194 in stick representation. (C) Surface representation showing the overall binding mode of Z68337194 at the active site of Mpro. (D) Schematic representation of the interactions of Z68337194 with Mpro. Residues indicated with * are from a symmetry-related Mpro protomer.
Fig. 7. Docked structures of SARS-CoV-2 Mpro with GXA112 and GXA70.
Surface representation depicting the overall ligand binding modes of (A) GXA112 and (C) GXA70 at the active site of Mpro. Schematic representation of the ligand interactions with Mpro for (B) GXA112 and (D) GXA70.
Fig. 8. Molecular similarity with PubChem compounds.
Top: Validated de novo compounds targeting (A) Mpro and (B) spike RBD. Bottom: Most similar molecules from PubChem. Values in parentheses indicate Tanimoto similarity between the machine-designed and nearest PubChem molecules.
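For readers unfamiliar with the metric, Tanimoto similarity between two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. The sketch below uses hand-written bit sets; which fingerprint type the authors used is not stated here, so the bit indices are purely illustrative and the function name is hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as
    collections of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / union


generated = {1, 4, 7, 9}       # toy fingerprint of a designed molecule
nearest = {1, 4, 8, 9, 12}     # toy fingerprint of a database molecule

similarity = tanimoto(generated, nearest)
print(f"Tanimoto similarity: {similarity:.2f}")
```

A value of 1.0 means identical fingerprints, and lower values indicate that the designed molecule differs more from its nearest database neighbor.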
