Nat Commun. 2024 Mar 27;15(1):2688.
doi: 10.1038/s41467-024-47011-2.

3D molecular generative framework for interaction-guided drug design

Wonho Zhung et al. Nat Commun.

Abstract

Deep generative modeling has a strong potential to accelerate drug design. However, existing generative models often face challenges in generalization due to limited data, leading to less innovative designs with often unfavorable interactions for unseen target proteins. To address these issues, we propose an interaction-aware 3D molecular generative framework that enables interaction-guided drug design inside target binding pockets. By leveraging universal patterns of protein-ligand interactions as prior knowledge, our model can achieve high generalizability with limited experimental data. Its performance has been comprehensively assessed by analyzing generated ligands for unseen targets in terms of binding pose stability, affinity, geometric patterns, diversity, and novelty. Moreover, the effective design of potential mutant-selective inhibitors demonstrates the applicability of our approach to structure-based drug design.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. A conceptualized illustration of our proposed interaction-aware 3D ligand generative framework.
(a) The first stage profiles a protein pocket to designate an interaction condition on each protein atom. (b) In the second stage, DeepICL sequentially adds ligand atoms inside a protein pocket based on the predetermined interaction condition. Letters inside circles indicate interaction types as follows: hydrogen bonds (H), hydrophobic interactions (D), salt bridges (S), and π–π stacking (π).
Fig. 2. Examples of interaction-aware conditioned ligand elaboration.
(a) Initial core structures and binding pocket surfaces marked with the given interaction conditions. Although an interaction condition is defined at the atom level, we illustrate conditions as patches for better visual representation. (b) The original and designed ligands with the highest interaction similarities, shown with the respective similarity values. (c, d) The 2D diagrams of profiled interactions between the pocket and the original ligands or designed ligands. The circles indicate amino acid residues, and the dashed lines indicate the interactions. Different colors are used to distinguish interaction types, where circles with multiple colors correspond to residues involved in more than one type of interaction. The core structures used as initial structures are highlighted in each ligand. (e) The distributions of interaction similarities of ligands generated with the reference and masked conditions. Source data are provided as a Source Data file. Left: bone morphogenetic protein 1 (BMP1, PDB ID: 6bto); middle: fibroblast growth factor 1 (FGF1, PDB ID: 3ud9); right: dihydrofolate reductase (DHFR, PDB ID: 1dis).
Fig. 3. Demonstration of the generalizability of our generative framework.
(a) Plots of ligand RMSDs during short MD simulations to assess the binding pose stability of designed ligands in three pockets from the test set: BMP1, FGF1, and DHFR. The blue and red curves depict the averaged RMSDs of ten sampled ligands of each generated set with 95% confidence intervals. Gray curves show ligand RMSDs of the original ligands. (b) The binding affinity scores of each set are presented as box plots (center line at the median, upper bound at the 75th percentile, lower bound at the 25th percentile) with whiskers at the minimum and maximum values. The average scores are also shown as diamonds. 100 ligands were generated with and without interaction information for each of the 100 test pockets, resulting in a total of 10,000 ligands. Their binding affinity scores are depicted in the blue and red boxes, respectively. The binding affinity scores of the ground-truth complexes composing the training and test sets are also analyzed and depicted as the black and white boxes, respectively. Note that the training and test complexes are carefully separated, so the training and test ligands are from distinct protein targets. (c) The bar plot of the number of interactions per molecule for each interaction type from the generated complexes in (b). (d, e) Kernel density estimation plots of hydrophobic interaction and hydrogen bonding distances, respectively. The distances from the generated complexes in (b) were measured using the PLIP software. Source data are provided as a Source Data file.
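As a rough illustration of the quantity plotted in panel (a), the sketch below computes a ligand RMSD between a reference pose and a later simulation frame. It assumes the two frames have already been aligned on the protein and that atoms are listed in the same order; the function name `ligand_rmsd` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def ligand_rmsd(ref_coords, frame_coords):
    """Root-mean-square deviation between two poses of the same ligand.

    Both arguments are lists of (x, y, z) tuples in the same atom order.
    No superposition is performed: the frames are assumed to be
    pre-aligned on the protein (an assumption of this sketch).
    """
    if not ref_coords or len(ref_coords) != len(frame_coords):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared_sum = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(ref_coords, frame_coords):
        squared_sum += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return math.sqrt(squared_sum / len(ref_coords))

# Toy example: every atom is displaced by the vector (0, 3, 4), i.e. by 5 units.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0)]
rmsd_value = ligand_rmsd(reference, shifted)
```

Averaging such values over several sampled ligands at each time step would give curves of the kind described in the caption.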
Fig. 4. Selectively controlled ligand design via site-specific interaction conditioning.
(a) The scatter plot illustrates the binding affinity scores of designed ligands toward the wild-type EGFR and the double-mutated EGFR, with their population density. Red points show ligands with a 2.72 kcal/mol lower binding affinity score (theoretically corresponding to a 100-fold lower inhibitory concentration) for the mutated EGFR. Source data are provided as a Source Data file. (b) An example of a well-designed ligand that is expected to be selective toward the double-mutated EGFR (depicted as a star in (a)). Non-covalent interactions and their distances (unit: Å) with mutated residues are shown as yellow dashed lines.
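The link between a 2.72 kcal/mol difference and a roughly 100-fold change in inhibitory concentration follows from the standard relation ΔΔG = RT ln(fold). The sketch below checks that arithmetic. The temperature of 298.15 K is an assumption (the caption does not state one), and the function names are hypothetical.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3

def fold_change(delta_g_kcal, temperature_k=298.15):
    """Concentration ratio implied by a free-energy difference (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))

def delta_g_for_fold(fold, temperature_k=298.15):
    """Free-energy difference (kcal/mol) implied by a concentration ratio."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)

ratio = fold_change(2.72)        # a little under 100 at the assumed temperature
energy = delta_g_for_fold(100)   # close to 2.73 kcal/mol at the assumed temperature
```

At the assumed temperature the two values agree with the caption to within rounding; a slightly lower temperature would give 2.72 kcal/mol for exactly 100-fold.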
Fig. 5. Illustration of the model architecture of DeepICL.
(a) The training phase of DeepICL, where the two losses, L_reg and L_recon, are denoted. (b) In the generation phase of DeepICL, z is sampled from the standard normal distribution instead of using the encoder. (c) The encoder module (q_ϕ) is trained to encode a whole protein–ligand complex (L, P) and the corresponding interaction condition, I, into a latent vector z that follows a prior distribution. (d) The decoder module (p_θ) is trained to reconstruct the ligand structure from the given protein pocket and an interaction condition with an autoregressive process. Note that the decoder in the figure describes a single atom addition step, where the type and position of the t-th ligand atom are determined from the protein–ligand complex of step t−1. (e) The embedding module, placed in front of the encoder and decoder, incorporates interaction conditions into protein atoms and updates protein and ligand atom features via interaction layers.
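The difference between the training and generation phases can be sketched with a generic variational autoencoder latent step. This is not DeepICL's implementation: it assumes a diagonal-Gaussian encoder output and a KL-divergence regularization term toward a standard normal prior, neither of which is specified in the caption, and all function names are hypothetical.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), a common VAE regularizer."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_latent_training(mu, log_var, rng):
    """Training phase: z comes from the encoder output (reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def sample_latent_generation(dim, rng):
    """Generation phase: the encoder is skipped and z ~ N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)
# Hypothetical encoder outputs for a 4-dimensional latent space.
mu, log_var = [0.5, -0.2, 0.0, 1.0], [0.0, -1.0, 0.3, 0.0]
z_train = sample_latent_training(mu, log_var, rng)
reg_loss = kl_to_standard_normal(mu, log_var)
z_gen = sample_latent_generation(4, rng)
```

In a full model, the sampled z would be passed to an autoregressive decoder together with the pocket and its interaction condition; that part is omitted here because the caption does not give enough detail to sketch it faithfully.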
