Nat Commun. 2025 Mar 6;16(1):2269.
doi: 10.1038/s41467-025-57485-3.

Knowledge-guided diffusion model for 3D ligand-pharmacophore mapping

Jun-Lin Yu et al. Nat Commun. 2025.

Abstract

Pharmacophores are abstractions of essential chemical interaction patterns and hold an irreplaceable position in drug discovery. Despite the availability of many pharmacophore tools, deep learning has been adopted relatively rarely for pharmacophore-guided drug discovery. We herein propose DiffPhore, a knowledge-guided diffusion framework for 'on-the-fly' 3D ligand-pharmacophore mapping. It leverages ligand-pharmacophore matching knowledge to guide ligand conformation generation, while using calibrated sampling to mitigate the exposure bias of the iterative conformation search process. Trained on two self-established datasets of 3D ligand-pharmacophore pairs, DiffPhore achieves state-of-the-art performance in predicting ligand binding conformations, surpassing traditional pharmacophore tools and several advanced docking methods. It also shows superior virtual screening power for lead discovery and target fishing. Using DiffPhore, we identify structurally distinct inhibitors of human glutaminyl cyclases, whose binding modes are further validated by co-crystallographic analysis. We believe this work will advance AI-enabled, pharmacophore-guided drug discovery.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. The datasets of 3D ligand-pharmacophore pairs.
a The construction protocol for LigPhoreSet (see Methods for details). b t-SNE plots of the ligands’ ECFP4 (1024-bit) fingerprints and pharmacophore counts show that LigPhoreSet covers broader chemical and pharmacophoric space than CpxPhoreSet. Before the t-SNE analysis, the ECFP4 fingerprints were reduced by PCA (random_state = 2024, n_component = 50). The t-SNE analysis used the following hyperparameters: n_component = 2, perplexity = 30, n_iter = 5000, random_state = 2024. c LigPhoreSet and CpxPhoreSet show similar occurrence frequencies of pharmacophore features. d Distribution of the fitness scores (i.e., DfScore1; see Methods) of ligand-pharmacophore pairs from CpxPhoreSet (n = 15,012) and LigPhoreSet (n = 840,288). The boxes represent the data distribution, with center lines showing medians, box limits indicating the 25th and 75th percentiles, and whiskers extending to 1.5 times the interquartile range from the lower and upper quartiles. Source data are provided as a Source Data file.
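The box-plot convention described for panel d (and again in Fig. 5) can be made concrete with a short sketch. It assumes the common Tukey convention, in which each whisker ends at the most extreme observation lying within 1.5 × IQR of the nearer quartile, and it computes quartiles with Python's `statistics.quantiles(..., method="inclusive")`; the plotting software used for the figure may interpolate quartiles slightly differently. The names `box_stats` and `BoxStats` are illustrative only.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BoxStats:
    median: float
    q1: float
    q3: float
    lower_whisker: float
    upper_whisker: float
    outliers: list[float]


def box_stats(values: Sequence[float]) -> BoxStats:
    """Tukey-style box-plot summary (assumed convention, see lead-in)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences;
    # anything beyond the fences is reported as an outlier.
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    outliers = [x for x in data if x < lo_fence or x > hi_fence]
    return BoxStats(median, q1, q3, min(inside), max(inside), outliers)
```

For example, `box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` gives quartiles 3.25 and 7.75, flags 100 as an outlier, and ends the upper whisker at 9.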
Fig. 2. The framework of DiffPhore.
a DiffPhore uses a diffusion-denoising process to predict, from randomly initialized conformations, binding conformations that map onto the pharmacophore. b DiffPhore incorporates knowledge-guided pharmacophore mapping rules into conformation generation. The LPM (ligand-pharmacophore mapping) representation encoder uses a geometric heterogeneous graph G_t, which includes a fully connected bipartite graph G_lp representing the LPM; V_lp and N_lp are introduced to convey type- and direction-matching information for updating the ligand conformation. c The calibrated conformation sampler randomly takes pseudo conformations (i.e., intermediate predictions) as inputs when learning the conformation denoising process. The probability of selecting pseudo conformations is controlled by an annealing temperature P_epoch.
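Panel c describes a calibrated sampler that, during training, sometimes feeds the denoiser a pseudo conformation (an intermediate prediction) instead of a noised ground-truth conformation; the abstract credits this calibrated sampling with mitigating exposure bias. The sketch below shows only that selection step. The linear ramp in `pseudo_probability` and the names `p_max`, `make_pseudo` and `select_training_input` are hypothetical: the caption states only that the selection probability is controlled by an annealing temperature P_epoch, not how it is scheduled.

```python
import random
from typing import Callable, TypeVar

Conf = TypeVar("Conf")  # placeholder for any conformation representation


def pseudo_probability(epoch: int, total_epochs: int, p_max: float = 0.5) -> float:
    """Hypothetical annealing schedule: ramp linearly from 0 to p_max."""
    if total_epochs < 1:
        raise ValueError("total_epochs must be >= 1")
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return p_max * progress


def select_training_input(
    noised_true: Conf,
    make_pseudo: Callable[[], Conf],
    p_epoch: float,
    rng: random.Random,
) -> tuple[Conf, bool]:
    """Return (input conformation, whether a pseudo conformation was used).

    With probability p_epoch the denoiser is trained on a pseudo conformation
    (e.g. its own intermediate prediction); otherwise on the noised ground truth.
    """
    if not 0.0 <= p_epoch <= 1.0:
        raise ValueError("p_epoch must lie in [0, 1]")
    if rng.random() < p_epoch:
        return make_pseudo(), True
    return noised_true, False
```

`make_pseudo` is a callable rather than a value so that the (costly) intermediate prediction is only computed when it is actually selected. With `p_epoch = 0.3`, roughly 30% of training inputs are pseudo conformations.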
Fig. 3. The performance of DiffPhore on ligand binding conformation prediction.
Cumulative distribution plots of the proportion of predictions with RMSD below each value, for different methods on (a) the PDBBind test set and (b) the PoseBusters set. c Top-1 success rates of different methods evaluated on the full set (all proteins) or the new-protein subset (proteins not included in the training set) of the PDBBind test set and the PoseBusters set. Cumulative distribution plots of the proportion of predictions below each energy ratio value, for different methods on (d) the PDBBind test set and (e) the PoseBusters set. The energy ratio is calculated as ratio = E_pred/E_true, where E_pred and E_true are the UFF force-field energies (from the PoseBusters validity test) of the predicted and ground-truth poses, respectively. Source data are provided as a Source Data file.
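The quantities plotted in this figure reduce to simple arithmetic. A minimal sketch, assuming per-complex top-1 RMSDs (in Å) and UFF energies have already been computed elsewhere; the 2 Å cutoff follows the success-rate definition in the Fig. 4 caption, and the function names are illustrative.

```python
from typing import Sequence


def fraction_below(values: Sequence[float], threshold: float) -> float:
    """Fraction of observations strictly below threshold (one point of a CDF plot)."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(v < threshold for v in values) / len(values)


def top1_success_rate(top1_rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Share of complexes whose top-ranked pose has RMSD < cutoff (in Å)."""
    return fraction_below(top1_rmsds, cutoff)


def cdf_curve(values: Sequence[float], thresholds: Sequence[float]) -> list[float]:
    """Cumulative-distribution values evaluated on a grid of thresholds."""
    return [fraction_below(values, t) for t in thresholds]


def energy_ratio(e_pred: float, e_true: float) -> float:
    """ratio = E_pred / E_true for the predicted vs. ground-truth pose energies."""
    if e_true == 0:
        raise ZeroDivisionError("ground-truth energy is zero")
    return e_pred / e_true
```

For RMSDs of 0.5, 1.2, 1.9, 2.0 and 3.4 Å, the top-1 success rate is 0.6: a pose at exactly 2.0 Å does not count as a success under the strict "< 2 Å" definition.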
Fig. 4. The impact of ligand flexibility and pharmacophore complexity on the predictive accuracy of DiffPhore.
Top-1 RMSD values (upper) and success rates (lower) plotted against the number of heavy atoms (a), rotatable bonds (b), and pharmacophore features (c) reveal how ligand flexibility and pharmacophore complexity affect the conformation prediction performance of DiffPhore and AncPhore. The numbers in parentheses represent the number of initial conformations; the top-1 success rate is the fraction of cases in which the top-ranked conformation has RMSD < 2 Å. Data are presented as mean values ± 95% confidence interval. Source data are provided as a Source Data file.
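Each point in these panels is a mean with a 95% confidence interval computed within one bin (for example, all ligands with the same number of rotatable bonds). The caption does not say how the interval was obtained; the sketch below assumes a normal approximation (mean ± z₀.₉₇₅ × standard error, z ≈ 1.96), which is only one option, since a t-based or bootstrap interval would be wider for small bins. The names `mean_ci95` and `binned_mean_ci` are illustrative.

```python
import math
import statistics
from collections import defaultdict
from typing import Iterable


def mean_ci95(values: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of an assumed normal-approximation 95% interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values per bin")
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    sem = statistics.stdev(values) / math.sqrt(n)
    return statistics.fmean(values), z * sem


def binned_mean_ci(pairs: Iterable[tuple[int, float]]) -> dict[int, tuple[float, float]]:
    """Group (bin, value) pairs, e.g. (n_rotatable_bonds, top1_rmsd), and summarize each bin."""
    groups: dict[int, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Bins are returned in ascending order so they can be plotted directly.
    return {key: mean_ci95(vals) for key, vals in sorted(groups.items())}
```

Passing 1.0/0.0 success indicators instead of RMSD values yields a per-bin success rate with its interval, i.e. the lower row of panels.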
Fig. 5. The DiffPhore screening power for lead discovery and target fishing.
Comparison of different DiffPhore scoring schemes with other methods in virtual screening for lead discovery, evaluated on (a–c) non-overlapping and (d–f) overlapping targets using the metrics AUROC, BEDROC, and EF0.5%. Boxes are ordered by their mean values, indicated by triangle markers. The “*” symbol denotes a statistically significant difference (unpaired two-sided Student’s t-test, p < 0.05, n = 14) between the baseline and DiffPhore (DfScore1). Exact p values are provided in the Source Data file. The boxes represent the data distribution, with center lines showing medians, box limits indicating the 25th and 75th percentiles, and whiskers extending to 1.5 times the interquartile range from the lower and upper quartiles. AUROC, area under the receiver operating characteristic curve; BEDROC, Boltzmann-enhanced discrimination of receiver operating characteristic; EF0.5%, enrichment factor at 0.5%. g Comparison of DiffPhore with other baselines in predicting the 12 targets of 4OH-tamoxifen. Percent rank = (rank order / total number of complex structures in IFPTarget) × 100. Source data are provided as a Source Data file.
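The three screening metrics and the percent rank can be computed from a list of scores and active/decoy labels. The sketch below follows the standard definitions (AUROC as the probability that an active outscores a decoy; BEDROC as defined by Truchon and Bayly, 2007), but several details are assumptions not stated in the caption: the early-recognition parameter α = 80.5 (a common choice in the literature), rounding the 0.5% cutoff up to a whole number of compounds, and breaking score ties by input order. All function names are illustrative.

```python
import math
from typing import Sequence


def _ranked_labels(scores: Sequence[float], labels: Sequence[bool]) -> list[bool]:
    """Labels ordered by descending score (stable sort: ties keep input order)."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [bool(labels[i]) for i in order]
    if not 0 < sum(ranked) < len(ranked):
        raise ValueError("need both actives and decoys")
    return ranked


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random active outscores a random decoy (ties count 1/2)."""
    _ranked_labels(scores, labels)  # validation only
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # O(P*N), fine for a sketch
    return wins / (len(pos) * len(neg))


def enrichment_factor(scores: Sequence[float], labels: Sequence[bool], fraction: float = 0.005) -> float:
    """EF at the top `fraction` of the ranked list (EF0.5% for fraction=0.005)."""
    ranked = _ranked_labels(scores, labels)
    n_total, n_actives = len(ranked), sum(ranked)
    n_top = max(1, math.ceil(n_total * fraction))  # assumed rounding convention
    return (sum(ranked[:n_top]) / n_top) / (n_actives / n_total)


def bedroc(scores: Sequence[float], labels: Sequence[bool], alpha: float = 80.5) -> float:
    """BEDROC (Truchon & Bayly, 2007); alpha = 80.5 is an assumed, commonly used value."""
    ranked = _ranked_labels(scores, labels)
    n, n_act = len(ranked), sum(ranked)
    ra = n_act / n
    # Robust initial enhancement (RIE) over 1-based ranks of the actives.
    sum_exp = sum(math.exp(-alpha * (i + 1) / n) for i, y in enumerate(ranked) if y)
    rie = sum_exp / (ra * (1 - math.exp(-alpha)) / (math.exp(alpha / n) - 1))
    factor = ra * math.sinh(alpha / 2) / (math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra))
    return rie * factor + 1 / (1 - math.exp(alpha * (1 - ra)))


def percent_rank(rank_order: int, n_structures: int) -> float:
    """Percent rank = rank order / total number of complex structures x 100."""
    return rank_order / n_structures * 100
```

With 10 actives ranked above 990 decoys, AUROC and BEDROC are both 1 and EF0.5% reaches its maximum of 100 (5 of the top 5 compounds are active, against a base rate of 1%).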
Fig. 6. The sQC/gQC inhibitors identified by DiffPhore.
a Chemical structures of compounds 5 and 13 and their predicted conformations mapped onto the pharmacophore model derived from the binding mode of QFA with sQC; IC50 curves of the two inhibitors against sQC/gQC (all determinations were performed in triplicate; data are presented as mean values ± SEM); melting curves (first derivative of dissociation) of sQC (yellow) and gQC (cyan) in the presence or absence of 5 (50 μM) or 13 (50 μM). Views of the (b) sQC:5 (PDB code 9ISD) and (c) sQC:13 (PDB code 9IVV) complex structures, revealing the binding modes by which 5 and 13 inhibit sQC; the mFo-DFc electron density maps (OMIT maps, blue mesh, contoured at 3.0σ) around 5 and 13 were calculated from the final refinement models. Superimpositions of (d) sQC:5 and (e) sQC:13 with the sQC:QFA analog complex (PDB code 6YI1) show that 5 and 13 adopt binding modes similar to that of the QFA analog in sQC.
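The IC50 values in panel a come from dose-response curves whose triplicate measurements are reported as mean ± SEM. The caption does not name the curve model; the sketch below assumes the widely used four-parameter logistic (Hill) equation and shows only how a triplicate is summarized and how the model is evaluated. Fitting the parameters would normally be done with a nonlinear least-squares routine, which is omitted here, and the parameter names are illustrative.

```python
import math
import statistics
from typing import Sequence


def mean_sem(replicates: Sequence[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (SEM = s / sqrt(n))."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    return statistics.fmean(replicates), statistics.stdev(replicates) / math.sqrt(len(replicates))


def four_pl(conc: float, bottom: float, top: float, ic50: float, hill: float) -> float:
    """Assumed four-parameter logistic model: remaining activity at inhibitor concentration conc."""
    if conc < 0 or ic50 <= 0 or hill <= 0:
        raise ValueError("need conc >= 0, ic50 > 0 and hill > 0")
    # Activity falls from `top` (no inhibitor) towards `bottom` as conc increases.
    return bottom + (top - bottom) / (1 + (conc / ic50) ** hill)
```

By construction, `four_pl` returns the midpoint between `top` and `bottom` when `conc == ic50`, which is exactly what the IC50 parameter means in this model.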
