Sci Rep. 2022 May 19;12(1):8434.
doi: 10.1038/s41598-022-12180-x.

Predicting target-ligand interactions with graph convolutional networks for interpretable pharmaceutical discovery

Paola Ruiz Puentes et al. Sci Rep.

Abstract

Drug discovery is an active research area that demands large investments yet generates low returns because of its inherent complexity and high costs. To identify potential therapeutic candidates more effectively, we propose the Protein-Ligand with Adversarial augmentations Network (PLA-Net), a deep learning-based approach to predict target-ligand interactions. PLA-Net consists of a two-module deep graph convolutional network that considers the most relevant chemical information of ligands and targets and combines it to predict their binding capability. Moreover, we generate adversarial data augmentations that preserve relevant biological backgrounds and improve the interpretability of our model, highlighting the relevant substructures of the ligands reported to interact with the protein targets. Our experiments demonstrate that the joint ligand-target information and the adversarial augmentations significantly increase the interaction prediction performance. PLA-Net achieves 86.52% mean average precision for 102 target proteins, with perfect performance for 30 of them, on a curated version of the Actives as Decoys dataset. Lastly, we accurately predict pharmacologically-relevant molecules when screening the ligands of the ChEMBL and Drug Repurposing Hub datasets with the perfect-scoring targets.
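The headline metric above is a mean of per-target average precision (AP) values. A minimal sketch of that aggregation follows, assuming the usual ranking-based definition of AP (the mean of precision at the rank of each active ligand). The abstract does not spell out the exact AP variant used, so the function names and the definition here are illustrative assumptions.

```python
def average_precision(labels, scores):
    """Ranking-based AP: mean of precision@k taken at the rank k of each active.

    labels: 1 for an active ligand, 0 for an inactive one.
    scores: predicted interaction probabilities, same order as labels.
    """
    # Rank ligands from highest to lowest predicted probability.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # A target with no actives has no defined AP; return 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_target):
    """Average the per-target APs.

    per_target: dict mapping a (hypothetical) target name to (labels, scores).
    """
    aps = [average_precision(labels, scores) for labels, scores in per_target.values()]
    return sum(aps) / len(aps)
```

Under this definition a target reaches "perfect performance" (AP of 1.0) exactly when every active ligand is ranked above every inactive one.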

Figures

Figure 1
PLA-Net workflow. Schematic representation of a PLA-Net model for predicting interactions between small organic molecules and one of the 102 target proteins in the AD dataset. Graph representations of the molecule and a given target protein are generated from SMILES and FASTA sequences and are used as input to the Ligand Module (LM) and Protein Module (PM), respectively. Each module comprises a deep GCN followed by an average pooling layer, which extracts relevant features of their corresponding input graph. Both representations are finally concatenated and combined through a fully connected layer to predict the target–ligand interaction probability. Created with BioRender.com.
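The fusion step in the Figure 1 caption (average pooling per module, concatenation, one fully connected layer) can be sketched as follows. This is not the authors' implementation: the deep GCNs are omitted, the node feature lists stand in for their outputs, and the sigmoid used to turn the logit into a probability is an assumption, as are all function and parameter names.

```python
import math


def mean_pool(node_features):
    """Average per-node feature vectors into one graph-level vector."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(node[i] for node in node_features) / n for i in range(dim)]


def predict_interaction(ligand_nodes, protein_nodes, weights, bias):
    """Concatenate pooled ligand and protein features, then apply one linear layer.

    ligand_nodes / protein_nodes: node feature vectors, standing in for the
    outputs of the Ligand Module and Protein Module GCNs (not modeled here).
    """
    fused = mean_pool(ligand_nodes) + mean_pool(protein_nodes)  # list concatenation
    if len(weights) != len(fused):
        raise ValueError("weights must match the concatenated feature size")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    # Sigmoid is an assumption; the caption only says "interaction probability".
    return 1.0 / (1.0 + math.exp(-logit))
```

For example, ligand nodes `[[1.0, 0.0], [0.0, 1.0]]` pool to `[0.5, 0.5]` and protein nodes `[[2.0], [4.0]]` pool to `[3.0]`, so the linear layer sees the three-dimensional vector `[0.5, 0.5, 3.0]`.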
Figure 2
Comparison with state-of-the-art methods trained for TLI prediction in the proposed benchmark. (a) Performance distribution curves comparing our model (PLA-Net) with state-of-the-art methods. For each model, we show the number of binary models that achieve a TLI prediction performance greater than or equal to a specific AP value. (b) We compare the performance distribution of the 102 targets in PLA-Net with that of the current state-of-the-art (PharmaNet), showing that PLA-Net consistently improves the AP metric of the majority of the targets, with 59 targets with performance between 90 and 100% versus 29 in PharmaNet. Furthermore, PLA-Net achieves perfect performance for 30 targets with high clinical interest.
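The curve in panel (a) counts, for each AP value, how many per-target binary models reach at least that value. A minimal sketch of that counting, with hypothetical target names and AP values (none of these numbers come from the paper):

```python
def performance_distribution(ap_by_target, thresholds):
    """For each threshold, count the targets whose AP is >= that threshold."""
    return [
        (threshold, sum(1 for ap in ap_by_target.values() if ap >= threshold))
        for threshold in thresholds
    ]


# Hypothetical per-target AP values, for illustration only.
example_aps = {"target_a": 1.0, "target_b": 0.93, "target_c": 0.71}
curve = performance_distribution(example_aps, [0.5, 0.9, 1.0])
# curve == [(0.5, 3), (0.9, 2), (1.0, 1)]
```

A model whose curve stays higher for longer as the threshold increases has more targets with high AP, which is the comparison the caption describes.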
Figure 3
Performance distribution of PLA-Net training stages. The performance of individual targets shows a marked tendency towards high and perfect mAP scores (90–100%) as the training curriculum progresses. In particular, LM + PM and augmented LM (LM + A) show a clear improvement in performance distribution with respect to LM, and this is further improved when combining the information extracted by each in PLA-Net. Best viewed in color.
Figure 4
Initialization of protein contribution during LM + PM training. Zeroing the linear classifier’s weights that correspond to the protein contribution at the onset of the training (I0) substantially improves the performance compared to a random initialization of the protein contribution (RI). We measured performance in mAP for 15 representative targets.
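One way to read the I0 scheme is that, at the first training step, the prediction cannot depend on the protein features at all, so the classifier starts from a ligand-only prediction. The sketch below illustrates that property under stated assumptions: the ligand weights are random placeholders for whatever values they hold at this stage, and the function names, dimensions, and initialization range are hypothetical.

```python
import random


def init_classifier(ligand_dim, protein_dim, zero_protein=True, seed=0):
    """Build the weight vector of a linear classifier over [ligand | protein] features.

    zero_protein=True mimics I0 (protein weights start at zero);
    zero_protein=False mimics RI (protein weights start random).
    """
    rng = random.Random(seed)
    # Placeholder ligand weights; in practice these would come from earlier training.
    ligand_w = [rng.uniform(-0.1, 0.1) for _ in range(ligand_dim)]
    if zero_protein:
        protein_w = [0.0] * protein_dim
    else:
        protein_w = [rng.uniform(-0.1, 0.1) for _ in range(protein_dim)]
    return ligand_w + protein_w


def logit(weights, ligand_feat, protein_feat, bias=0.0):
    """Linear score over the concatenated ligand and protein features."""
    fused = ligand_feat + protein_feat
    return sum(w * x for w, x in zip(weights, fused)) + bias


w_i0 = init_classifier(ligand_dim=3, protein_dim=2, zero_protein=True)
ligand = [0.2, -0.4, 1.0]
# With I0, two different protein feature vectors give the same initial logit.
same = logit(w_i0, ligand, [5.0, -3.0]) == logit(w_i0, ligand, [0.0, 0.0])
```

With random initialization the two logits would generally differ from the first step, so the untrained protein branch perturbs the prediction before it has learned anything useful.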
Figure 5
Salient feature maps of ligands during PLA-Net training stages. Salient feature maps predicted by the LM trained only on original molecules (LM), the LM trained with adversarial augmentations (LM + A), and the LM and PM jointly trained (LM + PM) for representative ligands of 7 protein targets. The average precision (AP) of each model is presented below their respective feature map and TLI-relevant substructures are shown to the left. All of these substructures have been previously identified through experimental and/or molecular docking analyses between the shown ligand and its respective target protein. The predicted importance of ligand substructures significantly shifts at each training stage despite small changes in AP. The augmented LM achieves predictions that best align with substructures of natural ligands that have been previously reported to participate in TLIs. Created with BioRender.com.
Figure 6
PLA-Net’s pharmacologically-relevant TLI predictions on the Drug Repurposing and ChEMBL databases. From each database, five molecules predicted as active with high probability were selected for nine pharmacologically-relevant targets. The name and prediction probability for each molecule are shown in their upper right corner. The mean Rogot–Goldberg similarity between each molecule and the active molecules of the corresponding training set is shown in red in their lower right corner. Molecules’ activity towards each target was corroborated with previous literature reports. Green label: experimentally-proven active molecule for the respective target. Yellow label: experimentally-proven active molecule for protein closely related to the target of interest. Orange label: not experimentally-proven, but with relevant substructures present in experimentally-proven active molecules for the target. Created with BioRender.com.
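The mean similarity reported for each molecule can be sketched as below, assuming binary fingerprints and the commonly cited form of the Rogot–Goldberg coefficient, a/(2a + b + c) + d/(2d + b + c), where a counts bits on in both fingerprints, d counts bits off in both, and b + c counts mismatches. The fingerprints are toy lists and the function names are hypothetical; the fingerprint type and bit length behind this figure are not stated in the caption.

```python
def rogot_goldberg(fp_a, fp_b):
    """Rogot-Goldberg similarity between two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n = len(fp_a)
    both_on = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    both_off = sum(1 for x, y in zip(fp_a, fp_b) if not x and not y)
    # All bits on in both, or all bits off in both: identical by convention.
    if both_on == n or both_off == n:
        return 1.0
    mismatches = n - both_on - both_off
    return (both_on / (2 * both_on + mismatches)
            + both_off / (2 * both_off + mismatches))


def mean_similarity_to_actives(query_fp, active_fps):
    """Mean Rogot-Goldberg similarity between one molecule and a set of actives."""
    return sum(rogot_goldberg(query_fp, fp) for fp in active_fps) / len(active_fps)
```

Unlike Tanimoto similarity, this coefficient also rewards bits that are absent from both fingerprints, so two sparse fingerprints with few shared on-bits can still score well above zero.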
Figure 7
Adversarial augmentations. (a) Augmented molecules are generated through an edge-deletion process that selects the edge of the molecular graph to delete by following two criteria: (i) the deletion of the selected edge must generate an adversarial molecule whose distance to the Bemis–Murcko scaffold of the original molecule is less than a defined threshold (μ) and (ii) the selected edge must have a negative gradient and the gradient magnitude must be maximal. (b) Comparison of intra-class distance as a function of different similarity metrics. We computed the distances between the Morgan fingerprints of molecules from a specific target class and of their corresponding Bemis–Murcko scaffolds to assess the average intra-class distance as a function of different similarity metrics. We selected the Rogot–Goldberg similarity descriptor due to its high performance for intra-class similarities. Created with BioRender.com.
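The two selection criteria in panel (a) can be sketched as a filter followed by an argmax. The caption does not say which quantity the gradient is taken of, nor how the scaffold distance is computed for a candidate deletion, so both are treated as precomputed inputs here; the dictionary keys, the example values, and the choice to return `None` when no edge qualifies are assumptions for illustration.

```python
def select_edge_to_delete(candidates, mu):
    """Pick the edge to delete, or None if no edge meets both criteria.

    candidates: list of dicts with hypothetical keys
        "edge"              identifier of the edge (e.g. a pair of atom indices)
        "gradient"          precomputed gradient associated with the edge
        "scaffold_distance" distance between the molecule obtained by deleting
                            the edge and the original Bemis-Murcko scaffold
    mu: distance threshold from criterion (i).
    """
    eligible = [
        c for c in candidates
        if c["scaffold_distance"] < mu   # criterion (i): stay close to the scaffold
        and c["gradient"] < 0            # criterion (ii): negative gradient ...
    ]
    if not eligible:
        return None
    # ... with maximal magnitude among the eligible edges.
    return max(eligible, key=lambda c: abs(c["gradient"]))["edge"]


# Hypothetical candidate edges, for illustration only.
edges = [
    {"edge": (0, 1), "gradient": -0.8, "scaffold_distance": 0.6},  # too far from scaffold
    {"edge": (1, 2), "gradient": -0.5, "scaffold_distance": 0.2},
    {"edge": (2, 3), "gradient": 0.9, "scaffold_distance": 0.1},   # positive gradient
    {"edge": (3, 4), "gradient": -0.1, "scaffold_distance": 0.3},
]
chosen = select_edge_to_delete(edges, mu=0.5)  # (1, 2)
```

Note that the edge with the largest gradient magnitude overall, `(2, 3)`, is rejected because its gradient is positive, and the largest negative one, `(0, 1)`, is rejected because deleting it moves the molecule too far from the scaffold.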
