Nat Commun. 2023 Jun 15;14(1):3570.
doi: 10.1038/s41467-023-39301-y.

Biomedical knowledge graph learning for drug repurposing by extending guilt-by-association to multiple layers


Dongmin Bang et al. Nat Commun. 2023.

Abstract

Computational drug repurposing aims to identify new indications for existing drugs by utilizing high-throughput data, often in the form of biomedical knowledge graphs. However, learning on biomedical knowledge graphs can be challenging because genes dominate the graph while drug and disease entities are few, resulting in less effective representations. To overcome this challenge, we propose a "semantic multi-layer guilt-by-association" approach that leverages the principle of guilt-by-association ("similar genes share similar functions") at the drug-gene-disease level. Using this approach, our model DREAMwalk (Drug Repurposing through Exploring Associations using Multi-layer random walk) applies a semantic information-guided random walk to generate drug- and disease-populated node sequences, allowing effective mapping of both drugs and diseases into a unified embedding space. Compared to state-of-the-art link prediction models, our approach improves drug-disease association prediction accuracy by up to 16.8%. Moreover, exploration of the embedding space reveals that biological and semantic contexts are well aligned. We demonstrate the effectiveness of our approach through repurposing case studies for breast carcinoma and Alzheimer's disease, highlighting the potential of the multi-layer guilt-by-association perspective for drug repurposing on biomedical knowledge graphs.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The overview of the DREAMwalk framework.
a The node sequence generation process through teleport-guided random walk. When arriving at a drug/disease node, the random walker chooses between network traversal and a teleport operation based on the teleport factor τ. b The embedding space generation process with the heterogeneous Skip-gram model. The heterogeneous Skip-gram performs negative sampling within the same node type. c The embedding vector space enables computational analysis, including clustering of entities and distance-based analysis. d Drug-disease association prediction using an XGBoost classifier, with the subtracted drug and disease embedding vectors as input. e Repurposing candidate drugs are prioritized using the trained XGBoost classifiers. Given a query disease of interest, all unlabeled drug-disease pair vectors are passed through the trained classifiers to obtain treatment probabilities. These probabilities are then averaged to yield a list of candidate drugs ranked by their average treatment probability.
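Panels a, d and e can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform choice among teleport targets and neighbors, the drug-minus-disease subtraction order, and the use of plain callables in place of trained XGBoost classifiers are all assumptions made for illustration.

```python
import random

def teleport_walk(graph, similar, start, length, tau, rng):
    """Generate one node sequence of at most `length` nodes.

    graph:   node -> list of neighboring nodes (network traversal)
    similar: drug/disease node -> semantically similar nodes of the same
             type (teleport targets); gene nodes have no entry (assumption)
    tau:     teleport factor, the probability of teleporting when possible
    """
    walk = [start]
    current = start
    while len(walk) < length:
        targets = similar.get(current, [])
        if targets and rng.random() < tau:
            current = rng.choice(targets)      # teleport operation
        else:
            neighbors = graph.get(current, [])
            if not neighbors:                  # dead end: stop early
                break
            current = rng.choice(neighbors)    # network traversal
        walk.append(current)
    return walk

def rank_candidates(drug_vecs, disease_vec, classifiers):
    """Rank drugs for one query disease by their average treatment probability.

    Each classifier is a hypothetical callable mapping a pair vector to a
    probability; it stands in for a trained XGBoost model.
    """
    scores = {}
    for drug, vec in drug_vecs.items():
        # Subtracted pair vector (drug minus disease is an assumed order).
        pair = [d - q for d, q in zip(vec, disease_vec)]
        scores[drug] = sum(clf(pair) for clf in classifiers) / len(classifiers)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy usage with made-up nodes.
toy_graph = {
    "drugA": ["gene1"], "drugB": ["gene2"],
    "gene1": ["drugA", "diseaseX"], "gene2": ["drugB"],
    "diseaseX": ["gene1"],
}
toy_similar = {"drugA": ["drugB"], "drugB": ["drugA"]}
print(teleport_walk(toy_graph, toy_similar, "drugA", 6, 0.3, random.Random(0)))
```

With τ = 0 the walk reduces to an ordinary random walk over edges, while larger τ values produce sequences in which drug and disease nodes appear more often next to semantically similar nodes of the same type.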
Fig. 2. The drug-disease association prediction performances of each model on the three biomedKGs.
a DDA prediction performance on MSI network with random split. b DDA prediction performance on MSI network with disease area split. c, d DDA prediction performance with random split on HetioNet and KEGG network, respectively. Throughout (a)–(d), the error bars denote the mean values ± 95% confidence interval, derived through n = 10 independent experiments. Source data are provided as a Source Data file. (AUROC: Area Under the Receiver Operating Characteristics curve, AUPR: Area Under the Precision-Recall curve).
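As a hedged illustration of how error bars of this kind can be computed from n = 10 runs, the sketch below uses a Student t interval. The caption does not state which interval method was used, so the t-based formula, the function name, and the toy scores are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student t critical value for 9 degrees of freedom (n = 10).
T_CRIT_DF9 = 2.262

def mean_ci95(scores):
    """Return (mean, half-width of the 95% interval) for exactly 10 scores."""
    if len(scores) != 10:
        raise ValueError("this sketch hard-codes the t value for n = 10")
    center = mean(scores)
    half_width = T_CRIT_DF9 * stdev(scores) / sqrt(len(scores))
    return center, half_width

# Hypothetical AUROC values from 10 independent runs (not from the paper).
toy_scores = [0.90, 0.91, 0.89, 0.92, 0.90, 0.88, 0.91, 0.90, 0.89, 0.90]
center, half_width = mean_ci95(toy_scores)
print(f"{center:.3f} ± {half_width:.3f}")
```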
Fig. 3. The embedding space of DREAMwalk reflects the pharmacological and biological system-level characteristics of drugs.
a, b Network topology of the three hypertensive drug classes on the without-teleport embedding space (left) and the DREAMwalk embedding space (right). c, d Network of RAAS and its two targeting drugs. e The normalized Euclidean distance between the hypertensive drug pairs on the DREAMwalk embedding space (blue) and the without-teleport embedding space (orange). f The normalized Euclidean distance between the RAAS-targeting drugs. On the boxplots of (e) and (f), the center line represents the median, while the upper and lower box limits represent the quartiles. The whiskers indicate 1.5 times the interquartile range. All data have been derived through n = 10 independent experiments. g The all-pairwise normalized Euclidean distance distribution for all drug treatments for rheumatoid arthritis, asthma, hypertension and allergic rhinitis. On the violin plot, the white dot represents the median, while the thick bar represents the interquartile range and the thin line indicates 1.5 times the interquartile range. Source data are provided as a Source Data file. (RAAS: Renin-Angiotensin-Aldosterone System, t-test: two-sided paired t-test).
Fig. 4. Ablation study results of DREAMwalk’s teleport operation.
a Concept illustration of random walk with hierarchy nodes (left) and semantic information-guided teleport (right). b Drug-disease association (DDA) prediction performances of models with random teleport (green), without teleport (orange), with hierarchy nodes (red) and with semantic information-guided teleport (blue) on MSI network. c Stacked area plot of the number of drug (blue) and disease (orange) similarities per cut-off. d DDA prediction performances following the change in similarity cut-off. Teleport factor was fixed at 0.3. e DDA prediction performances following the change in teleport factor τ. Similarity cut-off was fixed at 0.4. On the box plots of (b, d, e), the center line represents the median, while the upper and lower box limits represent the quartiles. The whiskers indicate 1.5 times the interquartile range. All data have been derived through n = 10 independent experiments. Source data are provided as a Source Data file. (AUROC: Area Under the Receiver Operating Characteristics curve, AUPR: Area Under the Precision-Recall curve).
Fig. 5. The window neighbor gene set analysis results.
a Selection of Drug1’s window neighbor genes from node sequences using a window of length 2. b Gene Ontology (GO) enrichment results of window neighbors of drug “gabapentin”. c KEGG enrichment results of window neighbors of disease “Parkinson’s disease”. Source data are provided as a Source Data file. Fisher’s Exact test and the Benjamini-Hochberg method were applied to calculate the adjusted p-values. (Adj.: Adjusted).
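The window neighbor selection in panel a and the Benjamini-Hochberg adjustment can be sketched as follows. This is an illustrative reading of the caption rather than the authors' code: the function names, the `is_gene` predicate, and the toy sequences are hypothetical, and the Fisher's Exact test that would produce the raw p-values is omitted.

```python
from collections import Counter

def window_neighbor_genes(sequences, target, is_gene, window=2):
    """Count genes found within `window` positions of `target` in each sequence."""
    counts = Counter()
    for seq in sequences:
        for pos, node in enumerate(seq):
            if node != target:
                continue
            lo = max(0, pos - window)
            hi = min(len(seq), pos + window + 1)
            for j in range(lo, hi):
                if j != pos and is_gene(seq[j]):
                    counts[seq[j]] += 1
    return counts

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy node sequences with made-up names.
toy_sequences = [
    ["geneA", "geneB", "Drug1", "Drug2", "geneC", "geneD"],
    ["Drug1", "geneE", "geneF", "geneG"],
]
neighbors = window_neighbor_genes(
    toy_sequences, "Drug1", lambda node: node.startswith("gene")
)
print(sorted(neighbors))
```

The resulting gene set would then serve as the query list for an enrichment test, with one raw p-value per term passed to `benjamini_hochberg`.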
