Pharmaceutics. 2021 May 26;13(6):794.
doi: 10.3390/pharmaceutics13060794.

Biomedical Text Link Prediction for Drug Discovery: A Case Study with COVID-19

Kevin McCoy et al. Pharmaceutics.

Abstract

Link prediction in artificial intelligence is used to identify missing links or derive future relationships that can occur in complex networks. A link prediction model was developed using the complex heterogeneous biomedical knowledge graph SemNet to predict missing links in biomedical literature for drug discovery. A web application visualized knowledge graph embeddings and link prediction results using TransE-, ComplEx-, and RotatE-based methods. The link prediction model achieved up to 0.44 hits@10 on the entity prediction tasks. The recent outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, served as a case study to demonstrate the efficacy of link prediction modeling for drug discovery. The link prediction algorithm guided identification and ranking of repurposed drug candidates for SARS-CoV-2 primarily by text mining biomedical literature from previous coronaviruses, including SARS and Middle East respiratory syndrome (MERS). Repurposed drugs included potential primary SARS-CoV-2 treatments, adjunctive therapies, or therapeutics to treat side effects. The link prediction accuracy for nodes ranked highly for SARS coronavirus was 0.875, as calculated by human-in-the-loop validation on existing COVID-19-specific data sets. Drug classes predicted as highly ranked include anti-inflammatories, nucleoside analogs, protease inhibitors, antimalarials, envelope proteins, and glycoproteins. Examples of highly ranked predicted links to SARS-CoV-2 include human leukocyte interferon, recombinant interferon-gamma, cyclosporine, antiviral therapy, zidovudine, chloroquine, vaccination, methotrexate, artemisinin, alkaloids, glycyrrhizic acid, quinine, flavonoids, amprenavir, suramin, complement system proteins, fluoroquinolones, bone marrow transplantation, albuterol, ciprofloxacin, quinolone antibacterial agents, and hydroxymethylglutaryl-CoA reductase inhibitors. Approximately 40% of identified drugs, such as edetic acid or biotin, were not previously connected to SARS. In summary, link prediction can effectively suggest repurposed drugs for emergent diseases.
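
To make the hits@10 figure concrete, the following is a minimal sketch of how a TransE-style score and the hits@k metric are commonly computed for the entity prediction task (h, r, ?). It is not the authors' implementation: the function names, the toy random embeddings, and the unfiltered ranking protocol are all assumptions chosen for illustration.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between h + r and t (higher is better)."""
    return -np.linalg.norm(h + r - t, axis=-1)

def tail_rank(entity_emb, relation_emb, head, rel, true_tail):
    """Rank of the true tail among all entities for the query (head, rel, ?)."""
    # Broadcasting scores every entity as a candidate tail in one call.
    scores = transe_score(entity_emb[head], relation_emb[rel], entity_emb)
    # Rank = 1 + number of candidates scoring strictly higher than the true tail.
    return 1 + int(np.sum(scores > scores[true_tail]))

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

# Toy data: random embeddings and made-up triples (hypothetical, not SemNet).
rng = np.random.default_rng(0)
entity_emb = rng.normal(size=(50, 16))
relation_emb = rng.normal(size=(3, 16))
test_triples = [(0, 1, 7), (4, 0, 12), (9, 2, 3)]  # (head, relation, tail) indices

ranks = [tail_rank(entity_emb, relation_emb, h, r, t) for h, r, t in test_triples]
print(hits_at_k(ranks, k=10))
```

Under this definition, a hits@10 of 0.44 would mean that the correct entity appears among the ten highest-scoring candidates for 44% of the test queries. Head prediction (?, r, t) follows the same pattern with the roles of head and tail swapped.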

Keywords: COVID-19; SARS-CoV-2; coronavirus; literature review; machine learning; natural language processing; repurposed drugs; text mining.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
Visualization of a subgraph of the SemNet knowledge graph.
Figure 2
Link prediction and its subtasks. For a given triple (h, r, t), (a) represents the relation prediction task and (b,c) represent the entity prediction task. Here, h is the head entity, t is the tail entity, and r is the relation.
Figure 3
The link prediction pipeline and its three main stages: triple extraction, model training, and model deployment.
Figure 4
Steps involved in the knowledge graph construction stage.
Figure 5
Distribution of the most prevalent node types in SemNet [16]. “Rest of node types” represents the aggregate of the remaining node types not individually listed in the figure due to space constraints.
Figure 6
Distribution of the different relation types in SemNet [16]. “Rest of relation types” represents the aggregate of the remaining relation types not listed in the figure due to space constraints.
Figure 7
The entity embeddings (TransE) of the top 25 most frequent entity groups.
Figure 8
The end-to-end process of ranking link prediction results for a given query.
Figure 9
(A) Pie chart illustrating the composition of the COVID-19 case study dataset by link prediction evaluation. (B) Violin plot showing the distribution of standardized HeteSim scores for each link prediction evaluation. A lower HeteSim score means a closer relationship between the source node and tail node. (C) Confusion matrix for the link prediction in the COVID-19 case study. “MISSING” and “UNCLEAR” nodes were left out because the true relationship is unknown. Sensitivity = 0.975, specificity = 0.375.
Figure 10
(A) Violin plot showing the distribution of standardized HeteSim scores for each pharmacokinetic label. A lower HeteSim score means a closer relationship between the source node and tail node. (B) Violin plot showing the distribution of standardized HeteSim scores for each node type. (C) Violin plot showing the distribution of standardized HeteSim scores for each drug class.
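
The sensitivity and specificity quoted in the Figure 9C caption follow from the standard confusion matrix definitions, sketched below. The cell counts are not given on this page, so the numbers in the example are hypothetical placeholders chosen only to be consistent with the reported ratios; they should not be read as the study's actual counts.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts (assumption): picked to match the reported ratios only.
sens, spec, acc = confusion_metrics(tp=39, fn=1, tn=3, fp=5)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, accuracy={acc:.3f}")
```

A high sensitivity paired with a low specificity, as reported, indicates that nearly all true relationships were recovered while a sizable share of true negatives were also predicted as links.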

References

    1. Chen Q., Allot A., Lu Z. Keep up with the latest coronavirus research. Nature. 2020;579:193. doi: 10.1038/d41586-020-00694-1.
    2. Wang L.L., Lo K., Chandrasekhar Y., Reas R., Yang J., Eide D., Funk K., Kinney R., Liu Z., Merrill W., et al. CORD-19: The Covid-19 Open Research Dataset. arXiv. 2020. arXiv:2004.10706.
    3. Wilcke X., Bloem P., De Boer V. The knowledge graph as the default data model for learning on heterogeneous knowledge. Data Sci. 2017;1:39–57. doi: 10.3233/DS-170007.
    4. Bordes A., Usunier N., Garcia-Duran A., Weston J., Yakhnenko O. Translating embeddings for modeling multi-relational data; Proceedings of the Advances in Neural Information Processing Systems; Lake Tahoe, NV, USA. 5–8 December 2013; pp. 2787–2795.
    5. Yue X., Wang Z., Huang J., Parthasarathy S., Moosavinasab S., Huang Y., Lin S.M., Zhang W., Zhang P., Sun H. Graph embedding on biomedical networks: Methods, applications and evaluations. Bioinformatics. 2020;36:1241–1251. doi: 10.1093/bioinformatics/btz718.
