Biomedical Text Link Prediction for Drug Discovery: A Case Study with COVID-19

Pharmaceutics. 2021 May 26;13(6):794. doi: 10.3390/pharmaceutics13060794.

Kevin McCoy¹, Sateesh Gudapati^{1,2}, Lawrence He¹, Elaina Horlander¹, David Kartchner^{1,3}, Soham Kulkarni^{1,4}, Nidhi Mehra¹, Jayant Prakash^{1,2}, Helena Thenot¹, Sri Vivek Vanga^{1,3}, Abigail Wagner¹, Brandon White¹, Cassie S Mitchell^{1,5}

Affiliations

¹ Laboratory for Pathology Dynamics, Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.
² Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA.
³ Computer Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
⁴ Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA.
⁵ Institute for Machine Learning, Georgia Institute of Technology, Atlanta, GA 30332, USA.

PMID: 34073456
PMCID: PMC8230210
DOI: 10.3390/pharmaceutics13060794

Kevin McCoy et al. Pharmaceutics. 2021.

Abstract

Link prediction in artificial intelligence is used to identify missing links or to derive future relationships that can occur in complex networks. A link prediction model was developed using the complex heterogeneous biomedical knowledge graph, SemNet, to predict missing links in biomedical literature for drug discovery. A web application visualized knowledge graph embeddings and link prediction results using TransE-, ComplEx-, and RotatE-based methods. The link prediction model achieved up to 0.44 hits@10 on the entity prediction tasks. The recent outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, served as a case study to demonstrate the efficacy of link prediction modeling for drug discovery. The link prediction algorithm guided identification and ranking of repurposed drug candidates for SARS-CoV-2, primarily by text mining biomedical literature from previous coronaviruses, including SARS and Middle East respiratory syndrome (MERS). Repurposed drugs included potential primary SARS-CoV-2 treatments, adjunctive therapies, and therapeutics to treat side effects. The link prediction accuracy for nodes ranked highly for SARS coronavirus was 0.875, as calculated by human-in-the-loop validation on existing COVID-19-specific data sets. Highly ranked predicted drug classes include anti-inflammatories, nucleoside analogs, protease inhibitors, antimalarials, envelope proteins, and glycoproteins. Examples of highly ranked predicted links to SARS-CoV-2 include human leukocyte interferon, recombinant interferon-gamma, cyclosporine, antiviral therapy, zidovudine, chloroquine, vaccination, methotrexate, artemisinin, alkaloids, glycyrrhizic acid, quinine, flavonoids, amprenavir, suramin, complement system proteins, fluoroquinolones, bone marrow transplantation, albuterol, ciprofloxacin, quinolone antibacterial agents, and hydroxymethylglutaryl-CoA reductase inhibitors. Approximately 40% of identified drugs were not previously connected to SARS, such as edetic acid or biotin. In summary, link prediction can effectively suggest repurposed drugs for emergent diseases.
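
To make the entity prediction task and the hits@10 metric mentioned in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a TransE-style score (the distance between head + relation and tail) on toy 2-D embeddings. The entity names, relation name, embedding values, and the helper functions `transe_distance`, `rank_heads`, and `hits_at_k` are all hypothetical and exist only for illustration; real evaluations typically use learned embeddings and filtered rankings over the full graph.

```python
import math

# Toy 2-D embeddings. All names and values are hypothetical, for illustration only.
entity_emb = {
    "drug_a": (1.0, 0.0),
    "drug_b": (0.0, 1.0),
    "drug_c": (0.9, 0.2),
    "virus_x": (1.0, 1.0),
}
relation_emb = {"treats": (0.0, 1.0)}


def transe_distance(head, relation, tail):
    """TransE-style distance ||h + r - t||_2; smaller means a more plausible triple."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_heads(relation, tail):
    """Entity prediction for (?, r, t): order candidate heads by ascending distance."""
    candidates = [e for e in entity_emb if e != tail]
    return sorted(candidates, key=lambda e: transe_distance(e, relation, tail))


def hits_at_k(ranked_lists, true_answers, k):
    """Fraction of queries whose true answer appears among the top-k candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_answers) if truth in ranked[:k]
    )
    return hits / len(true_answers)


# Rank candidate drugs for the query (?, treats, virus_x).
ranking = rank_heads("treats", "virus_x")
print(ranking)
print(hits_at_k([ranking], ["drug_c"], k=2))
```

Under this reading, a hits@10 of 0.44 would mean that the correct entity was ranked in the top 10 candidates for 44% of the test queries.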

Keywords: COVID-19; SARS-CoV-2; coronavirus; literature review; machine learning; natural language processing; repurposed drugs; text mining.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

**Figure 1**
Visualization of a subgraph of the SemNet knowledge graph.

**Figure 2**
Link prediction and its subtasks. For a given triple (h, r, t), (a) represents the relation prediction task and (b, c) represent the entity prediction tasks. Here, h is the head entity, t is the tail entity, and r is the relation.

**Figure 3**
The link prediction pipeline and its 3 main stages: triple extraction, model training, and model deployment.

**Figure 4**
Steps involved in the knowledge graph construction stage.

**Figure 5**
Distribution of most prevalent node types in SemNet ([16]). “Rest of node types” represents the aggregate of remaining node types not individually listed in the figure due to space constraints.

**Figure 6**
Distribution of different relation types in SemNet ([16]). “Rest of relation types” represents the aggregate of remaining relation types not listed in the figure due to space constraints.

**Figure 7**
Entity embeddings (TransE) of the 25 most frequent entity groups.

**Figure 8**
The end-to-end process of ranking link prediction results for a given query.

**Figure 9**
(A) Pie chart illustrating the composition of the COVID-19 case study dataset by link prediction evaluation. (B) Violin plot showing the distribution of standardized HeteSim scores for each link prediction evaluation. A lower HeteSim score means a closer relationship between the source node and tail node. (C) Confusion matrix for the link prediction in the COVID-19 case study. “MISSING” and “UNCLEAR” nodes were left out as the true relationship is unknown. Sensitivity = 0.975, specificity = 0.375.

**Figure 10**
(A) Violin plot showing the distribution of standardized HeteSim scores for each pharmacokinetic label. A lower HeteSim score means a closer relationship between the source node and tail node. (B) Violin plot showing the distribution of standardized HeteSim scores for each node type. (C) Violin plot showing the distribution of standardized HeteSim scores for each drug class.
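
The sensitivity and specificity quoted in the Figure 9C caption follow from the standard confusion-matrix definitions. The sketch below shows the arithmetic. The four counts are hypothetical, chosen only because they reproduce the reported ratios; they are not taken from the paper, and the function name `confusion_metrics` is an assumption for illustration.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (NOT from the paper), picked to match the reported ratios.
sens, spec, acc = confusion_metrics(tp=39, fn=1, tn=3, fp=5)
print(sens, spec, acc)
```

With these illustrative counts the accuracy works out to 0.875, which is consistent with the accuracy quoted in the abstract, although the actual counts behind the figure may differ. The low specificity alongside high sensitivity indicates that most errors in such a split would be false positives.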

Cited by

Paving New Roads Using Allium sativum as a Repurposed Drug and Analyzing its Antiviral Action Using Artificial Intelligence Technology.
Atoum MF, Padma KR, Don KR. Iran J Pharm Res. 2023 Jan 21;21(1):e131577. doi: 10.5812/ijpr-131577. eCollection 2022 Dec. PMID: 36915406. Review.
Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0.
Kirkpatrick A, Onyeze C, Kartchner D, Allegri S, An DN, McCoy K, Davalbhakta E, Mitchell CS. Big Data Cogn Comput. 2022 Mar;6(1):27. doi: 10.3390/bdcc6010027. Epub 2022 Mar 1. PMID: 35936510.
Data-Driven Technology Roadmaps to Identify Potential Technology Opportunities for Hyperuricemia Drugs.
Feng L, Zhao W, Wang J, Lin KY, Guo Y, Zhang L. Pharmaceuticals (Basel). 2022 Nov 3;15(11):1357. doi: 10.3390/ph15111357. PMID: 36355529.
A Systematic Review on the Contribution of Artificial Intelligence in the Development of Medicines for COVID-2019.
Pires C. J Pers Med. 2021 Sep 18;11(9):926. doi: 10.3390/jpm11090926. PMID: 34575703. Review.
Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19.
Kartchner D, McCoy K, Dubey J, Zhang D, Zheng K, Umrani R, Kim JJ, Mitchell CS. Biology (Basel). 2023 Sep 21;12(9):1269. doi: 10.3390/biology12091269. PMID: 37759668.

References

1. Chen Q., Allot A., Lu Z. Keep up with the latest coronavirus research. Nature. 2020;579:193. doi: 10.1038/d41586-020-00694-1.
2. Wang L.L., Lo K., Chandrasekhar Y., Reas R., Yang J., Eide D., Funk K., Kinney R., Liu Z., Merrill W., et al. CORD-19: The Covid-19 Open Research Dataset. arXiv. 2020. arXiv:2004.10706.
3. Wilcke X., Bloem P., De Boer V. The knowledge graph as the default data model for learning on heterogeneous knowledge. Data Sci. 2017;1:39–57. doi: 10.3233/DS-170007.
4. Bordes A., Usunier N., Garcia-Duran A., Weston J., Yakhnenko O. Translating embeddings for modeling multi-relational data; Proceedings of the Advances in Neural Information Processing Systems; Lake Tahoe, NV, USA. 5–8 December 2013; pp. 2787–2795.
5. Yue X., Wang Z., Huang J., Parthasarathy S., Moosavinasab S., Huang Y., Lin S.M., Zhang W., Zhang P., Sun H. Graph embedding on biomedical networks: Methods, applications and evaluations. Bioinformatics. 2020;36:1241–1251. doi: 10.1093/bioinformatics/btz718.
