Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0

2022 Mar;6(1):27. doi: 10.3390/bdcc6010027. Epub 2022 Mar 1.

Anna Kirkpatrick^{1,2}, Chidozie Onyeze^{1,2}, David Kartchner^{1,3}, Stephen Allegri^{1,4}, Davi Nakajima An^{1,3}, Kevin McCoy^{1,4}, Evie Davalbhakta¹, Cassie S Mitchell^{1,4,5}

Affiliations

¹ Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.
² School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA.
³ School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA.
⁴ Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.
⁵ Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA.

PMID: 35936510
PMCID: PMC9351549
DOI: 10.3390/bdcc6010027

Anna Kirkpatrick et al. Big Data Cogn Comput. 2022 Mar.


Abstract

Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework uses a large heterogeneous information network, or "knowledge graph," of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j, improving knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is comprehensive open-source software that provides a significantly faster, more effective, and more user-friendly means of automated biomedical LBD. An example case ranks relationships between Alzheimer's disease and metabolic co-morbidities.

Keywords: Alzheimer’s disease; HeteSim; SemNet; ULARA; biomedical knowledge graph; machine learning; natural language processing; rank aggregation; relatedness; text mining.
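
The HeteSim metric named in the abstract can be illustrated on a toy graph. The following is a minimal sketch, not the SemNet 2.0 implementation: it follows the standard HeteSim definition for metapaths with an even number of edges, taking the cosine similarity between the probability distribution reached by walking forward from the source over the first half of the metapath and the distribution reached by walking backward from the target over the second half. The `ToyGraph` class, the node names, the node types, and the relation names are all hypothetical and carry no biological meaning.

```python
from collections import defaultdict
from math import sqrt


class ToyGraph:
    """Hypothetical in-memory typed graph (not SemNet's data structure).

    Neighbors are indexed by (node, relation, neighbor type), so each
    metapath step is a single dictionary lookup.
    """

    def __init__(self):
        self.out = defaultdict(set)  # (u, relation, type of v) -> {v, ...}
        self.inc = defaultdict(set)  # (v, relation, type of u) -> {u, ...}

    def add_edge(self, u, u_type, relation, v, v_type):
        self.out[(u, relation, v_type)].add(v)
        self.inc[(v, relation, u_type)].add(u)


def walk(index, start, steps):
    """Spread probability mass from `start` along `steps`.

    Each step is (relation, next node type); mass at a node is split
    evenly among its matching neighbors. Mass at a node with no matching
    neighbor is dropped.
    """
    dist = {start: 1.0}
    for relation, next_type in steps:
        nxt = defaultdict(float)
        for node, p in dist.items():
            nbrs = index.get((node, relation, next_type), ())
            for n in nbrs:
                nxt[n] += p / len(nbrs)
        dist = dict(nxt)
    return dist


def hetesim(graph, s, t, types, rels):
    """HeteSim of s and t for a metapath with an even number of edges."""
    if len(types) != len(rels) + 1 or len(rels) % 2 != 0:
        raise ValueError("sketch handles even-length metapaths only")
    mid = len(rels) // 2
    # Forward half: edge i leads to a node of type types[i + 1].
    left = [(rels[i], types[i + 1]) for i in range(mid)]
    # Backward half: traversing edge i in reverse leads to type types[i].
    right = [(rels[i], types[i]) for i in range(len(rels) - 1, mid - 1, -1)]
    a = walk(graph.out, s, left)
    b = walk(graph.inc, t, right)
    dot = sum(p * b.get(n, 0.0) for n, p in a.items())
    norm = sqrt(sum(p * p for p in a.values())) * sqrt(sum(p * p for p in b.values()))
    return dot / norm if norm else 0.0


# Toy data: s reaches g1 and g2; t is reached from g1 and g3.
g = ToyGraph()
g.add_edge("s", "drug", "AFFECTS", "g1", "gene")
g.add_edge("s", "drug", "AFFECTS", "g2", "gene")
g.add_edge("g1", "gene", "ASSOCIATED_WITH", "t", "disease")
g.add_edge("g3", "gene", "ASSOCIATED_WITH", "t", "disease")

types = ["drug", "gene", "disease"]
rels = ["AFFECTS", "ASSOCIATED_WITH"]
score = hetesim(g, "s", "t", types, rels)
print(score)
```

In this toy graph the two walks share only one of their two middle nodes, so the score lies strictly between 0 and 1. Keying the adjacency index by node, relation, and neighbor type is one plausible way an in-memory structure could avoid database round trips; the abstract does not describe the actual structure used.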

Figures

**Figure 1.**
Example graph, metapath, and HeteSim computation.

**Figure 2.**
Overview of SemNet version 1 HeteSim implementation. Speed ratio is computed as (SemNet 1 time)/(SemNet 2 time) and is given for source node insulin and target node Alzheimer’s disease. In SemNet 2, the approximate mean HeteSim algorithm is used with approximation parameters ϵ = 0.1 and r = 0.9.

**Figure 3.**
Distribution of SemNet version 1 HeteSim computation times for all metapaths joining the given source node and Alzheimer’s disease. (a) Insulin; (b) Hypothyroidism; (c) Amyloid.

**Figure 4.**
Distribution of Neo4j query times in SemNet version 1 HeteSim computation for all metapaths joining the given source node and Alzheimer’s disease. (a) Insulin; (b) Hypothyroidism; (c) Amyloid.

**Figure 5.**
Overview of SemNet version 2 approximate mean HeteSim implementation. Speed ratio is (SemNet 1 time)/(SemNet 2 time) and is given for source node insulin and target node Alzheimer’s disease. SemNet version 2 used approximation parameters ϵ = 0.1 and r = 0.9.

**Figure 6.**
An example knowledge graph. Here, we use the convention that nodes are organized by type into vertical columns in the order that they appear in the metapath. We also only show edges that may appear in some metapath instance. This example has m₁ − 1 dead-end nodes on the left and m₂ − 1 dead-end nodes on the right. The HeteSim score of s and t with respect to the metapath is 1 for all values of m₁ and m₂.

**Figure 7.**
An example metapath and knowledge graph, drawn with the same conventions as in Figure 6. Note that, in this example, the removal of dead ends does change the HeteSim score.

**Figure 8.**
Computed randomized pruned HeteSim (RPH) scores for each of the three test graphs. (a) Test graph 1; (b) Test graph 2; (c) Test graph 3.

**Figure 9.**
HeteSim computation times per metapath for all metapaths of length 2 from the given source node to Alzheimer’s disease, using the deterministic HeteSim implementation from SemNet version 2. (a) Insulin; (b) Hypothyroidism; (c) Amyloid.
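
The captions for Figures 2 and 5 quote approximation parameters ϵ = 0.1 and r = 0.9 without defining them on this page. Assuming, purely for illustration, that ϵ is an additive error tolerance and r is the required probability of meeting it, a generic Monte Carlo estimate of a mean of scores in [0, 1] can be sized with Hoeffding's inequality. This sketch is a standard sampling technique and is not claimed to be the paper's approximate mean HeteSim algorithm; the function names and the score list are hypothetical.

```python
import math
import random


def hoeffding_sample_size(epsilon, r):
    """Number of i.i.d. samples of a [0, 1]-valued quantity so that the
    sample mean is within epsilon of the true mean with probability >= r.

    From Hoeffding: P(|mean_hat - mean| >= epsilon) <= 2 * exp(-2 n epsilon^2).
    """
    return math.ceil(math.log(2.0 / (1.0 - r)) / (2.0 * epsilon ** 2))


def approximate_mean(score_fn, items, epsilon, r, rng):
    """Estimate the mean of score_fn over items by uniform sampling
    with replacement, using the Hoeffding sample size."""
    n = hoeffding_sample_size(epsilon, r)
    total = sum(score_fn(rng.choice(items)) for _ in range(n))
    return total / n, n


# Hypothetical per-metapath scores in [0, 1], indexed by metapath id.
scores = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
metapath_ids = list(range(len(scores)))

rng = random.Random(0)
estimate, n_samples = approximate_mean(lambda i: scores[i], metapath_ids, 0.1, 0.9, rng)
print(n_samples)  # 150
print(estimate)
```

Under these assumptions the sample count depends only on ϵ and r, not on the number of metapaths, which is why sampling can pay off when the set of metapaths is large.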

Cited by

Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19.
Kartchner D, McCoy K, Dubey J, Zhang D, Zheng K, Umrani R, Kim JJ, Mitchell CS. Biology (Basel). 2023 Sep 21;12(9):1269. doi: 10.3390/biology12091269. PMID: 37759668. Free PMC article.
CompositeView: A Network-Based Visualization Tool.
Allegri SA, McCoy K, Mitchell CS. Big Data Cogn Comput. 2022 Jun;6(2):66. doi: 10.3390/bdcc6020066. Epub 2022 Jun 14. PMID: 35847767. Free PMC article.
Artificial Intelligence-Assisted Comparative Analysis of the Overlapping Molecular Pathophysiology of Alzheimer's Disease, Amyotrophic Lateral Sclerosis, and Frontotemporal Dementia.
Wei Z, Iyer MR, Zhao B, Deng J, Mitchell CS. Int J Mol Sci. 2024 Dec 15;25(24):13450. doi: 10.3390/ijms252413450. PMID: 39769215. Free PMC article.
Cross-Domain Text Mining to Predict Adverse Events from Tyrosine Kinase Inhibitors for Chronic Myeloid Leukemia.
Mehra N, Varmeziar A, Chen X, Kronick O, Fisher R, Kota V, Mitchell CS. Cancers (Basel). 2022 Sep 26;14(19):4686. doi: 10.3390/cancers14194686. PMID: 36230609. Free PMC article.
Natural language processing in Alzheimer's disease research: Systematic review of methods, data, and efficacy.
Shakeri A, Farmanbar M. Alzheimers Dement (Amst). 2025 Feb 11;17(1):e70082. doi: 10.1002/dad2.70082. eCollection 2025 Jan-Mar. PMID: 39935888. Free PMC article. Review.

Grants and funding

R01 AG056169/AG/NIA NIH HHS/United States
