CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations

Tunca Doğan¹ ² ³ ⁴, Heval Atas³, Vishal Joshi⁴, Ahmet Atakan⁵ ⁶, Ahmet Sureyya Rifaioglu⁵ ⁷, Esra Nalbat³, Andrew Nightingale⁴, Rabie Saidi⁴, Vladimir Volynkin⁴, Hermann Zellner⁴, Rengul Cetin-Atalay³ ⁸, Maria Martin⁴, Volkan Atalay⁵

Affiliations

¹ Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey.
² Institute of Informatics, Hacettepe University, Ankara 06800, Turkey.
³ Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey.
⁴ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK.
⁵ Department of Computer Engineering, METU, Ankara 06800, Turkey.
⁶ Department of Computer Engineering, EBYU, Erzincan 24002, Turkey.
⁷ Department of Computer Engineering, İskenderun Technical University, Hatay 31200, Turkey.
⁸ Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, IL 60637, USA.

PMID: 34181736
PMCID: PMC8450100
DOI: 10.1093/nar/gkab543

Tunca Doğan et al. Nucleic Acids Res. 2021 Sep 20;49(16):e96. doi: 10.1093/nar/gkab543.

Abstract

Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.

Figures

**Figure 1.**
(A) Overall schematic representation of the CROssBAR system within five main pillars; (i) large-scale biological and biomedical data integration, (ii) deep-learning based prediction of missing relations, (iii) construction of knowledge graph representations with serial association and filtering operations, (iv) experimental validation of the computational results and (v) open-access web-service with an easy-to-use interface and rich visualization options and (B) different types of biological/biomedical components and relationships, and their visual representation in CROssBAR knowledge graphs as nodes and edges.

**Figure 2.**
CROssBAR database; (A) schema-like representation of the independent collections of CROssBAR Mongo NoSQL database, displaying cross-collection relations; (B) CROssBAR database statistics displayed in a circular bar-graph layout, bar lengths are shown in logarithmic scale, each high-level biomedical data component group is displayed with a different colour, grey curves show the matching components in different database collections (i.e. bars connected with grey curves signify the same types of biomedical data -keys-, these mappings are utilized for relating independent database collections to each other), green curves signify the biomedical relationships (e.g. drug–target protein interactions) between different CROssBAR components, statistics are also provided in Supplementary Table S1 (abbreviations; Dis.: disease, Pat.: pathway, DTIs: drug-target interactions).

**Figure 3.**
(A) Simplified workflow of the knowledge graph construction procedure, explained over an example disease term query. With the initiation of graph construction by a disease query, the system: (1) finds the matching disease entry from the relevant collection, (2) gathers genes/proteins that are associated with the query disease (i.e. core genes/proteins), (3) collects additional genes/proteins (i.e. first-neighbours) using PPIs of core genes/proteins, (4) identifies biological processes (pathways), of which these genes/proteins (core + neighbouring) are members, (5) gathers phenotypic terms (HPO) associated with the whole gene/protein set, (6) obtains known drugs and drug candidate compounds targeting these genes/proteins, together with our deep-learning-based interaction predictions and (7) revisits the disease collection to make another query with all collected genes/proteins, to obtain the disease entries that have similar implications as the query disease. The full-scale workflow of the CROssBAR knowledge graph construction process is provided in Supplementary Figure S1; (B) an example KG obtained from CROssBAR-WS, generated on-the-fly with the user's query of ‘MAPK1’ gene (with the node limit of 10 for each biomedical/biological component, and other default parameters), displayed under the layout selections of multi-layered CROssBAR (left) and circular (right).
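The seven steps in panel (A) can be illustrated with a minimal sketch. This is not the CROssBAR implementation: the in-memory dictionaries stand in for the database collections, every identifier (`TOY_DB`, `build_kg`, `disease_X`, `G1`, ...) is hypothetical, and the sketch omits the deep-learning predictions, the node limits, and the filtering operations mentioned in Figure 1.

```python
# Toy in-memory "collections"; every entry is made up for illustration only.
TOY_DB = {
    "disease_genes": {
        "disease_X": {"G1", "G2"},
        "disease_Y": {"G2", "G3"},
        "disease_Z": {"G9"},
    },
    "ppi": {("G1", "G3"), ("G2", "G4")},
    "pathway_genes": {"pathway_A": {"G1", "G4"}, "pathway_B": {"G8"}},
    "phenotype_genes": {"phenotype_P": {"G2"}},
    "drug_targets": {"drug_D": {"G3"}, "drug_E": {"G7"}},
}


def terms_hitting(collection, genes):
    """Return {term: shared genes} for terms annotated with any gene in `genes`."""
    return {
        term: annotated & genes
        for term, annotated in collection.items()
        if annotated & genes
    }


def build_kg(disease, db):
    """Assemble (nodes, edges) for a disease query, following steps (1)-(7)."""
    edges = set()

    # (1)-(2): match the disease entry and gather its core genes/proteins.
    core = set(db["disease_genes"].get(disease, set()))
    edges |= {(disease, gene, "disease-gene") for gene in core}

    # (3): add first neighbours of the core genes via PPIs.
    neighbours = set()
    for a, b in db["ppi"]:
        if a in core or b in core:
            edges.add((a, b, "ppi"))
            neighbours |= {a, b} - core
    genes = core | neighbours

    # (4)-(6): pathways, phenotypes and drugs linked to the whole gene set.
    for key, label in [
        ("pathway_genes", "pathway-gene"),
        ("phenotype_genes", "phenotype-gene"),
        ("drug_targets", "drug-target"),
    ]:
        for term, hit in terms_hitting(db[key], genes).items():
            edges |= {(term, gene, label) for gene in hit}

    # (7): revisit the disease collection with all collected genes.
    for other, hit in terms_hitting(db["disease_genes"], genes).items():
        if other != disease:
            edges |= {(other, gene, "disease-gene") for gene in hit}

    nodes = {node for source, target, _ in edges for node in (source, target)}
    return nodes, edges


nodes, edges = build_kg("disease_X", TOY_DB)
```

In this toy run, `disease_Y` enters the graph only at step (7), because it shares genes with the expanded gene set of the query disease.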

**Figure 4.**
Example cases of data exploration using the CROssBAR web-service; (A) the output knowledge graph of trifluoperazine and gastric cancer query; (B) critical signalling pathways and their relation to trifluoperazine and gastric cancer over critical genes/proteins and (C) target interaction similarity between structurally dissimilar molecules: Sorafenib, CHEMBL272938 and CHEMBL3910171 (circular layout display).

**Figure 5.**
The use case of CROssBAR COVID-19 knowledge graphs (https://crossbar.kansil.org/covid_main.php): (A) the large-scale KG (1289 nodes and 6743 edges) and (B) the simplified KG (435 nodes and 1061 edges). Both of these graphs reveal the most overrepresented biological processes during a SARS-CoV-2 infection (i.e. cell cycle, viral mRNA translation, endocytosis, interleukin signalling, etc.), as well as the potential treatment options with COVID-19 related pre-clinical/clinical results (e.g. Remdesivir, Favipiravir, Dexamethasone, etc.) and our novel *in silico* predictions (for both virus and host proteins) considering long-term drug discovery or short-term drug repositioning applications (e.g. tocilizumab, cyclosporine, becatecarin, tenecteplase, simvastatin, etc.). They also display rare and complex diseases and phenotypic implications with similar host protein associations (e.g. arthritis, diabetes, respiratory distress, fever, etc.).

**Figure 6.**
*In vitro* experimental results: volcano plots displaying differentially expressed genes in Chloroquine treated liver cells (Huh7 and Mahlavu). We checked the interaction between the significant DEGs (Supplementary Table S2) and genes in the large-scale COVID-19 KG, and applied Fisher's exact test to analyse the significance of the presence of 36 DEGs on the KG (Supplementary Table S4) as opposed to the non-DEGs in the multiplex panel of the gene expression analysis platform (NanoString). The results indicated that DEGs were significantly overrepresented (P-value = 1.5e–05).
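As a rough illustration of the overrepresentation test mentioned in this caption, the sketch below computes a one-sided Fisher's exact test from a 2×2 contingency table using only the standard library. The one-sided form, the function name `fisher_overrepresentation`, and the table used in the call are assumptions for illustration; the counts are not the study's data, which are not given in this record.

```python
from math import comb


def fisher_overrepresentation(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: DEGs present on the KG      b: DEGs absent from the KG
    c: non-DEGs present on the KG  d: non-DEGs absent from the KG
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    degs = a + b            # row total: all DEGs
    on_kg = a + c           # column total: all panel genes on the KG
    total = a + b + c + d   # all genes in the panel
    denominator = comb(total, degs)
    upper = min(degs, on_kg)
    numerator = sum(
        comb(on_kg, k) * comb(total - on_kg, degs - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Hypothetical toy table, NOT the counts from the study.
p_value = fisher_overrepresentation(3, 1, 1, 3)
```

A small p-value here would indicate that DEGs land on the KG more often than expected if DEG status and KG membership were independent.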

**Figure 7.**
CROssBAR knowledge graph diversity analysis use case, intersection graphs between: (A) breast cancer and ovarian cancer, (B) breast cancer and osteosarcoma, (C) ovarian cancer and osteosarcoma, (D) breast cancer, ovarian cancer, and osteosarcoma (triple-wise) queries. Venn diagrams displaying the statistics of shared: (E) nodes and (F) edges, between KGs of different query terms.
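The intersection graphs and Venn statistics in this figure amount to set intersections over the nodes and edges of the individual KGs. A minimal sketch follows, assuming each KG is exported as a pair of node and edge sets; the helper `kg_intersection` and the toy graphs are hypothetical and do not reflect the web-service's actual export format.

```python
def kg_intersection(*graphs):
    """Return the nodes and edges shared by all given (nodes, edges) graphs."""
    node_sets = [set(nodes) for nodes, _ in graphs]
    edge_sets = [set(edges) for _, edges in graphs]
    return set.intersection(*node_sets), set.intersection(*edge_sets)


def edge(u, v):
    """Undirected edge, so (u, v) and (v, u) compare equal."""
    return frozenset((u, v))


# Hypothetical toy KGs; node names are placeholders, not real query results.
kg_a = ({"G1", "G2", "drug_D"}, {edge("G1", "G2"), edge("drug_D", "G1")})
kg_b = ({"G1", "G2", "G5"}, {edge("G1", "G2"), edge("G2", "G5")})
kg_c = ({"G1", "G5"}, {edge("G1", "G5")})

pair_nodes, pair_edges = kg_intersection(kg_a, kg_b)            # pairwise
triple_nodes, triple_edges = kg_intersection(kg_a, kg_b, kg_c)  # triple-wise
```

Storing edges as `frozenset` pairs is a design choice for undirected graphs; directed or typed relations would need tuples that also carry the edge type.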
