J Biomed Semantics. 2017 Mar 15;8(1):13.

doi: 10.1186/s13326-017-0118-0.

BioFed: federated query processing over life sciences linked open data

Ali Hasnain¹, Qaiser Mehmood², Syeda Sana E Zainab², Muhammad Saleem³, Claude Warren Jr⁴, Durre Zehra², Stefan Decker², Dietrich Rebholz-Schuhmann²

Affiliations

¹ Insight Centre for Data Analytics, National University of Ireland (NUIG), Galway, Ireland. ali.hasnain@insight-centre.org.
² Insight Centre for Data Analytics, National University of Ireland (NUIG), Galway, Ireland.
³ Universität Leipzig, IFI/AKSW, Leipzig, PO 100920, D-04009, Germany.
⁴ IBM, IDA Business Park, Galway, Ireland.

PMID: 28298238
PMCID: PMC5353896
DOI: 10.1186/s13326-017-0118-0

Ali Hasnain et al. J Biomed Semantics. 2017.

Abstract

Background: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following linked open data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions that query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain.

Methods: The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, Sider).
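As an illustration of triple-pattern-wise source selection over a catalogue, the following is a minimal sketch and not BioFed's actual implementation. It assumes a catalogue that indexes, for every endpoint, the properties and classes it exposes. The endpoint URLs, vocabulary IRIs, and the names `CATALOGUE` and `select_sources` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical catalogue: endpoint URL -> indexed properties and classes.
CATALOGUE = {
    "http://example.org/drugbank/sparql": {
        "properties": {"http://example.org/drugbank/target"},
        "classes": {"http://example.org/drugbank/Drug"},
    },
    "http://example.org/sider/sparql": {
        "properties": {"http://example.org/sider/sideEffect"},
        "classes": {"http://example.org/sider/Drug"},
    },
}


def is_variable(term):
    """SPARQL variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def select_sources(pattern, catalogue=CATALOGUE):
    """Return the sorted endpoints that may answer one (s, p, o) triple pattern."""
    _, p, o = pattern
    if is_variable(p):
        # Unbound predicate: the index cannot prune, so keep every endpoint.
        return sorted(catalogue)
    if p == RDF_TYPE:
        if is_variable(o):
            # Unbound class: again no pruning is possible.
            return sorted(catalogue)
        # Bound class: use the type index.
        return sorted(ep for ep, idx in catalogue.items() if o in idx["classes"])
    # Bound predicate: use the property index.
    return sorted(ep for ep, idx in catalogue.items() if p in idx["properties"])


# Usage: pick sources independently for each triple pattern of a query.
query_patterns = [
    ("?drug", RDF_TYPE, "http://example.org/drugbank/Drug"),
    ("?drug", "http://example.org/sider/sideEffect", "?effect"),
]
for tp in query_patterns:
    print(tp, "->", select_sources(tp))
```

Selecting sources per triple pattern, rather than per query, means each pattern is sent only to endpoints whose index mentions its predicate or class, which is the kind of pruning the abstract credits for efficient querying.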

Results: BioFed offers a single point of access to a large number of SPARQL endpoints that provide life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoints' availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance, which can be attributed to the provision of provenance information for the source selection.
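To make the role of SPARQL 1.1 federation and endpoint availability concrete, here is a hedged sketch of how a query plan could be turned into a query with `SERVICE` clauses while skipping endpoints that are marked unavailable. It is an assumption-laden illustration, not the BioFed query generator: availability is modelled as a plain Python set rather than the EndpointData graph, terms are assumed to be either variables or IRIs, and the function names and example URLs are hypothetical.

```python
def fmt(term):
    """Variables pass through; every other term is assumed to be an IRI."""
    return term if term.startswith("?") else f"<{term}>"


def service_group(pattern, endpoints):
    """One SERVICE block per endpoint, combined with UNION."""
    s, p, o = (fmt(t) for t in pattern)
    blocks = [f"{{ SERVICE <{ep}> {{ {s} {p} {o} . }} }}" for ep in endpoints]
    return " UNION ".join(blocks)


def build_federated_query(plan, available):
    """Build a SPARQL 1.1 query string.

    plan: list of (triple_pattern, candidate_endpoints) pairs.
    available: set of endpoint URLs currently considered reachable.
    """
    groups = []
    for pattern, endpoints in plan:
        live = [ep for ep in endpoints if ep in available]
        if not live:
            raise ValueError(f"no available endpoint for pattern {pattern}")
        groups.append(service_group(pattern, live))
    return "SELECT * WHERE {\n  " + "\n  ".join(groups) + "\n}"


# Usage with hypothetical endpoints; the second DrugBank mirror is down.
plan = [
    (("?drug", "http://example.org/drugbank/target", "?target"),
     ["http://example.org/drugbank/sparql", "http://example.org/mirror/sparql"]),
    (("?drug", "http://example.org/sider/sideEffect", "?effect"),
     ["http://example.org/sider/sparql"]),
]
available = {"http://example.org/drugbank/sparql", "http://example.org/sider/sparql"}
print(build_federated_query(plan, available))
```

Filtering on availability before generating `SERVICE` clauses avoids sending sub-queries to endpoints that are known to be down, at the cost of possibly incomplete answers when a data source has no live endpoint.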

Conclusion: Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.

Keywords: Life sciences dataset; Linked open data; SPARQL query federation.

Figures

**Fig. 1**
BioFed architecture. ARDI comes from previous work by Hasnain et al. [4, 16]

**Fig. 2**
Dataset connectivity. Overview of the connectivity of some life science data sets through classes/properties, as used in the experimental setup

**Fig. 3**
Query execution time for simple category queries. Comparison of execution times for simple queries run on FedX and BioFed

**Fig. 4**
Query execution time for complex category queries. Comparison of execution times for complex queries run on FedX and BioFed

Cited by

PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.
Djokic-Petrovic M, Cvjetkovic V, Yang J, Zivanovic M, Wild DJ. J Biomed Semantics. 2017 Sep 20;8(1):42. doi: 10.1186/s13326-017-0151-z. PMID: 28931422.
Bio-SODA UX: enabling natural language question answering over knowledge graphs with user disambiguation.
Sima AC, Mendes de Farias T, Anisimova M, Dessimoz C, Robinson-Rechavi M, Zbinden E, Stockinger K. Distrib Parallel Databases. 2022;40(2-3):409-440. doi: 10.1007/s10619-022-07414-w. Epub 2022 Jul 16. PMID: 36097541.
Authors' attitude toward adopting a new workflow to improve the computability of phenotype publications.
Cui H, Ford B, Starr J, Reznicek A, Zhang L, Macklin JA. Database (Oxford). 2022 Feb 2;2022:baac001. doi: 10.1093/database/baac001. PMID: 35106535.
The Gene Ontology resource: enriching a GOld mine.
Gene Ontology Consortium. Nucleic Acids Res. 2021 Jan 8;49(D1):D325-D334. doi: 10.1093/nar/gkaa1113. PMID: 33290552.
Enabling semantic queries across federated bioinformatics databases.
Sima AC, Mendes de Farias T, Zbinden E, Anisimova M, Gil M, Stockinger H, Stockinger K, Robinson-Rechavi M, Dessimoz C. Database (Oxford). 2019 Jan 1;2019:baz106. doi: 10.1093/database/baz106. PMID: 31697362.

References

1. Saleem M, Khan Y, Hasnain A, Ermilov I, Ngomo A-CN. A fine-grained evaluation of SPARQL endpoint federation systems. Semantic Web Journal. 2014. http://content.iospress.com/articles/semantic-web/sw186. Accessed 5 Feb 2017.
2. Saleem M, Shanmukha S, Ngonga AC, Almeida JS, Decker S, Deus HF. Linked cancer genome atlas database. In: I-Semantics 2013: 2013. p. 129–34. http://dl.acm.org/citation.cfm?id=2506200. Accessed 5 Feb 2017.
3. Saleem M, Padmanabhuni SS, Ngomo A-CN, Iqbal A, Almeida JS, Decker S, Deus HF. TopFed: TCGA tailored federated query processing and linking to LOD. J Biomed Semantics. 2014:1–33. https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-5-47. Accessed 5 Feb 2017.
4. Hasnain A, Zainab SSE, Kamdar MR, Mehmood Q, Warren Jr C, et al. A roadmap for navigating the life sciences linked open data cloud. In: International Semantic Technology (JIST2014) Conference: 2014. http://link.springer.com/chapter/10.1007/978-3-319-15615-6_8. Accessed 5 Feb 2017.
5. Hasnain A, Mehmood Q, Sana e Zainab S, Hogan A. SPORTAL: Profiling the Content of Public SPARQL Endpoints. International Journal on Semantic Web and Information Systems (IJSWIS). 2016; 12(3):134–163. doi:10.4018/IJSWIS.2016070105.
