Gauging triple stores with actual biological data
- PMID: 22373359
- PMCID: PMC3471352
- DOI: 10.1186/1471-2105-13-S1-S3
Abstract
Background: Semantic Web technologies have been developed to overcome the limitations of the current Web and of conventional data integration solutions. The Semantic Web is expected to link all the data present on the Internet instead of linking just documents. One of the foundations of Semantic Web technologies is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. Semantic Web technologies should allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries as applied to a number of different triple store implementations.
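To make the notion of a triple store concrete, the following is a minimal sketch, not a description of how any of the benchmarked stores work. It holds subject-predicate-object triples in memory and answers pattern queries in which an unspecified position acts as a wildcard, loosely analogous to a variable in a SPARQL triple pattern. The class name `ToyTripleStore` and the `ex:` identifiers are invented for illustration and are not taken from the Cell Cycle Ontology.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ToyTripleStore:
    """Hypothetical in-memory store; real triple stores add indexing and persistence."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None means 'any value'."""
        return sorted(
            t
            for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Illustrative data only: these identifiers are assumptions, not real ontology terms.
store = ToyTripleStore()
store.add("ex:CDK1", "rdf:type", "ex:Protein")
store.add("ex:CDK1", "ex:participates_in", "ex:CellCycle")
store.add("ex:CCNB1", "rdf:type", "ex:Protein")
store.add("ex:CCNB1", "ex:interacts_with", "ex:CDK1")

# "Which subjects are proteins?" expressed as a single triple pattern.
proteins = [s for s, _, _ in store.match(p="rdf:type", o="ex:Protein")]
```

A linear scan like this is adequate only for a handful of triples; production stores differ mainly in how they index and persist triples, and such design choices are what a benchmark of load and query times would be sensitive to.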
Results: Previously we developed a library of queries to guide the use of our knowledge base, the Cell Cycle Ontology, implemented as a triple store. We have now compared the performance of these queries on five non-commercial triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined three performance aspects: the data uploading time, the query execution time and the scalability. The queries we had chosen addressed diverse ontological or biological questions, and we found that individual store performance was quite query-specific. We identified three groups of queries displaying similar behaviour across the different stores: 1) queries with relatively short response times, 2) queries with moderate response times and 3) queries with relatively long response times. SwiftOWLIM performed best in the first group, 4Store in the second and Virtuoso in the third.
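Query execution time and run-to-run variation of the kind discussed here can be measured with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the procedure used in the paper: `time_query`, `summarize` and the stand-in `fake_query` are hypothetical names, and in practice `run_query` would wrap a call that submits a SPARQL query to a store.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run the query `repeats` times and return wall-clock durations in seconds."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Mean response time plus standard deviation as a rough reproducibility measure."""
    spread = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {"mean": statistics.mean(timings), "stdev": spread}


def fake_query() -> int:
    # Stand-in workload (an assumption); replace with a real query submission.
    return sum(i * i for i in range(10_000))


timings = time_query(fake_query, repeats=5)
summary = summarize(timings)
```

Repeating each query several times and reporting the spread alongside the mean makes it possible to distinguish a genuinely slow query from one whose timing simply varies between runs.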
Conclusions: Our analysis showed that some queries behaved idiosyncratically, in a triple-store-specific manner, mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, displayed a very balanced performance: its load time and its response time for all the tested queries were better than average among the selected stores, and it showed very good scalability and reasonable run-to-run reproducibility. Jena SDB and Jena TDB were consistently slower than the other three implementations. Our analysis demonstrated that most queries developed for Virtuoso could be successfully used for other implementations.