BMC Bioinformatics. 2013 Apr 15;14:126.

doi: 10.1186/1471-2105-14-126.

Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

Julien Wollbrett¹, Pierre Larmande, Frédéric de Lamotte, Manuel Ruiz

PMID: 23586394
PMCID: PMC3680174
DOI: 10.1186/1471-2105-14-126

Julien Wollbrett et al. BMC Bioinformatics. 2013.

Affiliation

¹ CIRAD, UMR AGAP, Montpellier F-34398, France. julien.wollbrett@cirad.fr

Abstract

Background: In recent years, large amounts of "-omics" data have been produced. However, these data are stored in many species-specific databases managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform analyses, which is a time-consuming task. The Semantic Web facilitates interoperability across databases. A common approach is to develop wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers.

Results: We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases.

Conclusions: BioSemantic is a framework designed to speed up the integration of relational databases. We show how it can be used to speed up the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and their semantic annotations are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
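The path-based query generation summarised in the abstract can be illustrated with a minimal Python sketch. It is not the BioSemantic implementation: the table names, the link properties, the `vocab:` prefix URI and the helper functions `find_path` and `build_sparql` are all hypothetical, and the `vocab:table_column` naming is only assumed to resemble a D2RQ-style mapping. The sketch searches for a chain of table links between the tables annotated with the input and output terms, then turns that chain into SPARQL triple patterns.

```python
from collections import deque

# Hypothetical annotations: ontology term -> (table, column).
ANNOTATIONS = {
    "gcp:GenotypeStudy": ("study", "study_name"),
    "gcp:genomicFeatureDetector": ("marker", "marker_name"),
}

# Hypothetical foreign-key links, stored as (subject_table, object_table, property).
LINKS = [
    ("genotype", "study", "vocab:genotype_study"),
    ("genotype", "marker", "vocab:genotype_marker"),
]


def find_path(start, goal):
    """Breadth-first search over table links, ignoring link direction.

    Returns a list of (subject_table, property, object_table) steps,
    or None when the two tables are not connected.
    """
    adjacency = {}
    for subj, obj, prop in LINKS:
        adjacency.setdefault(subj, []).append((obj, (subj, prop, obj)))
        adjacency.setdefault(obj, []).append((subj, (subj, prop, obj)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for neighbour, step in adjacency.get(table, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + [step]))
    return None


def build_sparql(input_term, output_term):
    """Build a SPARQL query linking the input term to the output term."""
    in_table, in_col = ANNOTATIONS[input_term]
    out_table, out_col = ANNOTATIONS[output_term]
    steps = find_path(in_table, out_table)
    if steps is None:
        return None  # no path, so no query can be generated
    patterns = [f"?{in_table} vocab:{in_table}_{in_col} ?input ."]
    patterns += [f"?{s} {p} ?{o} ." for s, p, o in steps]
    patterns.append(f"?{out_table} vocab:{out_table}_{out_col} ?output .")
    body = "\n  ".join(patterns)
    # The prefix URI is a placeholder, not a real vocabulary.
    return (
        "PREFIX vocab: <http://example.org/vocab/>\n"
        f"SELECT ?output WHERE {{\n  {body}\n}}"
    )


if __name__ == "__main__":
    print(build_sparql("gcp:GenotypeStudy", "gcp:genomicFeatureDetector"))
```

A breadth-first search is used here only because it returns a shortest chain of joins; the abstract does not state which search strategy the framework applies.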

Figures

**Figure 1**
**Global architecture of the BioSemantic framework.** The first contribution of our work is the automatic creation of an RDF view containing RDF metadata, which is necessary for the automatic creation of Semantic Web Services. The second contribution is the automatic creation and deployment of Semantic Web Services.

**Figure 2**
**Generation and semi-automatic annotation of the RDF view.** D2RQ creates the D2RQ mapping file, and our BioSemantic API automatically adds new metadata about the database schema. Finally, the mapping file is stored in a repository. Annotation with bio-ontological terms is performed manually by an expert.

**Figure 3**
**Automatic generation of Semantic Web Services.** The Web Services developer selects the bio-ontological terms to be used as input/output. All of the mapping files, which are stored in the mapping file repository, are automatically parsed to find a path linking the input and output ontological terms. If such a path is found, it is used to create a SPARQL query. The query is integrated into a semantic Web Service that is then registered in a Web Service registry, such as BioCatalogue.

**Figure 4**
**Graph-based representation of annotated RDF views.** Each graph is the RDF representation of some part of a relational database. The *d2rq:belongsToClassMap* property links a column to a table. The *d2rq:primaryKey* property defines the primary key of a table. The *d2rq:property* property links a node to a semantic annotation. The columns *marker_name*, from the table *marker*, and *snp_name*, from the table *snp*, are both annotated with the same term: *genomicFeatureDetector* from the GCP domain model ontology [27].

**Figure 5**
**Classification of the database table relationships.** Each light node represents a table of the relational database. Here, we only show the tables, and the columns are not represented. The dark nodes represent the semantic annotations. Each edge represents a property that is shared between 2 nodes. The new properties added by our method, *dr:associatedTo*, *dr:arity* and *rdf:subClassOf*, are indicated in bold.

**Figure 6**
**SAWSDL annotation.** The semantic annotation is represented in bold and tags the input of our Semantic Web Service with the GCP_GenotypeStudy term from the GCP domain model ontology.

**Figure 7**
**BioSemantic form for automatic D2RQ RDF view creation.** For RDF view creation, the user must fill in all fields of the form. The left menu, known as “Actions”, contains all available BioSemantic actions.

**Figure 8**
**BioSemantic form for input/output concept selection.** These concepts will be used to detect a path and to annotate the input/output of the SWS. The user can only select the prefix and concepts used to annotate a previously registered RDF view.

**Figure 9**
**BioSemantic form for RDF view selection and query visualisation/editing.** The red rectangle contains the names of the RDF views annotated with both input/output concepts. The checkbox before each name allows for the selection of a view for SWS creation. The radio button after the name of an RDF view allows for query visualisation/editing. When all desired RDF views are selected, a single click creates the SWS.

**Figure 10**
**Use case workflow created with Taverna 2.** The workflow contains BioSemantic SWS that query both the TropGene and Gramene databases for QTL information, and a BioMart WS that queries Ensembl for gene information. The workflow also detects QTLs from TropGene and Gramene that are annotated with the same TO term and have the same mapping position.

**Figure 11**
**General information about the GetTropgeneMarkerSPARQL Web Service registered in the BioCatalogue.**

**Figure 12**
**BioCatalogue input/output annotations for the GetTropgeneMarkerSPARQL semantic Web Service.** Our bio-ontological terms correspond to the input/output tags in the BioCatalogue.
