BMC Bioinformatics. 2013 Apr 15;14:126.
doi: 10.1186/1471-2105-14-126.

Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

Julien Wollbrett et al. BMC Bioinformatics.

Abstract

Background: In recent years, large amounts of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses, and searching for and assembling these data is time-consuming. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers.

Results: We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases.

Conclusions: BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.

Figures

Figure 1
Global architecture of the BioSemantic framework. The first contribution of our work is the automatic creation of an RDF view containing RDF metadata, which is necessary for the automatic creation of Semantic Web Services. The second contribution is the automatic creation and deployment of Semantic Web Services.
Figure 2
Generation and semi-automatic annotation of the RDF view. D2RQ creates the D2RQ mapping file, and our BioSemantic API automatically adds new metadata about the database schema. Finally, the mapping file is stored in a repository. Annotation with bio-ontological terms is performed manually by an expert.
Figure 3
Automatic generation of Semantic Web Services. The Web Services developer selects the bio-ontological terms to be used as input/output. All of the mapping files, which are stored in the mapping file repository, are automatically parsed to find a path linking the input and output ontological terms. If such a path is found, it is used to create a SPARQL query. The query is integrated into a semantic Web Service that is then registered in a Web Service registry, such as BioCatalogue.
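The path search described in this caption can be sketched roughly as follows. This is a minimal illustration, not the BioSemantic implementation: the table names, the join edges, the annotation map, and the `find_path` and `build_sparql` helpers are all hypothetical, and breadth-first search is assumed as the path-finding strategy. The `gcp:` and `vocab:` prefixes and the property names in the generated query are placeholders as well.

```python
from collections import deque

# Hypothetical schema graph: each table maps to the tables it can be joined with.
JOINS = {
    "study": ["genotype"],
    "genotype": ["study", "marker"],
    "marker": ["genotype"],
}

# Hypothetical annotations: ontological term -> table carrying that annotation.
ANNOTATIONS = {
    "gcp:GenotypeStudy": "study",
    "gcp:genomicFeatureDetector": "marker",
}


def find_path(start, goal, joins):
    """Breadth-first search for the shortest chain of joinable tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in joins.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no path links the two terms in this view


def build_sparql(path):
    """Turn a table path into a SPARQL query string (placeholder vocabulary)."""
    patterns = [f"?{a} vocab:{a}_{b} ?{b} ." for a, b in zip(path, path[1:])]
    body = "\n  ".join(patterns)
    return f"SELECT ?{path[-1]} WHERE {{\n  {body}\n}}"


path = find_path(
    ANNOTATIONS["gcp:GenotypeStudy"],
    ANNOTATIONS["gcp:genomicFeatureDetector"],
    JOINS,
)
query = build_sparql(path) if path else None
```

In this sketch a missing path simply yields `None`, which corresponds to the case where no query can be created for the selected input/output terms.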
Figure 4
Graph-based representation of annotated RDF views. Each graph is the RDF representation of some part of a relational database. The d2rq:belongsToClassMap property links a column to a table. The d2rq:primaryKey property defines the primary key of a table. The d2rq:property property links a node to a semantic annotation. The columns marker_name, from the table marker, and snp_name, from the table snp, are both annotated with the same term: genomicFeatureDetector from the GCP domain model ontology [27].
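As a rough illustration of the graph in this caption, the annotated views can be written as subject-predicate-object triples and searched for columns that share an annotation. The column identifiers, the `gcp:` prefix, and the helper functions are assumptions made for the example; only the property names, the table and column names, and the genomicFeatureDetector term come from the caption.

```python
# Triples loosely mirroring the caption. Identifiers such as
# "marker.marker_name" and the "gcp:" prefix are illustrative only and are
# not taken from the real mapping files.
TRIPLES = [
    ("marker.marker_name", "d2rq:belongsToClassMap", "marker"),
    ("marker.marker_name", "d2rq:property", "gcp:genomicFeatureDetector"),
    ("snp.snp_name", "d2rq:belongsToClassMap", "snp"),
    ("snp.snp_name", "d2rq:property", "gcp:genomicFeatureDetector"),
]


def columns_annotated_with(term, triples):
    """Return the columns whose d2rq:property annotation equals `term`."""
    return sorted(s for s, p, o in triples if p == "d2rq:property" and o == term)


def table_of(column, triples):
    """Follow d2rq:belongsToClassMap from a column to its table, if any."""
    for s, p, o in triples:
        if s == column and p == "d2rq:belongsToClassMap":
            return o
    return None


shared = columns_annotated_with("gcp:genomicFeatureDetector", TRIPLES)
tables = [table_of(column, TRIPLES) for column in shared]
```

Columns from different tables that resolve to the same term are what make it possible to treat them as interchangeable inputs or outputs.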
Figure 5
Classification of the database table relationships. Each light node represents a table of the relational database. Only the tables are shown here; the columns are not represented. The dark nodes represent the semantic annotations. Each edge represents a property linking two nodes. The new properties added by our method, dr:associatedTo, dr:arity and rdf:subClassOf, are indicated in bold.
Figure 6
SAWSDL annotation. The semantic annotation is represented in bold and tags the input of our Semantic Web Service with the GCP_GenotypeStudy term from the GCP domain model ontology.
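A SAWSDL annotation of this kind is typically an `sawsdl:modelReference` attribute placed on a schema element. The sketch below shows one way such an attribute could be attached using Python's standard library. The element name `studyName`, the ontology URI, and the `annotate_input` helper are hypothetical; the actual markup produced by BioSemantic may differ.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
ET.register_namespace("xs", XSD_NS)
ET.register_namespace("sawsdl", SAWSDL_NS)

# Placeholder ontology URI; the real GCP domain model namespace may differ.
TERM_URI = "http://example.org/gcp#GCP_GenotypeStudy"


def annotate_input(element_name, term_uri):
    """Build an xs:element carrying a sawsdl:modelReference attribute."""
    element = ET.Element(
        f"{{{XSD_NS}}}element",
        {"name": element_name, "type": "xs:string"},
    )
    element.set(f"{{{SAWSDL_NS}}}modelReference", term_uri)
    return element


element = annotate_input("studyName", TERM_URI)
xml_text = ET.tostring(element, encoding="unicode")
```

Keeping the annotation as a plain attribute means that tools unaware of SAWSDL can still read the schema unchanged.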
Figure 7
BioSemantic form for automatic D2RQ RDF view creation. For RDF view creation, the user must fill in all fields of the form. The left menu, known as “Actions”, contains all available BioSemantic actions.
Figure 8
BioSemantic form for input/output concept selection. These concepts will be used to detect a path and to annotate the input/output of the SWS. The user can only select the prefix and concepts used to annotate a previously registered RDF view.
Figure 9
BioSemantic form for RDF view selection and query visualisation/editing. The red rectangle contains the names of the RDF views annotated with both the input and output concepts. The checkbox before each name selects a view for SWS creation. The radio button after the name of an RDF view opens its query for visualisation/editing. When all desired RDF views are selected, a single click creates the SWS.
Figure 10
Use case workflow created with Taverna 2. The workflow combines BioSemantic SWS that query the TropGene and Gramene databases for QTL information with a BioMart WS that queries Ensembl for gene information. It also detects QTLs from TropGene and Gramene that are annotated with the same TO term and have the same mapping position.
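The final matching step of this workflow could look something like the sketch below. It is only an illustration under stated assumptions: the record fields, the `match_qtls` helper, and the sample values (including the TO identifiers) are invented for the example, and "same mapping position" is interpreted here as an identical chromosome and position, which may be stricter than what the actual workflow does.

```python
from collections import defaultdict


def match_qtls(tropgene, gramene):
    """Pair QTLs that share a TO term, chromosome and position."""
    index = defaultdict(list)
    for qtl in gramene:
        key = (qtl["to_term"], qtl["chromosome"], qtl["position"])
        index[key].append(qtl["name"])

    matches = []
    for qtl in tropgene:
        key = (qtl["to_term"], qtl["chromosome"], qtl["position"])
        for other in index.get(key, []):
            matches.append((qtl["name"], other))
    return matches


# Toy records with made-up names and identifiers, for illustration only.
tropgene_qtls = [
    {"name": "tg_qtl_1", "to_term": "TO:0000001", "chromosome": 1, "position": 12.5},
    {"name": "tg_qtl_2", "to_term": "TO:0000002", "chromosome": 3, "position": 40.0},
]
gramene_qtls = [
    {"name": "gr_qtl_a", "to_term": "TO:0000001", "chromosome": 1, "position": 12.5},
    {"name": "gr_qtl_b", "to_term": "TO:0000002", "chromosome": 4, "position": 40.0},
]

pairs = match_qtls(tropgene_qtls, gramene_qtls)
```

Indexing one source by its key first keeps the comparison linear in the number of records rather than comparing every pair.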
Figure 11
General information about the GetTropgeneMarkerSPARQL Web Service registered in the BioCatalogue.
Figure 12
BioCatalogue input/output annotations for the GetTropgeneMarkerSPARQL semantic Web Service. Our bio-ontological terms correspond to the input/output tags in the BioCatalogue.

References

    1. Tsesmetzis N, Couchman M, Higgins J, Smith A, Doonan JH, Seifert GJ, Schmidt EE, Vastrik I, Birney E, Wu G, D’Eustachio P, Stein LD, Morris RJ, Bevan MW, Walsh SV. Arabidopsis reactome: a foundation knowledgebase for plant systems biology. Plant Cell. 2008;20:1426–1436. doi: 10.1105/tpc.108.057976.
    2. Lysenko A, Hindle MM, Taubert J, Saqi M, Rawlings CJ. Data integration for plant genomics—exemplars from the integration of Arabidopsis thaliana databases. Brief Bioinform. 2009;10:676–693. doi: 10.1093/bib/bbp047.
    3. Stein LD. Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet. 2008;9:678–688.
    4. Goble C, Stevens R. State of the nation in data integration for bioinformatics. J Biomed Inform. 2008;41:687–693. doi: 10.1016/j.jbi.2008.01.008.
    5. RDF/XML Syntax Specification (Revised) http://www.w3.org/TR/REC-rdf-syntax/
