The SIB Swiss Institute of Bioinformatics Semantic Web of data

SIB Swiss Institute of Bioinformatics RDF Group Members

Collaborators

SIB Swiss Institute of Bioinformatics RDF Group Members:
Adrian Altenhoff, Amos Bairoch, Parit Bansal, Delphine Baratin, Frederic Bastian, Jerven Bolleman, Alan Bridge, Frédéric Burdet, Katrin Crameri, Jérôme Dauvillier, Christophe Dessimoz, Sebastien Gehant, Natasha Glover, Kristin Gnodtke, Catherine Hayes, Mark Ibberson, Evgenia Kriventseva, Dmitry Kuznetsov, Frédérique Lisacek, Florence Mehl, Tarcisio Mendes de Farias, Pierre-André Michel, Sébastien Moretti, Anne Morgat, Sabine Österle, Marco Pagni, Nicole Redaschi, Marc Robinson-Rechavi, Kasun Samarasinghe, Ana-Claudia Sima, Damian Szklarczyk, Orlin Topalov, Vasundra Touré, Deepak Unni, Christian von Mering, Julien Wollbrett, Monique Zahn-Zabal, Evgeny Zdobnov

PMID: 37878411
PMCID: PMC10767860
DOI: 10.1093/nar/gkad902

SIB Swiss Institute of Bioinformatics RDF Group Members. Nucleic Acids Res. 2024 Jan 5;52(D1):D44-D51. doi: 10.1093/nar/gkad902.

Abstract

The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. Since SIB's inception in 1998, the international life science community in academia and industry has been accessing the freely available databases it provides. In this paper we present the 11 databases that currently offer semantically enriched data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as the Swiss Personalized Health Network (SPHN) initiative, which also employs this enrichment. The semantic enrichment facilitates the manipulation of large data sets from public databases and private data sets. Examples are provided to illustrate that the data from the SIB databases can be queried with precise criteria not only individually, but also across multiple databases, including a variety of non-SIB databases. Data manipulation, be it exploration, extraction, annotation, combination or publication, is possible using the SPARQL query language. Providing documentation, tutorials and sample queries makes it easier to navigate this web of semantic data. Through this paper, the reader will discover how the existing SIB knowledge graphs can be leveraged to tackle the complex biological or clinical questions being addressed today.
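
All of this querying goes through SPARQL endpoints. As a minimal sketch of how a client can talk to such an endpoint, the Python code below builds an HTTP GET request following the W3C SPARQL 1.1 Protocol (query text in the `query` parameter, results requested as `application/sparql-results+json`) and flattens the standard JSON results into plain dictionaries. The UniProt endpoint URL is only used as an example, and the helper names `build_sparql_request` and `bindings_to_rows` are hypothetical, not part of any SIB documentation; the code builds a request but never sends it.

```python
import json
import urllib.parse
import urllib.request

# Example endpoint; any SPARQL 1.1 endpoint is queried the same way.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The query text travels URL-encoded in the `query` parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL 1.1 Query Results JSON Format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution.

    Only the lexical value of each RDF term is kept (its type, language tag
    and datatype are dropped); unbound variables are absent from the dict.
    """
    data = json.loads(results_json)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Hypothetical query text; it is only encoded here, not executed.
example_query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
example_request = build_sparql_request(UNIPROT_ENDPOINT, example_query)
print(example_request.full_url)
```

With network access, the request can be sent with `urllib.request.urlopen(example_request)` and the decoded response body passed to `bindings_to_rows`. Public endpoints may apply timeouts or rate limits, so production clients usually add retries.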

Figures

**Figure 1.**
Top ten compounds occurring in enzymatic and transport reactions in Rhea, obtained using a SPARQL query. The query returns the ChEBI identifier linking to the ChEBI entry (column chebi), the compound name (column name) and the number of times the compound is found in Rhea (column countRhea).
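
The Figure 1 query is a grouping-and-counting query. Assuming that countRhea counts distinct reactions per compound (the caption only says "the number of times the compound is found in Rhea"), the Python sketch below reproduces the same logic on in-memory (reaction, chebi, name) tuples: a `GROUP BY ?chebi ?name` with `COUNT(DISTINCT ?reaction)`, followed by `ORDER BY DESC(?countRhea) LIMIT 10`. The function name `top_compounds` and the toy rows are hypothetical and are not Rhea data.

```python
from collections import defaultdict


def top_compounds(
    participations: list[tuple[str, str, str]], limit: int = 10
) -> list[tuple[str, str, int]]:
    """Return (chebi, name, countRhea) rows for the most frequent compounds.

    Each input tuple is (reaction, chebi, name). Counting distinct reactions
    is an assumption about what countRhea measures.
    """
    reactions_by_compound: dict[tuple[str, str], set[str]] = defaultdict(set)
    for reaction, chebi, name in participations:
        # Equivalent of GROUP BY ?chebi ?name with COUNT(DISTINCT ?reaction).
        reactions_by_compound[(chebi, name)].add(reaction)

    rows = [
        (chebi, name, len(reactions))
        for (chebi, name), reactions in reactions_by_compound.items()
    ]
    # Equivalent of ORDER BY DESC(?countRhea); ties broken by identifier.
    rows.sort(key=lambda row: (-row[2], row[0]))
    # Equivalent of LIMIT 10.
    return rows[:limit]


# Hypothetical toy rows (not Rhea data); rxn2 lists compound A twice,
# e.g. once on each side of a transport reaction, but it is counted once.
toy_rows = [
    ("rxn1", "CHEBI:A", "compound A"),
    ("rxn2", "CHEBI:A", "compound A"),
    ("rxn2", "CHEBI:B", "compound B"),
    ("rxn2", "CHEBI:A", "compound A"),
]
print(top_compounds(toy_rows))
# [('CHEBI:A', 'compound A', 2), ('CHEBI:B', 'compound B', 1)]
```

Counting raw occurrences instead would only require replacing the set with a counter; which variant matches the published figure depends on the actual query.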

**Figure 2.**
A graphical representation of the semantic query addressed over the Bgee, OMA and UniProt databases. This query retrieves the proteins associated with ‘*lung cancer*’ and their orthologs expressed in the rat's *lung*. Nodes with a question mark represent any value of some concept; for instance, *?gene* represents any gene in a given database. Nodes in the form *prefix:suffix* represent a term in a vocabulary. For example, orth:OrthologousCluster is defined in the ORTHology ontology (https://qfo.github.io/OrthologyOntology). Edges in the form *prefix:suffix* are relations between nodes that are also defined in a vocabulary. For instance, *up:* in *up:annotation* corresponds to *http://purl.uniprot.org/core/*. All prefixes are defined in the header of the SPARQL query; for the sake of simplicity, they are omitted in the figure. Finally, an edge marked with ‘*’ is a composed edge, where the same edge type is repeated as many times as available in the data source. It therefore represents the traversal of multiple nodes connected by the same edge type.
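
Two mechanisms in the Figure 2 caption can be made concrete: expanding a *prefix:suffix* name into a full IRI using the prefixes declared in the query header, and following an edge marked with ‘*’, which is assumed here to behave like the SPARQL 1.1 zero-or-more property path (`predicate*`). The Python sketch below implements both over an in-memory list of (subject, predicate, object) triples using a breadth-first search. The function names, the `ex:hasMember` predicate and the toy triples are hypothetical; only the `up:` namespace comes from the caption.

```python
from collections import defaultdict, deque

# Only the up: namespace is taken from the caption; add others as needed.
PREFIXES = {"up": "http://purl.uniprot.org/core/"}


def expand(prefixed: str, prefixes: dict[str, str]) -> str:
    """Expand a prefix:suffix name into a full IRI (KeyError if undeclared)."""
    prefix, _, local = prefixed.partition(":")
    return prefixes[prefix] + local


def zero_or_more(
    triples: list[tuple[str, str, str]], start: str, predicate: str
) -> set[str]:
    """Nodes reachable from `start` via `predicate*` (zero or more hops).

    The start node is included (zero-length path); the visited set keeps
    cycles from looping forever.
    """
    adjacency: dict[str, list[str]] = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            adjacency[subject].append(obj)

    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append(neighbour)
    return reached


# Hypothetical nested clusters: cluster1 contains cluster2 and geneB.
toy_triples = [
    ("cluster1", "ex:hasMember", "cluster2"),
    ("cluster2", "ex:hasMember", "geneA"),
    ("cluster1", "ex:hasMember", "geneB"),
]
print(expand("up:annotation", PREFIXES))
# http://purl.uniprot.org/core/annotation
print(sorted(zero_or_more(toy_triples, "cluster1", "ex:hasMember")))
# ['cluster1', 'cluster2', 'geneA', 'geneB']
```

If the composed edge instead means one or more repetitions (SPARQL `+`), the start node would be excluded unless a cycle leads back to it.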

**Figure 3.**
The results of a federated query over Wikidata and UniProt that retrieves the positions of the APP gene in two genome assemblies, GRCh37 and GRCh38. Variants in this gene are known to cause a form of Alzheimer disease.
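
In a federated query, one endpoint delegates part of the graph pattern to another (in SPARQL 1.1, through a `SERVICE` clause), and the partial solutions are then combined. Conceptually, this combination is a join of solution mappings: two rows merge when they agree on every variable they share. The Python sketch below shows only that join step; the variable names (`protein`, `gene`, `assembly`, `start`) and all values are hypothetical placeholders, not the actual UniProt or Wikidata results behind Figure 3, and real query engines use more efficient join strategies than this nested loop.

```python
Row = dict[str, str]


def compatible(a: Row, b: Row) -> bool:
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(a[var] == b[var] for var in a.keys() & b.keys())


def join(left: list[Row], right: list[Row]) -> list[Row]:
    """SPARQL-style join: merge every compatible pair of solution mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


# Hypothetical solutions returned by each endpoint (placeholder values only).
uniprot_side = [{"protein": "P_X", "gene": "GENE_X"}]
wikidata_side = [
    {"protein": "P_X", "assembly": "GRCh37", "start": "s37"},
    {"protein": "P_X", "assembly": "GRCh38", "start": "s38"},
    {"protein": "P_Y", "assembly": "GRCh38", "start": "sY"},
]

# Prints one merged row per assembly for P_X; P_Y has no partner and is dropped.
for row in join(uniprot_side, wikidata_side):
    print(row)
```

Mappings with no shared variables are always compatible, so joining unrelated patterns yields their cross product; this is why federated queries usually bind a shared variable, such as a protein identifier, on both sides.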

Grants and funding

SIB Swiss Institute of Bioinformatics
