The SIB Swiss Institute of Bioinformatics Semantic Web of data

SIB Swiss Institute of Bioinformatics RDF Group Members. Nucleic Acids Res.

Abstract

The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper we present the 11 databases that currently offer semantically enriched data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as the Swiss Personalized Health Network initiative (SPHN), which also employs this enrichment. The semantic enrichment facilitates the manipulation of large data sets, both from public databases and from private sources. Examples are provided to illustrate that the data from the SIB databases can be queried with precise criteria not only individually but also across multiple databases, including a variety of non-SIB databases. Data manipulation, be it exploration, extraction, annotation, combination or publication, is possible using the SPARQL query language. Documentation, tutorials and sample queries make it easier to navigate this web of semantic data. Through this paper, the reader will discover how the existing SIB knowledge graphs can be leveraged to tackle the complex biological or clinical questions that are being addressed today.
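As a rough illustration of querying a single SIB resource with SPARQL, the sketch below sends a query of the kind described in the Figure 1 caption (compounds ranked by how often they occur in Rhea reactions) to a SPARQL endpoint and flattens the standard SPARQL JSON results. It is not the authors' query. The endpoint URL, the `rh:` and `up:` predicates, and the helper names `run_select` and `bindings_to_rows` are assumptions recalled from public documentation and should be checked against the Rhea website before use.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Rhea SPARQL endpoint; verify the URL before relying on it.
RHEA_ENDPOINT = "https://sparql.rhea-db.org/sparql"

# Assumption: predicate names recalled from the public Rhea RDF schema,
# not copied from the paper.
TOP_COMPOUNDS_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?chebi ?name (COUNT(DISTINCT ?rhea) AS ?countRhea)
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction ;
        rh:side/rh:contains/rh:compound ?compound .
  ?compound rh:chebi ?chebi .
  ?chebi up:name ?name .
}
GROUP BY ?chebi ?name
ORDER BY DESC(?countRhea)
LIMIT 10
"""


def run_select(endpoint, query, timeout=60):
    """POST a SELECT query (URL-encoded form) and return the parsed JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def bindings_to_rows(results):
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Network access is required for this part.
    for row in bindings_to_rows(run_select(RHEA_ENDPOINT, TOP_COMPOUNDS_QUERY)):
        print(row.get("chebi"), row.get("name"), row.get("countRhea"))
```

Keeping the query text, the HTTP call and the result flattening separate means the same two helpers can be pointed at any other SPARQL endpoint by changing only the endpoint URL and the query string.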

Figures

Graphical Abstract
Figure 1. The ten compounds that occur most often in the enzymatic and transport reactions in Rhea, obtained using a SPARQL query. The query returns the ChEBI identifier linking to the entry in ChEBI (column chebi), the compound name (column name) and the number of times the compound is found in Rhea (column countRhea).
Figure 2. A graphical representation of the semantic query run over the Bgee, OMA and UniProt databases. This query retrieves the proteins associated with ‘lung cancer’ and their orthologs expressed in the rat lung. Nodes with a question mark represent any value of a given concept; for instance, ?gene represents any gene in a given database. Nodes of the form prefix:suffix represent terms in a vocabulary. For example, orth:OrthologousCluster is defined in the ORTHology ontology https://qfo.github.io/OrthologyOntology. Edges of the form prefix:suffix are relations between nodes that are also defined in a vocabulary. For instance, up: in up:annotation corresponds to http://purl.uniprot.org/core/. All prefixes are defined in the header of the SPARQL query; for the sake of simplicity, they are omitted from the figure. Finally, an edge marked with ‘*’ is a composed edge in which the same edge type is repeated as many times as it occurs in the data source. It therefore represents the traversal of multiple nodes connected by the same edge type.
Figure 3. The results of a federated query over Wikidata and UniProt that retrieves the positions of the APP gene in two genome assemblies: GRCh37 and GRCh38. Variants in this gene are known to cause a form of Alzheimer disease.
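A federated query of the kind described in the Figure 3 caption relies on the SPARQL 1.1 `SERVICE` keyword, which delegates part of the graph pattern to a second endpoint. The sketch below only assembles such a query as text; it has not been run against live endpoints and is not the query used in the paper. The helper names `service_block` and `build_app_query` are hypothetical, and the Wikidata property identifiers (P353, P644, P645, P659, P688, P352) and UniProt predicates are recalled from memory and should be verified before use.

```python
# Assumption: public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "up": "http://purl.uniprot.org/core/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
}

# Assumption: Wikidata property numbers recalled from memory.
# P353 = HGNC gene symbol, P644/P645 = genomic start/end,
# P659 = genomic assembly (qualifier), P688 = encodes, P352 = UniProt protein ID.
WIKIDATA_PATTERN = """
?wdGene wdt:P353 "APP" ;
        wdt:P688 ?wdProtein ;
        p:P644 ?startStatement ;
        p:P645 ?endStatement .
?wdProtein wdt:P352 ?accession .
?startStatement ps:P644 ?start ; pq:P659 ?assembly .
?endStatement ps:P645 ?end ; pq:P659 ?assembly .
?assembly rdfs:label ?assemblyName .
FILTER(LANG(?assemblyName) = "en")
"""


def service_block(endpoint, pattern):
    """Wrap a graph pattern in a SPARQL 1.1 SERVICE clause for a remote endpoint."""
    body = "\n".join("    " + line for line in pattern.strip().splitlines())
    return "  SERVICE <" + endpoint + "> {\n" + body + "\n  }"


def build_app_query():
    """Assemble a federated query intended to be sent to the UniProt endpoint."""
    prefix_lines = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()
    )
    return "\n".join(
        [
            prefix_lines,
            "",
            "SELECT ?protein ?mnemonic ?assemblyName ?start ?end",
            "WHERE {",
            # Remote part: gene positions per genome assembly from Wikidata.
            service_block(WIKIDATA_ENDPOINT, WIKIDATA_PATTERN),
            # Join key: turn the accession string into a UniProt entry IRI.
            '  BIND(IRI(CONCAT("http://purl.uniprot.org/uniprot/", ?accession)) AS ?protein)',
            # Local part: evaluated by the endpoint that receives the query.
            "  ?protein a up:Protein ;",
            "           up:mnemonic ?mnemonic .",
            "}",
        ]
    )


if __name__ == "__main__":
    print(build_app_query())
```

The printed text could then be submitted to the UniProt SPARQL endpoint (https://sparql.uniprot.org/sparql) with any SPARQL client; the shared `?accession` value is what ties the Wikidata rows to the UniProt entries.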
