Bioinformatics. 2018 Apr 15;34(8):1401-1403.
doi: 10.1093/bioinformatics/btx767.

SAPP: functional genome annotation and analysis through a semantic framework using FAIR principles

Jasper J Koehorst et al. Bioinformatics.

Abstract

Summary: To unlock the full potential of genome data and to enhance data interoperability and reusability of genome annotations we have developed SAPP, a Semantic Annotation Platform with Provenance. SAPP is designed as an infrastructure supporting FAIR de novo computational genomics but can also be used to process and analyze existing genome annotations. SAPP automatically predicts, tracks and stores structural and functional annotations and associated dataset- and element-wise provenance in a Linked Data format, thereby enabling information mining and retrieval with Semantic Web technologies. This greatly reduces the administrative burden of handling multiple analysis tools and versions thereof and facilitates multi-level large scale comparative analysis.
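As a rough illustration of what storing annotations together with element-wise and dataset-wise provenance as Linked Data might look like, the sketch below models a handful of triples in plain Python. All identifiers (the `ex:` names, `ExampleDomainTool`, the version string and the E-value) are hypothetical placeholders chosen for this example; they are not taken from SAPP or its ontology.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical triples: a domain hit on a protein, its element-wise
# provenance (the score of this hit) and its dataset-wise provenance
# (the tool run that produced it).
TRIPLES: List[Triple] = [
    Triple("ex:protein1", "ex:hasFeature", "ex:domainHit1"),
    Triple("ex:domainHit1", "ex:accession", "PF00465"),
    Triple("ex:domainHit1", "ex:provenance", "ex:prov1"),
    Triple("ex:prov1", "ex:evalue", "1e-30"),       # element-wise
    Triple("ex:prov1", "ex:origin", "ex:run1"),     # link to dataset-wise
    Triple("ex:run1", "ex:tool", "ExampleDomainTool"),
    Triple("ex:run1", "ex:toolVersion", "1.0"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return every object stored for a subject/predicate pair."""
    return [t.obj for t in TRIPLES
            if t.subject == subject and t.predicate == predicate]


def tool_for_feature(feature: str) -> List[Tuple[str, str]]:
    """Follow provenance links from a feature to (tool, version) pairs."""
    results = []
    for prov in objects(feature, "ex:provenance"):
        for run in objects(prov, "ex:origin"):
            for tool in objects(run, "ex:tool"):
                for version in objects(run, "ex:toolVersion"):
                    results.append((tool, version))
    return results
```

Because the tool name and version hang off the annotation itself, a question such as "which tool version produced this domain hit?" becomes a graph traversal rather than a bookkeeping task.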

Availability and implementation: SAPP is written in Java, runs on Unix-like operating systems and is freely available at https://gitlab.com/sapp. The documentation, examples and a tutorial are available at https://sapp.gitlab.io.

Contact: jasperkoehorst@gmail.com or peter.schaap@wur.nl.

Figures

Fig. 1
(A) The conversion module imports genome sequences in common formats. Annotation modules perform common tasks such as gene, tRNA, protein and protein domain annotation. Results are stored as Linked Data and consistency is ensured by the GBOL stack. (B) SPARQL query to retrieve the E-value score of the instances of the protein domain PF00465 across multiple bacterial genomes. (C) Distribution of E-values for protein domain PF00465 across multiple bacterial genomes: note the multimodality of the distribution. (D) Principal component analysis of functional similarities of 100 bacterial genomes from the Streptococcus (blue) and the Staphylococcus (orange) genera. PC1 and PC2 account for 51.4% and 10.1% of the variance in the dataset, respectively.
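The query in panel (B) is not reproduced on this page. A minimal sketch of how such a query could be assembled is shown below, assuming a made-up `ex:` vocabulary (`ex:hasProtein`, `ex:hasFeature`, `ex:accession`, `ex:provenance`, `ex:evalue`) under a placeholder namespace; the actual GBOL class and property names will differ and should be taken from the SAPP documentation.

```python
import re

# Placeholder namespace for illustration only; not the GBOL namespace.
EX_PREFIX = "http://example.org/annotation#"


def build_evalue_query(accession: str) -> str:
    """Build a SPARQL SELECT that lists E-values for one Pfam domain.

    The accession is validated before being placed in the query text so
    that arbitrary strings cannot alter the query structure.
    """
    if not re.fullmatch(r"PF\d{5}", accession):
        raise ValueError(f"not a Pfam accession: {accession!r}")
    return f"""PREFIX ex: <{EX_PREFIX}>
SELECT ?genome ?protein ?evalue
WHERE {{
  ?genome  ex:hasProtein  ?protein .
  ?protein ex:hasFeature  ?hit .
  ?hit     ex:accession   "{accession}" ;
           ex:provenance  ?prov .
  ?prov    ex:evalue      ?evalue .
}}
ORDER BY ?evalue"""


if __name__ == "__main__":
    print(build_evalue_query("PF00465"))
```

The resulting string can be sent to any SPARQL endpoint or RDF library that holds the annotation graph; the E-values it returns across genomes are the kind of data plotted in panel (C).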
