Gigascience. 2020 Feb 1;9(2):giaa003.
doi: 10.1093/gigascience/giaa003.

HAMAP as SPARQL rules: A portable annotation pipeline for genomes and proteomes

Jerven Bolleman et al. Gigascience. 2020.

Abstract

Background: Genome and proteome annotation pipelines are generally custom-built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.

Results: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline.
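The core idea of the Results paragraph is that a rule becomes a SPARQL CONSTRUCT query: its WHERE clause matches protein data and signature matches expressed as RDF triples, and its CONSTRUCT template emits annotation triples. To show that mechanism without a SPARQL engine, the following is a minimal, stdlib-only Python sketch that emulates CONSTRUCT over a basic graph pattern. It is not the authors' pipeline and not a real SPARQL engine (no FILTER, OPTIONAL, or the reification metadata of Fig. 3). All IRIs (the `ex:` prefix, the predicate names, and `ex:PlaceholderAnnotation`) are hypothetical placeholders rather than HAMAP's actual vocabulary; only the signature identifier MF_00005 comes from the article's Figure 2.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # As in SPARQL, terms beginning with "?" are variables.
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    # Unify one triple pattern with one data triple, extending the current binding.
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new


def solve(patterns: List[Triple], graph: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    # Yield every binding that satisfies all patterns (a basic graph pattern join).
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from solve(rest, graph, extended)


def construct(template: List[Triple], where: List[Triple], graph: Set[Triple]) -> Set[Triple]:
    # Emulate SPARQL CONSTRUCT: instantiate the template once per WHERE solution,
    # skipping template triples that would contain an unbound variable.
    out: Set[Triple] = set()
    for b in solve(where, graph):
        for tmpl in template:
            if all(not is_var(t) or t in b for t in tmpl):
                out.add(tuple(b[t] if is_var(t) else t for t in tmpl))
    return out


# Hypothetical input data: protein records plus a signature match (cf. Figs. 4 and 5).
graph: Set[Triple] = {
    ("ex:P1", "rdf:type", "ex:Protein"),
    ("ex:P1", "ex:organismTaxon", "ex:Bacteria"),
    ("ex:match1", "ex:sequence", "ex:P1"),
    ("ex:match1", "ex:signature", "hamap:MF_00005"),
    ("ex:P2", "rdf:type", "ex:Protein"),
    ("ex:P2", "ex:organismTaxon", "ex:Bacteria"),
}

# Hypothetical rule: annotate bacterial proteins that match signature MF_00005.
where: List[Triple] = [
    ("?match", "ex:signature", "hamap:MF_00005"),
    ("?match", "ex:sequence", "?protein"),
    ("?protein", "ex:organismTaxon", "ex:Bacteria"),
]
template: List[Triple] = [("?protein", "ex:annotation", "ex:PlaceholderAnnotation")]

annotations = construct(template, where, graph)
print(sorted(annotations))  # [('ex:P1', 'ex:annotation', 'ex:PlaceholderAnnotation')]
```

Only `ex:P1` is annotated, because `ex:P2` has no MF_00005 match. The article's approach makes a hand-written matcher like this unnecessary: the same rule, written as a standard SPARQL 1.1 CONSTRUCT query, can be run by any off-the-shelf SPARQL engine.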

Conclusions: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.

Keywords: SPARQL; function; prediction; protein.


Figures

Figure 1: RDF namespace declarations for prefixes used in other figures.
Figure 2: Part of the HAMAP rule for signature MF_00005 as a SPARQL CONSTRUCT query.
Figure 3: SPARQL CONSTRUCT block of Fig. 2 extended with metadata expressed as RDF reification quads.
Figure 4: Example protein record in an RDF format suitable for HAMAP SPARQL rules.
Figure 5: Example protein sequence/signature match in RDF syntax.
Figure 6: Example query for comparison of annotations generated by the different systems, taking into account whether a system inserts the full GO or UniProt keyword hierarchy or only leaf nodes.
Figure 7: (A) Hypothetical triples to describe a sequence entry from RNAcentral.org that is a member of the Rfam RNA family RF00003 (U1 spliceosomal RNA family). (B) Hypothetical rule associating RF00003 to the GO term GO:0005685 (definition: “A ribonucleoprotein complex that contains small nuclear RNA U1”).
