HAMAP as SPARQL rules: a portable annotation pipeline for genomes and proteomes
- PMID: 32034905
- PMCID: PMC7007698
- DOI: 10.1093/gigascience/giaa003
Abstract
Background: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.
Results: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline.
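The idea of applying a rule to protein data expressed as triples can be illustrated with a toy, in-memory sketch. This is not SPARQL and not the HAMAP rule format: it only mimics how a rule pairs a set of conditions with an annotation to emit, roughly analogous to how a SPARQL 1.1 CONSTRUCT query pairs a WHERE pattern with a triple template. All identifiers and predicate names (`ex:matchesFamily`, `ex:inTaxon`, `ex:recommendedName`, `protein:P1`, and so on) are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A minimal RDF-like statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


class Rule(NamedTuple):
    """Hypothetical rule: if all conditions hold for a subject, emit the annotation."""
    conditions: tuple  # (predicate, obj) pairs that must all be present
    annotation: tuple  # (predicate, obj) to attach to each matching subject


def apply_rule(rule, triples):
    """Return the set of new annotation triples produced by one rule."""
    facts = set(triples)
    subjects = {t.subject for t in facts}
    produced = set()
    for subject in subjects:
        # A subject matches only if every condition triple exists for it.
        if all(Triple(subject, p, o) in facts for p, o in rule.conditions):
            produced.add(Triple(subject, *rule.annotation))
    return produced


# Hypothetical input data: two proteins matching the same family signature
# but belonging to different taxa.
proteins = [
    Triple("protein:P1", "ex:matchesFamily", "family:F1"),
    Triple("protein:P1", "ex:inTaxon", "taxon:Bacteria"),
    Triple("protein:P2", "ex:matchesFamily", "family:F1"),
    Triple("protein:P2", "ex:inTaxon", "taxon:Archaea"),
]

# Hypothetical rule: family match plus taxonomic restriction yields a name.
rule = Rule(
    conditions=(
        ("ex:matchesFamily", "family:F1"),
        ("ex:inTaxon", "taxon:Bacteria"),
    ),
    annotation=("ex:recommendedName", "Example protein name"),
)

annotations = apply_rule(rule, proteins)
for triple in sorted(annotations):
    print(triple)
```

In this sketch only `protein:P1` satisfies both conditions, so it is the only protein that receives the annotation. With a real SPARQL engine, the matching step would be delegated to the engine rather than written by hand, which is the portability argument made above.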
Conclusions: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
Keywords: SPARQL; function; prediction; protein.
© The Author(s) 2020. Published by Oxford University Press.