Next generation models for storage and representation of microbial biological annotation
- PMID: 20946598
- PMCID: PMC3026362
- DOI: 10.1186/1471-2105-11-S6-S15
Abstract
Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for higher-quality, automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third-party software (e.g., relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new, comprehensive way.
Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third-party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human-readable, but also semantically aware, equivalent to GenBank/EMBL files.
Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes that construct genome annotation and ultimately be able to help improve the systems that produce it.
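To make concrete the idea of TURTLE as a readable counterpart to a GenBank/EMBL record, the following is a minimal sketch in Python. It is not the paper's own example: the feature values, the `ex:` namespace, and the property names (`ex:CDS`, `ex:start`, `ex:end`, `ex:strand`) are hypothetical placeholders, and a real system would reference terms from published OWL ontologies instead.

```python
# Minimal sketch: serialize one GenBank-style feature as Turtle text.
# All identifiers and values below are invented for illustration.

# Hypothetical CDS feature, as it might be parsed from a flat file.
feature = {
    "locus_tag": "EX_0001",
    "start": 190,
    "end": 255,
    "strand": "+",
    "product": "hypothetical protein",
}

# The "ex" namespace is a placeholder (assumption), not a real ontology.
PREFIXES = {
    "ex": "http://example.org/annotation#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def feature_to_turtle(feat):
    """Return a Turtle document describing a single feature."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines.append("")
    subject = f"ex:{feat['locus_tag']}"
    predicates = [
        "a ex:CDS",
        f'rdfs:label "{escape_literal(feat["product"])}"',
        f"ex:start {feat['start']}",
        f"ex:end {feat['end']}",
        f'ex:strand "{escape_literal(feat["strand"])}"',
    ]
    # One subject, several predicate-object pairs separated by ";".
    lines.append(subject + " " + " ;\n    ".join(predicates) + " .")
    return "\n".join(lines)


print(feature_to_turtle(feature))
```

Each line of the resulting statement pairs a named property with a value, so the record stays readable as text while every term resolves to an IRI that an ontology can define.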