BMC Bioinformatics. 2010 Oct 7;11 Suppl 6(Suppl 6):S15.
doi: 10.1186/1471-2105-11-S6-S15.

Next generation models for storage and representation of microbial biological annotation

Daniel J Quest et al. BMC Bioinformatics. 2010.

Abstract

Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for higher-quality, automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third-party software (e.g., relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way.

Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third-party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human-readable yet semantically aware equivalent to GenBank/EMBL files.
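
To make the Turtle idea concrete, here is a minimal sketch of how a GenBank-style gene feature might be written out as Turtle. It is not the paper's implementation: the `ex:` namespace, the predicate names (`ex:start`, `ex:end`, `ex:strand`, `ex:product`), the class `ex:Gene`, and the sample record are all hypothetical placeholders, and the sketch assumes each locus tag is already a valid Turtle local name.

```python
# Hypothetical namespace and vocabulary, chosen for illustration only;
# a real pipeline would use terms from its own OWL ontology.
PREFIXES = {
    "ex": "http://example.org/annotation#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def turtle_literal(value):
    """Render a Python value as a Turtle literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # bare integers are Turtle shorthand for xsd:integer
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def feature_to_turtle(feature):
    """Serialize one gene feature (a dict) as a single Turtle subject block."""
    # Assumes the locus tag needs no escaping as a prefixed local name.
    subject = f"ex:{feature['locus_tag']}"
    pairs = [
        ("a", "ex:Gene"),
        ("rdfs:label", turtle_literal(feature["gene"])),
        ("ex:start", turtle_literal(feature["start"])),
        ("ex:end", turtle_literal(feature["end"])),
        ("ex:strand", turtle_literal(feature["strand"])),
        ("ex:product", turtle_literal(feature["product"])),
    ]
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return f"{subject}\n{body} ."


def document_to_turtle(features):
    """Emit prefix declarations followed by one block per feature."""
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    blocks = "\n\n".join(feature_to_turtle(f) for f in features)
    return f"{header}\n\n{blocks}\n"


if __name__ == "__main__":
    # Invented record, not taken from any real genome.
    sample = {
        "locus_tag": "EX_0001",
        "gene": "exA",
        "start": 100,
        "end": 400,
        "strand": "+",
        "product": "hypothetical protein",
    }
    print(document_to_turtle([sample]))
```

The output reads much like a GenBank feature table entry, but every field is an explicit subject-predicate-object statement that an RDF store or reasoner could consume alongside an ontology.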

Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes that construct genome annotation and ultimately be able to help improve the systems that produce it.

Figures

Figure 1. An overview of the Oak Ridge Genome Annotation and Analysis (ORGAA) system.
Figure 2. SAnoS System Architecture. The open arrows represent the flow of data through the system.
Figure 3. A simplified version of the ORNL annotation ontology edited in Protégé.
Figure 4. An example RDF pipeline for algorithm execution.
Figure 5. A mechanism for translation between legacy formats and RDF/XML.
