BMC Bioinformatics. 2009 Jan 27:10:35.

doi: 10.1186/1471-2105-10-35.

The Genome Reverse Compiler: an explorative annotation tool

Andrew S Warren¹, João Carlos Setubal

PMID: 19173744
PMCID: PMC2640359
DOI: 10.1186/1471-2105-10-35

Andrew S Warren et al. BMC Bioinformatics. 2009.

Affiliation

¹ Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA. anwarren@vt.edu

Abstract

Background: As sequencing costs have decreased, whole genome sequencing has become a viable and integral part of biological laboratory research. However, the tools with which genes can be found and functionally characterized have not been readily adapted to be part of the everyday biological sciences toolkit. Most annotation pipelines remain a service provided by large institutions or come as an unwieldy conglomerate of independent components, each requiring its own setup and maintenance.

Results: To address this issue we have created the Genome Reverse Compiler (GRC), an easy-to-use, open-source, automated annotation tool. The GRC is independent of third-party software installs and requires only a Linux operating system. This stands in contrast to most annotation packages, which typically require installation of relational databases, sequence similarity software, and a number of other programming language modules. We provide details on the methodology used by GRC and evaluate its performance on several groups of prokaryotes using GRC's built-in comparison module.

Conclusion: Traditionally, to perform whole genome annotation a user would either set up a pipeline or take advantage of an online service. With GRC the user need only provide the genome he or she wants to annotate and the function resource files to use. The result is high usability and a minimal learning curve for the intended audience of life science researchers and bioinformaticians. We believe that the GRC fills a valuable niche in allowing users to perform explorative, whole-genome annotation.

Figures

**Figure 1**
**Procedure for gene calls**. Gene calling procedure for GRC. Starting with all ORFs (set M), BLAST information and the generic EDPs are used to make an initial evaluation of coding and non-coding. ORFs determined to be coding go into set C and those ORFs that overlap them go into set L. The EDPs are retrained to be organism specific and are used to remove the low-scoring ORFs from M to create M'. Overlaps in M' are then resolved to create gene calls.
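
The set bookkeeping in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration and not GRC's implementation: the `Orf` class, the `blast_hit` flag (standing in for the initial BLAST plus generic-EDP decision), the `edp_score` field (standing in for the retrained, organism-specific EDP score), the `min_score` threshold, and the greedy overlap resolution are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int        # inclusive coordinate
    end: int          # exclusive coordinate
    blast_hit: bool   # hypothetical stand-in for the initial "coding" decision
    edp_score: float  # hypothetical stand-in for the retrained EDP score


def overlaps(a, b):
    """True if the two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def call_genes(orfs, min_score):
    m = set(orfs)                                   # set M: all ORFs
    c = {o for o in m if o.blast_hit}               # set C: initially coding
    l = {o for o in m - c                           # set L: ORFs overlapping C
         if any(overlaps(o, x) for x in c)}
    # EDP retraining on C versus L would happen here; edp_score is assumed
    # to already hold the organism-specific result.
    m_prime = {o for o in m if o in c or o.edp_score >= min_score}  # set M'
    # Assumed resolution rule: keep the best-supported ORF of each overlap.
    calls = []
    ranked = sorted(m_prime, key=lambda o: (o.blast_hit, o.edp_score), reverse=True)
    for o in ranked:
        if not any(overlaps(o, kept) for kept in calls):
            calls.append(o)
    return c, l, sorted(calls, key=lambda o: o.start)
```

Calling `call_genes(orfs, 0.5)` on a small list of `Orf` records returns the sets C and L together with the non-overlapping gene calls ordered by start coordinate.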

**Figure 2**
**Start site determination**. Support can come from different alignments for various start sites. Alignment 1 supports s1, s2, and s3; Alignment 2: s1 and s2; Alignment 3: s1, s2, s3, and s4. Higher start scores are given to start sites that: occur closer to a supporting alignment, occur at a higher frequency, and are supported by a higher scoring alignment.
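
A toy version of the three criteria in the Figure 2 caption is sketched below. The caption does not give GRC's scoring formula, so the rule used here (each supporting alignment contributes its bit score divided by one plus its distance from the start site) is purely an assumption, as are the coordinates, the bit scores, and the rule that an alignment supports every candidate start at or upstream of its own start.

```python
def supported_starts(starts, aln_start):
    """Candidate starts lying at or upstream of the alignment start."""
    return {name for name, pos in starts.items() if pos <= aln_start}


def score_starts(starts, alignments):
    """Sum, per start site, an assumed contribution from each supporting alignment.

    More supporting alignments (frequency), shorter distances (closeness) and
    larger bit scores all raise the total, mirroring the caption's criteria.
    """
    scores = {name: 0.0 for name in starts}
    for aln_start, bit_score in alignments:
        for name in supported_starts(starts, aln_start):
            distance = aln_start - starts[name]
            scores[name] += bit_score / (1 + distance)
    return scores


# Hypothetical coordinates chosen so the support sets match the caption:
# alignment 1 -> s1..s3, alignment 2 -> s1..s2, alignment 3 -> s1..s4.
starts = {"s1": 0, "s2": 30, "s3": 60, "s4": 90}
alignments = [(75, 50.0), (45, 80.0), (102, 60.0)]  # (alignment start, bit score)

scores = score_starts(starts, alignments)
best = max(scores, key=scores.get)
```

With these made-up numbers `best` is `"s2"`, because the high-scoring second alignment begins only a few codons downstream of it.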

**Figure 3**
**Evaluate functional assignment using GO**. Here the term "intracellular part" represents a reference function assigned to the reference gene. The terms "intracellular", "membrane", and "DNA helicase complex" represent possible GRC GO term assignments and their evaluation with respect to the reference term.
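
One plausible reading of the Figure 3 caption is sketched below: an assigned term that equals the reference term or is one of its ancestors counts as confirmed, an assigned term that is a descendant of the reference counts as compatible, and anything else is incompatible. That rule is an assumption, since the caption does not state it, and `TOY_PARENTS` is a hand-made fragment for illustration, not the real Gene Ontology graph.

```python
# Toy child -> parents map (illustrative only, not the actual GO hierarchy).
TOY_PARENTS = {
    "cell": set(),
    "intracellular": {"cell"},
    "membrane": {"cell"},
    "intracellular part": {"intracellular"},
    "DNA helicase complex": {"intracellular part"},
}


def ancestors(term, parents=TOY_PARENTS):
    """All terms reachable by following parent links upward from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def evaluate_term(assigned, reference, parents=TOY_PARENTS):
    """Classify an assigned term relative to a reference term (assumed rule)."""
    if assigned == reference or assigned in ancestors(reference, parents):
        return "confirmed"      # implied by the reference annotation
    if reference in ancestors(assigned, parents):
        return "compatible"     # more specific than the reference
    return "incompatible"
```

Under this reading, `evaluate_term("intracellular", "intracellular part")` gives `"confirmed"`, `"DNA helicase complex"` gives `"compatible"`, and `"membrane"` gives `"incompatible"`.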

**Figure 4**
**GRC pipeline**. Internal pipeline for GRC. Maximal ORFs are found and translated. FSA-BLAST is run using the user-specified database and the resulting alignments are used to call and annotate protein coding genes.
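
The first pipeline step in the Figure 4 caption, finding and translating maximal ORFs, might look like the sketch below. It assumes a maximal ORF runs from the most upstream start codon in a stop-free stretch to the next in-frame stop, uses ATG, GTG and TTG as start codons, and accepts only uppercase A, C, G, T. These choices are assumptions for illustration and the code is not taken from GRC.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumed set of start codons


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; GTG/TTG starts are not rewritten to M here."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def maximal_orfs(seq, min_len=0):
    """Yield (strand, frame, start, end, protein) for each maximal ORF.

    Coordinates are 0-based on the scanned strand; `end` is just past the
    stop codon. ORFs with fewer than `min_len` coding bases are skipped.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOPS:
                    if start is not None and i - start >= min_len:
                        yield strand, frame, start, i + 3, translate(s[start:i])
                    start = None
                elif start is None and codon in STARTS:
                    start = i  # keep only the most upstream start
```

For example, `list(maximal_orfs("CCATGAAATTTTAGCC"))` yields a single forward-strand ORF whose translation is `"MKF"`.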

**Figure 5**
**Gene finding performance**. Performance of gene finding at increasing minimum gene length in comparison to Glimmer. Precision and Sensitivity tend to increase for each organism as the minimum gene length is increased. Lengths are 100, 150, 200, 250, and 300 bp. Panels a, b, c show the data with identical scaling. Panels d, e, f show details of Glimmer comparison for the E. coli, Pseudo, and Gamma groups. Note: The symbols for panels a, b, c match the symbols and legends of their detailed counterparts.

**Figure 6**
**Start site determination**. Performance of gene finding at increasing minimum gene length with respect to the fraction of TP with correct start sites (Start Precision) and Sensitivity. Panels a, b, c show the data with identical scaling. Panels d, e, f show details of Glimmer comparison for the E. coli, Pseudo, and Gamma groups. Note: The symbols for panels a, b, c match the symbols and legends of their detailed counterparts.
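
The three quantities named in the Figure 5 and Figure 6 captions can be computed along the following lines. This sketch assumes that a predicted gene is a true positive (TP) when its strand and stop coordinate match a reference gene, and that the minimum-length filter is applied to both predictions and reference genes. Neither convention is stated in the captions, and the `(strand, start, stop)` tuple layout is hypothetical.

```python
def evaluate(predicted, reference, min_len):
    """Return (precision, sensitivity, start_precision).

    Genes are (strand, start, stop) tuples; length is |stop - start| + 1.
    """
    def index(genes):
        # Key each sufficiently long gene by (strand, stop); remember its start.
        return {
            (strand, stop): start
            for strand, start, stop in genes
            if abs(stop - start) + 1 >= min_len
        }

    pred, ref = index(predicted), index(reference)
    tp = pred.keys() & ref.keys()
    precision = len(tp) / len(pred) if pred else 0.0     # TP / (TP + FP)
    sensitivity = len(tp) / len(ref) if ref else 0.0     # TP / (TP + FN)
    # Fraction of true positives whose start site also matches.
    start_precision = (
        sum(pred[k] == ref[k] for k in tp) / len(tp) if tp else 0.0
    )
    return precision, sensitivity, start_precision
```

Running `evaluate` once per minimum length (100, 150, 200, 250, 300 bp) would produce one point per length for a plot of the kind the captions describe.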

**Figure 7**
**Performance on functional assignment**. Columns show the average fraction of true positive ORFs with confirmed, compatible, and incompatible term assignments. These fractions are not additive since a TP can have a confirmed, compatible, and incompatible term assignment.

**Figure 8**
**Running time**. Total running time of GRC versus the total search space (product of total query and DB length). The main bottleneck for GRC is BLAST.

Cited by

Draft Genome Sequence of FT9, a Novel Bacillus cereus Strain Isolated from a Brazilian Thermal Spring.
Raiol T, De-Souza MT, Oliveira JV, Silva HS, Orem JC, Cavalcante DA, Almeida NF, Telles GP, Setubal JC, Brigido MM, Torres FA, Stadler PS, Walter ME, Moraes LM. Genome Announc. 2014 Oct 9;2(5):e01027-14. doi: 10.1128/genomeA.01027-14. PMID: 25301660. Free PMC article.
CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction.
Al-Ajlan A, El Allali A. Interdiscip Sci. 2019 Dec;11(4):628-635. doi: 10.1007/s12539-018-0313-4. Epub 2018 Dec 27. PMID: 30588558. Free PMC article.
Complete sequencing of Novosphingobium sp. PP1Y reveals a biotechnologically meaningful metabolic pattern.
D'Argenio V, Notomista E, Petrillo M, Cantiello P, Cafaro V, Izzo V, Naso B, Cozzuto L, Durante L, Troncone L, Paolella G, Salvatore F, Di Donato A. BMC Genomics. 2014 May 19;15(1):384. doi: 10.1186/1471-2164-15-384. PMID: 24884518. Free PMC article.
Patterns and processes of Mycobacterium bovis evolution revealed by phylogenomic analyses.
Patané JS, Martins J, Beatriz Castelão A, Nishibe C, Montera L, Bigi F, Zumárraga MJ, Cataldi AA, Fonseca Junior A, Roxo E, Luiza A, Osório AR, Jorge Ufms KS, Thacker TC, Almeida NF, Araújo FR, Setubal JC. Genome Biol Evol. 2017 Feb 13;9(3):521-35. doi: 10.1093/gbe/evx022. Online ahead of print. PMID: 28201585. Free PMC article.
Complete genome sequence of Mycobacterium massiliense.
Raiol T, Ribeiro GM, Maranhão AQ, Bocca AL, Silva-Pereira I, Junqueira-Kipnis AP, Brigido Mde M, Kipnis A. J Bacteriol. 2012 Oct;194(19):5455. doi: 10.1128/JB.01219-12. PMID: 22965084. Free PMC article.
