Stand Genomic Sci. 2009 Jul 20;1(1):63-7.
doi: 10.4056/sigs.632.

The DOE-JGI Standard Operating Procedure for the Annotations of Microbial Genomes

Konstantinos Mavromatis et al. Stand Genomic Sci.

Abstract

The DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP) supports gene prediction and/or functional annotation of microbial genomes for comparative analysis with the Integrated Microbial Genomes (IMG) system. DOE-JGI MAP annotation is applied to nucleotide sequence datasets included in the IMG-ER (Expert Review) version of IMG via the IMG-ER submission site. Users can submit sequence datasets consisting of one or more contigs in a multi-FASTA file. DOE-JGI MAP annotation includes prediction of protein-coding genes, RNA genes, and repeats, followed by assignment of product names to the predicted genes.

Keywords: GeneMark; IMG-ER; Joint Genome Institute; Metagene; RNAmmer; Rfam; functional annotation; gene prediction; tRNA-Scan.
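
The abstract notes that submissions consist of one or more contigs in a multi-FASTA file. As an illustration only, here is a minimal Python sketch of reading such a file into a mapping from contig identifier to sequence. The function name `read_multi_fasta` and the example contig names are hypothetical and are not part of DOE-JGI MAP or the IMG-ER submission site; the sketch assumes the first whitespace-delimited token of each header line is the contig identifier.

```python
from typing import Dict, Iterable, List, Optional


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into {contig_id: sequence}.

    Hypothetical helper for illustration; not part of DOE-JGI MAP.
    """
    contigs: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                contigs[current_id] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an identifier")
            current_id = fields[0]
            if current_id in contigs:
                raise ValueError(f"duplicate contig id: {current_id}")
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data before the first header")
        else:
            chunks.append(line.upper())
    # Store the final record.
    if current_id is not None:
        contigs[current_id] = "".join(chunks)
    return contigs


example = """>contig_1 example description
ACGT
acgt
>contig_2
GGCC
"""
contigs = read_multi_fasta(example.splitlines())
```

For real datasets an established parser such as Biopython's `SeqIO.parse(handle, "fasta")` is likely a better choice; the hand-written version above only shows the structure of the format.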

Figures

Figure 1
Data flow for gene prediction in the DOE-JGI MAP. Nucleotide sequences are first annotated with tools that predict repeats (CRISPR) and RNA genes. Protein-coding genes are then predicted using either GeneMark or Metagene. The consolidated results are used to create a GenBank file, which is uploaded into the IMG-ER database.
Figure 2
The gene product name assignment procedure used in the DOE-JGI MAP. Genes are first compared to protein families (COGs, Pfam, TIGRfam) and protein databases (KEGG, IMG). A product name is then assigned through a series of checks that identify significant hits to IMG terms and the protein family databases. At the end of the process, translation tables are used to produce a GenBank-compliant product name from the respective source.
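
The "series of checks" in the Figure 2 caption can be pictured as a priority cascade: each source is tried in turn, and the first one with a significant hit supplies the product name. The Python sketch below is illustrative only. The caption does not state the order of the checks, so the order in `SOURCE_PRIORITY`, the `"hypothetical protein"` fallback, the single `TRANSLATION` entry, and the function name `assign_product_name` are all assumptions made for this example, not the published procedure.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTION: this priority order is illustrative; the caption does not give it.
SOURCE_PRIORITY = ("IMG_term", "TIGRfam", "COG", "Pfam")

# ASSUMPTION: a toy translation table mapping (source, raw name) to a
# GenBank-style product name. Real tables would be far larger.
TRANSLATION = {("Pfam", "ABC_tran"): "ABC transporter"}

# ASSUMPTION: fallback name used when no source has a significant hit.
DEFAULT_NAME = "hypothetical protein"


def assign_product_name(hits: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Return (product_name, source) for one gene.

    `hits` maps a source name to the name of its significant hit,
    or to None (or a missing key) when that source has no hit.
    """
    for source in SOURCE_PRIORITY:
        raw_name = hits.get(source)
        if raw_name:
            # Translate to a compliant name when the table has an entry.
            return TRANSLATION.get((source, raw_name), raw_name), source
    return DEFAULT_NAME, "none"


name, source = assign_product_name({"COG": None, "Pfam": "ABC_tran"})
```

Returning the source alongside the name keeps a record of which database supported each assignment, which can be useful when reviewing annotations by hand.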
