BMC Genomics. 2006 Jun 13;7:150.
doi: 10.1186/1471-2164-7-150.

Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome

Vasily Tcherepanov et al. BMC Genomics. 2006.

Abstract

Background: As DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center (http://www.biovirus.org) and Viral Bioinformatics - Canada (http://www.virology.ca), we found that researchers were spending time unnecessarily on annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task.

Results: GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences.
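
The second step described above, finding open reading frames in the target genome that no transferred annotation accounts for, can be illustrated with a minimal Python sketch. This is not GATU's implementation (GATU is written in Java, and its ORF detection rules are not given in this abstract). The names `Orf`, `find_orfs` and `unassigned_orfs`, the ATG-to-stop definition of an ORF, the minimum length threshold and the simple same-strand overlap test are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    """Hypothetical feature record; coordinates refer to the forward strand."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: int  # +1 or -1


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def _scan_strand(seq: str, min_codons: int) -> List[Tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs in all three frames of seq.

    Assumption: an ORF runs from the first ATG after the previous stop
    to the next in-frame stop codon; its length counts the stop codon.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(genome: str, min_codons: int = 30) -> List[Orf]:
    """Scan both strands and report ORFs in forward-strand coordinates."""
    genome = genome.upper()
    n = len(genome)
    orfs = [Orf(s, e, +1) for s, e in _scan_strand(genome, min_codons)]
    for s, e in _scan_strand(reverse_complement(genome), min_codons):
        # Map reverse-strand positions back onto the forward strand.
        orfs.append(Orf(n - e, n - s, -1))
    return sorted(orfs)


def unassigned_orfs(orfs: List[Orf], annotated: List[Orf]) -> List[Orf]:
    """Keep ORFs that overlap no already annotated feature on the same strand."""
    def overlaps(a: Orf, b: Orf) -> bool:
        return a.strand == b.strand and a.start < b.end and b.start < a.end

    return [o for o in orfs if not any(overlaps(o, a) for a in annotated)]
```

In a workflow of this kind, the list returned by `unassigned_orfs` would be only a set of candidates. As the abstract notes, the user decides whether each one belongs in the final annotation, for example after inspecting alignment or database search results.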

Conclusion: GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required to annotate genes and mature peptides, and that it helps to standardize gene names between related organisms by transferring reference genome annotations to the target genome. The program is freely available under the General Public License and can be accessed, along with documentation and a tutorial, from http://www.virology.ca/gatu.


Figures

Figure 1. GATU process flow chart.

Figure 2. GATU GUI screenshot after loading genomes and clicking the Annotation button; the annotations that have been read from the reference genome GenBank file are displayed.

Figure 3. List of ORFs automatically annotated by GATU. One ORF that was not detected in the target genome is highlighted in blue in the top panel of the main GATU window; the Accept button is not selected. There is no corresponding ORF in the bottom half of the panel representing the target genome.

Figure 4. Results of a NEEDLE alignment run with an Unassigned-ORF; the display is presented in the main GATU window.

Figure 5. Results of a TBLASTN search with an Unassigned-ORF; the display is presented in the main GATU window.

Figure 6. Genome map panel of the GATU interface. Display of an Unassigned-ORF after temporarily selecting the Accept box and clicking the Jump button; the ORF is shown with green highlighting.

