BMC Genomics. 2007 Oct 25;8:388.
doi: 10.1186/1471-2164-8-388.

EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

Françoise Thibaud-Nissen et al. BMC Genomics. 2007.

Abstract

Background: Despite improvements in tools for the automated annotation of genome sequences, manual curation at the structural and functional levels can further refine a genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models into the ongoing annotation effort.

Results: We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotations can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website, http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser.

Conclusion: We have applied EuCAP to rice. As of July 2007, structural and/or functional annotations of 1,094 genes representing 57 families have been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at http://sourceforge.net/projects/eucap/.
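
The storage and approval steps described in the Results can be illustrated with a minimal sketch. It is not the EuCAP schema: the table names, column names, status values, and helper functions are all hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a server. It shows only the general idea of storing aligned exon coordinates with functional annotation, then listing the approved models of one family.

```python
import sqlite3

# Hypothetical schema: names are illustrative and NOT taken from EuCAP.
SCHEMA = """
CREATE TABLE ca_model (
    model_id       INTEGER PRIMARY KEY,
    family         TEXT NOT NULL,
    gene_name      TEXT NOT NULL,
    function       TEXT,
    pseudomolecule TEXT NOT NULL,
    strand         TEXT NOT NULL CHECK (strand IN ('+', '-')),
    status         TEXT NOT NULL DEFAULT 'submitted'
);
CREATE TABLE ca_exon (
    model_id   INTEGER NOT NULL REFERENCES ca_model(model_id),
    exon_start INTEGER NOT NULL,
    exon_end   INTEGER NOT NULL,
    CHECK (exon_start <= exon_end)
);
"""


def store_model(conn, family, gene_name, function, pseudomolecule, strand, exons):
    """Insert one community-annotated model and its exon coordinates."""
    cur = conn.execute(
        "INSERT INTO ca_model (family, gene_name, function, pseudomolecule, strand) "
        "VALUES (?, ?, ?, ?, ?)",
        (family, gene_name, function, pseudomolecule, strand),
    )
    model_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO ca_exon (model_id, exon_start, exon_end) VALUES (?, ?, ?)",
        [(model_id, start, end) for start, end in exons],
    )
    return model_id


def approve_model(conn, model_id):
    """Mark a model as approved after review (status value is an assumption)."""
    conn.execute(
        "UPDATE ca_model SET status = 'approved' WHERE model_id = ?", (model_id,)
    )


def models_for_family(conn, family, status="approved"):
    """Return (model_id, gene_name, pseudomolecule, start, end) per model."""
    rows = conn.execute(
        "SELECT m.model_id, m.gene_name, m.pseudomolecule, "
        "       MIN(e.exon_start), MAX(e.exon_end) "
        "FROM ca_model m JOIN ca_exon e ON e.model_id = m.model_id "
        "WHERE m.family = ? AND m.status = ? "
        "GROUP BY m.model_id ORDER BY m.model_id",
        (family, status),
    )
    return rows.fetchall()


# Made-up example data: the gene name and coordinates are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
example_id = store_model(
    conn, "adenylyl sulfate reductase", "example_gene", "example function",
    "Chr7", "+", [(100, 250), (400, 620)],
)
approve_model(conn, example_id)
print(models_for_family(conn, "adenylyl sulfate reductase"))
```

Keeping exons in a separate table lets one model carry any number of coordinate pairs, and the status column mirrors the idea that only reviewed models are shown publicly.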

Figures

Figure 1
Flow of information through the EuCAP pipeline. Annotations are submitted through the EuCAP Web Tool or emailed, formatted, and loaded into the EuCAP Gene Model database. Using information from the Osa1 GFF database (containing feature coordinates of Osa1 models) and the EuCAP Gene Model database, a customized web page is generated for each family. After review of the model by the PA and approval of the pages by the CA, the web pages are made public, the CA models are added to the Genome Browser and used to update the Osa1 Genome Annotation database.
Figure 2
Online form for the submission of structural annotation through the EuCAP Web Tool. The Osa1 annotation (light blue), the submitted model (green), full-length cDNAs (dark blue) and rice Transcript Assemblies (purple) are shown in the viewer. Coordinates of the features can be displayed by mousing over the features. Exon and exon coordinates of the CA model can be modified in the fields at the bottom of the page.
Figure 3
An example of a Community Annotation web page. The page, titled after the family annotated (adenylyl sulfate reductase) contains the contact information of the CA, a reference for the annotation, and a summary of the method used for the annotation. For each CA model, the locus is linked to the Rice Genome Browser and the alignment of the CA (in green) and the Osa1 models (in blue) is shown. Gene name and function provided by the CA, accession(s) for the genomic, cDNA and/or protein sequence(s) and other information of interest are listed and hyperlinked to further web pages.
Figure 4
An example of a gene "missed" by the automated annotation pipeline. The current annotation is represented in the TIGR Rice Gene Model track. Owing to the lack of an FGENESH prediction and of any EST or FL-cDNA alignment, no model was annotated prior to integration of the CA model (shown in the Community Annotation track). The xy-plots in the sorghum, maize and Arabidopsis tracks represent the percent sequence homology with rice. The CA model is supported by sequence homology of exons in Arabidopsis, maize, and sorghum.
Figure 5
An example of a gene misannotated by the automated annotation pipeline. The current annotation is represented in the TIGR Rice Gene Model track. This model integrates the CA model shown in the Community Annotation track and replaces the previous (release 4) annotation corresponding to the FGENESH prediction. The xy-plots in the sorghum, maize and Arabidopsis tracks represent the percent sequence homology with rice. The CA model is supported by sequence homology of exons in Arabidopsis, maize and sorghum. Note the absence of EST or full-length cDNA evidence at this locus.
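
The two cases in Figures 4 and 5 (a locus with no existing model, and a locus whose existing model differs from the submitted one) suggest a simple coordinate comparison. The sketch below is only an illustration under stated assumptions: the function names and category labels are hypothetical, coordinates are taken to be 1-based inclusive exon pairs on the same pseudomolecule, and strand is ignored. It is not the comparison EuCAP performs, which relies on review by project annotators.

```python
from typing import List, Tuple

Exons = List[Tuple[int, int]]  # assumed 1-based inclusive (start, end) pairs


def span(exons: Exons) -> Tuple[int, int]:
    """Return the leftmost start and rightmost end of a model."""
    return min(start for start, _ in exons), max(end for _, end in exons)


def overlaps(a: Exons, b: Exons) -> bool:
    """True if the genomic spans of two models share at least one base."""
    (a_start, a_end), (b_start, b_end) = span(a), span(b)
    return a_start <= b_end and b_start <= a_end


def classify(ca_exons: Exons, existing_models: List[Exons]) -> str:
    """Label a submitted model relative to the existing models at its locus.

    'new'       : no existing model overlaps it (compare Figure 4)
    'identical' : an overlapping model has exactly the same exons
    'differs'   : overlapping models exist but none match (compare Figure 5)
    """
    hits = [model for model in existing_models if overlaps(ca_exons, model)]
    if not hits:
        return "new"
    if any(sorted(model) == sorted(ca_exons) for model in hits):
        return "identical"
    return "differs"


# Made-up coordinates for illustration only.
submitted = [(100, 200), (300, 400)]
print(classify(submitted, []))
print(classify(submitted, [[(100, 250), (300, 400)]]))
```

A label of "new" or "differs" would only flag a locus for attention; deciding which model is correct still depends on the experimental evidence shown in the browser tracks.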
