BMC Genomics. 2007 Oct 25;8:388.

doi: 10.1186/1471-2164-8-388.

EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

Françoise Thibaud-Nissen¹, Matthew Campbell, John P Hamilton, Wei Zhu, C Robin Buell


PMID: 17961238
PMCID: PMC2151081
DOI: 10.1186/1471-2164-8-388


Françoise Thibaud-Nissen et al. BMC Genomics. 2007.

Affiliation

¹ The Institute for Genomic Research, 9712 Medical Center Dr, Rockville, MD 20850, USA. fthibaud@jcvi.org

Abstract

Background: Despite improvements in tools for the automated annotation of genome sequences, manual curation at the structural and functional levels can further refine a genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotations to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models into the ongoing annotation effort.

Results: We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotations can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with the functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models and their functional annotations are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org and in the Community Annotation track of the Genome Browser.
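The Results describe storing the coordinates of each aligned CA model, together with its functional annotation, in the MySQL EuCAP Gene Model database, from which per-family web pages are generated. A minimal sketch of that idea follows, under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table and column names, the helper functions `load_model` and `models_for_family`, and the example model ID and coordinates are all hypothetical, not the actual EuCAP schema.

```python
import sqlite3

# Hypothetical schema standing in for the MySQL EuCAP Gene Model database.
# Table and column names are illustrative assumptions, not EuCAP's own.
SCHEMA = """
CREATE TABLE ca_model (
    model_id      TEXT PRIMARY KEY,
    family        TEXT NOT NULL,
    gene_name     TEXT,
    gene_function TEXT,
    chromosome    TEXT NOT NULL,  -- rice pseudomolecule the model aligns to
    strand        TEXT CHECK (strand IN ('+', '-'))
);
CREATE TABLE ca_exon (
    model_id TEXT REFERENCES ca_model(model_id),
    start    INTEGER NOT NULL,  -- 1-based, inclusive
    end_pos  INTEGER NOT NULL,
    CHECK (start <= end_pos)
);
"""


def load_model(conn, model_id, family, gene_name, gene_function,
               chromosome, strand, exons):
    """Store one aligned CA model and its exon coordinates."""
    conn.execute(
        "INSERT INTO ca_model VALUES (?, ?, ?, ?, ?, ?)",
        (model_id, family, gene_name, gene_function, chromosome, strand),
    )
    conn.executemany(
        "INSERT INTO ca_exon VALUES (?, ?, ?)",
        [(model_id, s, e) for s, e in exons],
    )
    conn.commit()


def models_for_family(conn, family):
    """Return {model_id: [(start, end), ...]} for one family, exons sorted by start."""
    rows = conn.execute(
        """SELECT m.model_id, e.start, e.end_pos
           FROM ca_model m JOIN ca_exon e ON m.model_id = e.model_id
           WHERE m.family = ?
           ORDER BY m.model_id, e.start""",
        (family,),
    )
    result = {}
    for model_id, start, end in rows:
        result.setdefault(model_id, []).append((start, end))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative values only.
    load_model(conn, "CA_example_1", "adenylyl sulfate reductase",
               "hypothetical_gene_1", "adenylyl sulfate reductase",
               "chr01", "+", [(1000, 1200), (1350, 1600)])
    print(models_for_family(conn, "adenylyl sulfate reductase"))
```

Keeping exons in their own table lets a model carry any number of exons and turns per-family page generation into one join; the real EuCAP schema may be organized differently.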

Conclusion: We have applied EuCAP to rice. As of July 2007, structural and/or functional annotations for 1,094 genes representing 57 families have been deposited and integrated into the current gene set. All EuCAP components are open source, allowing EuCAP to be implemented for the annotation of other genomes. EuCAP is available at http://sourceforge.net/projects/eucap/.


Figures

**Figure 1**
**Flow of information through the EuCAP pipeline**. Annotations are submitted through the EuCAP Web Tool or by email, then formatted and loaded into the EuCAP Gene Model database. Using information from the Osa1 GFF database (containing the feature coordinates of Osa1 models) and the EuCAP Gene Model database, a customized web page is generated for each family. After the PA reviews the models and the CA approves the pages, the web pages are made public, and the CA models are added to the Genome Browser and used to update the Osa1 Genome Annotation database.
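The Figure 1 caption describes an ordered sequence of gates a family's submission passes through before it becomes public. The sketch below models that sequence as a simple state machine; the stage names, the `FamilySubmission` class, and the rule that each stage may only advance to the next one are simplifying assumptions for illustration, not EuCAP's actual implementation.

```python
from enum import Enum, auto


class Stage(Enum):
    SUBMITTED = auto()       # received via the EuCAP Web Tool or by email
    LOADED = auto()          # formatted and loaded into the EuCAP Gene Model database
    PAGE_GENERATED = auto()  # family page built from the Osa1 GFF and EuCAP databases
    PA_REVIEWED = auto()     # project annotator has reviewed the models
    CA_APPROVED = auto()     # community annotator has approved the page
    PUBLISHED = auto()       # page public; Genome Browser and Osa1 annotation updated


# Assumed rule: each stage may only advance to the one after it.
NEXT = {
    Stage.SUBMITTED: Stage.LOADED,
    Stage.LOADED: Stage.PAGE_GENERATED,
    Stage.PAGE_GENERATED: Stage.PA_REVIEWED,
    Stage.PA_REVIEWED: Stage.CA_APPROVED,
    Stage.CA_APPROVED: Stage.PUBLISHED,
}


class FamilySubmission:
    """Tracks where one family's annotation sits in the (hypothetical) workflow."""

    def __init__(self, family):
        self.family = family
        self.stage = Stage.SUBMITTED

    def advance(self, to):
        expected = NEXT.get(self.stage)  # None once PUBLISHED
        if to is not expected:
            raise ValueError(
                f"{self.family}: cannot move from {self.stage.name} to {to.name}"
            )
        self.stage = to

    @property
    def is_public(self):
        return self.stage is Stage.PUBLISHED
```

For example, a submission created with `FamilySubmission("adenylyl sulfate reductase")` becomes public only after `advance` has been called with `LOADED`, `PAGE_GENERATED`, `PA_REVIEWED`, `CA_APPROVED`, and `PUBLISHED` in that order; skipping the PA review raises `ValueError`.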

**Figure 2**
**Online form for the submission of structural annotation through the EuCAP Web Tool**. The viewer shows the Osa1 annotation (light blue), the submitted model (green), full-length cDNAs (dark blue), and rice Transcript Assemblies (purple). Feature coordinates are displayed when the mouse pointer hovers over a feature. Exon coordinates of the CA model can be modified in the fields at the bottom of the page.
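Because the exon coordinates of a CA model can be edited in this form, some basic consistency check is useful before a model is saved. The function below, `validate_exons`, is a hypothetical illustration of such a check, not a description of what the EuCAP Web Tool actually does: it flags exons whose start lies after their end or below 1, and adjacent exons that overlap or leave no intron between them. The `min_intron` threshold is an assumed parameter.

```python
def validate_exons(exons, min_intron=1):
    """Return a list of problems in (start, end) exon pairs; empty if none found.

    Coordinates are assumed to be 1-based and inclusive.
    """
    problems = []
    for i, (start, end) in enumerate(exons, 1):
        if start < 1:
            problems.append(f"exon {i}: start {start} is not a positive coordinate")
        if start > end:
            problems.append(f"exon {i}: start {start} is after end {end}")
    # Check the gap between consecutive exons in genomic order.
    ordered = sorted(exons)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        intron_length = s2 - e1 - 1
        if intron_length < min_intron:
            problems.append(f"exons {s1}-{e1} and {s2}-{e2} overlap or abut")
    return problems
```

For example, `validate_exons([(1000, 1200), (1150, 1600)])` returns `['exons 1000-1200 and 1150-1600 overlap or abut']`, while two well-separated exons return an empty list.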

**Figure 3**
**An example of a Community Annotation web page**. The page, titled after the annotated family (adenylyl sulfate reductase), contains the contact information of the CA, a reference for the annotation, and a summary of the method used for the annotation. For each CA model, the locus is linked to the Rice Genome Browser, and the alignment of the CA model (green) to the Osa1 models (blue) is shown. The gene name and function provided by the CA, accession(s) for the genomic, cDNA, and/or protein sequence(s), and other information of interest are listed and hyperlinked to further web pages.

**Figure 4**
**An example of a gene "missed" by the automated annotation pipeline**. The current annotation is represented in the TIGR Rice Gene Model track. Because neither an FGENESH prediction nor an EST or FL-cDNA alignment was available at this locus, no model had been annotated before the CA model (shown in the Community Annotation track) was integrated. The xy-plots in the sorghum, maize, and Arabidopsis tracks represent percent sequence homology with rice. The CA model is supported by exon-level sequence homology in Arabidopsis, maize, and sorghum.

**Figure 5**
**An example of a gene misannotated by the automated annotation pipeline**. The current annotation is represented in the TIGR Rice Gene Model track. This model integrates the CA model shown in the Community Annotation track and replaces the previous (release 4) annotation, which corresponded to the FGENESH prediction. The xy-plots in the sorghum, maize, and Arabidopsis tracks represent percent sequence homology with rice. The CA model is supported by exon-level sequence homology in Arabidopsis, maize, and sorghum. Note the absence of EST or full-length cDNA evidence at this locus.
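Figures 4 and 5 contrast two situations that can arise when a CA model is compared with the automated annotation: no Osa1 model at the locus (a missed gene) or an Osa1 model with a different exon structure (a misannotation). The sketch below sorts CA models into these cases for review. The names `GeneModel` and `compare_to_osa1` are hypothetical, overlap is simplified to "same pseudomolecule, same strand, overlapping spans", and the sketch deliberately ignores the EST, full-length cDNA, and cross-species homology evidence that the PAs weigh before accepting a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    model_id: str
    chrom: str
    strand: str
    exons: tuple  # non-empty, sorted ((start, end), ...), 1-based inclusive

    @property
    def span(self):
        return self.exons[0][0], self.exons[-1][1]


def overlaps(a, b):
    """Simplified overlap: same pseudomolecule, same strand, overlapping spans."""
    (a_start, a_end), (b_start, b_end) = a.span, b.span
    return (a.chrom == b.chrom and a.strand == b.strand
            and a_start <= b_end and b_start <= a_end)


def compare_to_osa1(ca_model, osa1_models):
    """Classify a CA model against overlapping Osa1 models (hypothetical helper)."""
    hits = [m for m in osa1_models if overlaps(ca_model, m)]
    if not hits:
        # Figure 4 situation: nothing annotated at this locus.
        return "no Osa1 model at locus (candidate missed gene)", []
    same = [m.model_id for m in hits if m.exons == ca_model.exons]
    if same:
        return "exon structure matches Osa1", same
    # Figure 5 situation: an Osa1 model exists but its structure differs.
    return ("exon structure differs from Osa1 (candidate misannotation)",
            [m.model_id for m in hits])
```

The classification only decides which alignment a reviewer should look at first; whether the CA model or the Osa1 model is correct still depends on the supporting evidence shown in tracks like those in Figures 4 and 5.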


Cited by

Genomic and genetic database resources for the grasses.
Childs KL. Plant Physiol. 2009 Jan;149(1):132-6. doi: 10.1104/pp.108.129593. PMID: 19126704. Review.
Identification and characterization of pseudogenes in the rice gene complement.
Thibaud-Nissen F, Ouyang S, Buell CR. BMC Genomics. 2009 Jul 16;10:317. doi: 10.1186/1471-2164-10-317. PMID: 19607679.
An improved genome release (version Mt4.0) for the model legume Medicago truncatula.
Tang H, Krishnakumar V, Bidwell S, Rosen B, Chan A, Zhou S, Gentzbittel L, Childs KL, Yandell M, Gundlach H, Mayer KF, Schwartz DC, Town CD. BMC Genomics. 2014 Apr 27;15:312. doi: 10.1186/1471-2164-15-312. PMID: 24767513.
