Comparative Study
Genome Res. 2004 Mar;14(3):406-13.
doi: 10.1101/gr.1515604.

Whole genome sequence comparisons and "full-length" cDNA sequences: a combined approach to evaluate and improve Arabidopsis genome annotation

Vanina Castelli et al. Genome Res. 2004 Mar.

Abstract

To evaluate the existing annotation of the Arabidopsis genome further, we generated a collection of evolutionarily conserved regions (ecores) between Arabidopsis and rice. The ecore analysis provides evidence that the gene catalog of Arabidopsis is not yet complete, and that a number of these annotations require re-examination. To improve the Arabidopsis genome annotation further, we used a novel "full-length" enriched cDNA collection prepared from several tissues. An additional 1931 genes were covered by new "full-length" cDNA sequences, raising the number of annotated genes with a corresponding "full-length" cDNA sequence to about 14,000. Detailed comparisons between these "full-length" cDNA sequences and annotated genes show that this resource is very helpful in determining the correct structure of genes, in particular, those not yet supported by "full-length" cDNAs. In addition, a total of 326 genomic regions not included previously in the Arabidopsis genome annotation were detected by this cDNA resource, providing clues for new gene discovery. Because, as expected, the two data sets only partially overlap, their combination produces very useful information for improving the Arabidopsis genome annotation.

Figures

Figure 1
Graphical description of the different situations observed when comparing annotated genes with ecores/ecotigs/cDNAs. (Case 1) Missing internal exon detected by ecores or cDNA sequences. In some cases, the internal exon is only partly missing. (Case 2) Extension of an annotated gene. (Case 3) Novel gene. (Case 4) A cDNA overlapping (red) or partially overlapping (gray) an annotated gene. (Case 5) cDNAs included (red) or not (gray) in the evaluation set. (Case 6) CDS annotation extending beyond GSLT CDS. (Case 7) On the right, a cDNA bridging two annotated genes; on the left, cDNAs splitting an annotated gene.
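Several of the situations in Figure 1 reduce to interval comparisons between a cDNA (or ecotig) alignment and annotated gene coordinates. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes half-open integer coordinates on a single chromosome and strand, and the function name `classify_alignment` and its labels are hypothetical shorthand for Cases 2, 3 and 7.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open; assumed same chromosome and strand


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_alignment(cdna: Interval, genes: List[Interval]) -> str:
    """Label a cDNA/ecotig span relative to annotated gene spans.

    The labels are illustrative, not terms used in the paper.
    """
    hits = [g for g in genes if overlaps(cdna, g)]
    if not hits:
        return "novel"        # Case 3: no annotated gene at this locus
    if len(hits) > 1:
        return "bridging"     # Case 7: one cDNA spans two annotated genes
    gene = hits[0]
    if cdna[0] < gene[0] or cdna[1] > gene[1]:
        return "extension"    # Case 2: cDNA reaches past an annotated end
    return "contained"        # cDNA lies within the annotated gene span
```

Exon-level cases (a missing internal exon, or a cDNA splitting one gene) would need per-exon coordinates rather than whole-gene spans, which this sketch does not model.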
Figure 2
Comparison of the length of GenBank and GSLT cDNA. 5′ (green) 3′ (red). Positive abscissa values correspond to cases in which the GSLT cDNA extends the E-A-mRNA resource, whereas negative values correspond to longer E-A-mRNA cDNAs. The Y axis corresponds to the number of cases found at a given X value.
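The sign convention in Figure 2 can be illustrated with a short sketch. It assumes each cDNA is reduced to a (start, end) pair aligned on the forward strand of the genome; the names `end_differences` and `histogram` are hypothetical, and this is only one plausible way to tabulate such differences.

```python
from collections import Counter
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end) on the forward strand; assumed representation


def end_differences(gslt: Span, ea: Span) -> Tuple[int, int]:
    """Return (5' difference, 3' difference) in bases.

    Positive values mean the GSLT cDNA extends beyond the E-A-mRNA cDNA;
    negative values mean the E-A-mRNA cDNA is longer at that end.
    """
    five_prime = ea[0] - gslt[0]   # GSLT starts further upstream -> positive
    three_prime = gslt[1] - ea[1]  # GSLT ends further downstream -> positive
    return five_prime, three_prime


def histogram(pairs: Iterable[Tuple[Span, Span]]) -> Tuple[Counter, Counter]:
    """Count how many (GSLT, E-A-mRNA) pairs fall at each difference value."""
    five, three = Counter(), Counter()
    for gslt, ea in pairs:
        d5, d3 = end_differences(gslt, ea)
        five[d5] += 1
        three[d3] += 1
    return five, three
```

The two counters correspond to the green (5′) and red (3′) series, with the difference on the X axis and the count on the Y axis.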
Figure 3
An example of a 5′ extension detected by the GSLT resource. In this example, the gene structure of At3g58760 can also be corrected for a missing exon located between exons 6 and 7 of the annotated gene, due to longer cDNAs present in the GSLT resource.
Figure 4
The gene structure of At3g61860 was confirmed by three E-A-mRNA cDNAs. In the GSLT resource, we found three other cDNAs corresponding to a different gene structure (only one of each is represented). The difference between gene models is caused by the usage of an alternative 3′ acceptor site for intron 1.
Figure 5
The At4g21215 gene structure is confirmed by an E-A-mRNA cDNA. A cDNA from the GSLT resource leads to another gene structure, due to the presence of a supplementary intron.
Figure 6
A 3′ extension of an annotated gene model, detected by both an ecotig and a GSLT cDNA sequence (At1g20100).
Figure 7
Novel gene detected by an ecotig and a GSLT cDNA sequence (GSLTF85ZE11, accession no. BX819512).
