Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2002;3(6):RESEARCH0029.
doi: 10.1186/gb-2002-3-6-research0029. Epub 2002 May 30.

Full-length messenger RNA sequences greatly improve genome annotation

Affiliations

Full-length messenger RNA sequences greatly improve genome annotation

Brian J Haas et al. Genome Biol. 2002.

Abstract

Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome sequence data now available, methods for accurate identification of large numbers of genes have become urgently needed. In an effort to create a set of very high-quality gene models, we used the sequence of 5,000 full-length gene transcripts from Arabidopsis to re-annotate its genome. We have mapped these transcripts to their exact chromosomal locations and, using alignment programs, have created gene models that provide a reference set for this organism.

Results: Approximately 35% of the transcripts indicated that previously annotated genes needed modification, and 5% of the transcripts represented newly discovered genes. We also discovered that multiple transcription initiation sites appear to be much more common than previously known, and we report numerous cases of alternative mRNA splicing. We include a comparison of different alignment software and an analysis of how the transcript data improved the previously published annotation.

Conclusions: Our results demonstrate that sequencing of large numbers of full-length transcripts followed by computational mapping greatly improves identification of the complete exon structures of eukaryotic genes. In addition, we are able to find numerous introns in the untranslated regions of the genes.

PubMed Disclaimer

Figures

Figure 1
Figure 1
An example showing how a micro-exon improves a cDNA alignment. (a) Alignment showing the boundaries of the fourth and fifth exons from the sim4 alignment of cDNA Ceres:20761 to chromosome 4. (b) The improvement resulting from insertion of a micro-exon; all three exons now align with 100% identity to the cDNA sequence. Intron positions are shown by '>' in the alignment.
Figure 2
Figure 2
Alternative splice variants discovered by cDNA alignments. Red bars indicate the protein-coding portion of each exon. Black bars indicate noncoding exons and the UTR portions of the initial and terminal exons. Exon boundaries that line up exactly between two or more cDNAs are highlighted in blue. Thin lines connecting the exons represent introns. The genes involved are: (a) auxin-regulated protein, At2g20820, chromosome (chr) 2; (b) SKP1-interacting partner 5 (SKIP5), At3g54480, chr 3; (c) acidic ribosomal protein, At1g01100, chr 1; (d) auxin-regulated protein, At5g53860, chr 5; (e) unknown expressed protein, At2g45740, chr 2.
Figure 3
Figure 3
Comparison of the lengths of the 941 cDNAs from the clones that are contained in both the Ceres and RIKEN collections. (a) Comparison of the 5'-end difference between Ceres and RIKEN clones; (b) comparison of the 3'-end difference between Ceres and RIKEN clones. Peak height indicates the percentage of sequences with a length difference as indicated along the horizontal axis. Positive values on the horizontal axis correspond to longer Ceres clones, while negative values correspond to longer RIKEN clones.

Similar articles

Cited by

References

    1. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. The sequence of the human genome. Science. 2001;291:1304–1351. - PubMed
    1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. - PubMed
    1. Waterston R, Sulston J. The genome of Caenorhabditis elegans. Proc Natl Acad Sci USA. 1995;92:10836–10840. - PMC - PubMed
    1. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–2195. - PubMed
    1. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815. - PubMed

Publication types