Full-length messenger RNA sequences greatly improve genome annotation
- PMID: 12093376
- PMCID: PMC116726
- DOI: 10.1186/gb-2002-3-6-research0029
Abstract
Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome sequence data now available, methods that accurately identify large numbers of genes are urgently needed. In an effort to create a set of very high-quality gene models, we used the sequences of 5,000 full-length gene transcripts from Arabidopsis to re-annotate its genome. We mapped these transcripts to their exact chromosomal locations and, using alignment programs, created gene models that provide a reference set for this organism.
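The abstract does not say how alignments are turned into gene models. As a hypothetical illustration only, not the authors' pipeline, the following sketch treats a transcript's alignment to the genome as a list of aligned blocks, calls any gap of at least an assumed `min_intron` bases an intron, and checks for the common GT...AG splice-site dinucleotides. The function names `build_gene_model` and `is_canonical`, the coordinate convention (0-based, half-open, plus strand only), and the 20-base threshold are all assumptions.

```python
from typing import List, Tuple

# Assumed representation: 0-based, half-open [start, end) coordinates
# on the plus strand of the genomic sequence.
Interval = Tuple[int, int]


def build_gene_model(blocks: List[Interval],
                     min_intron: int = 20) -> Tuple[List[Interval], List[Interval]]:
    """Turn a transcript's aligned blocks into exons and introns.

    Gaps of at least `min_intron` bases (a hypothetical threshold) are called
    introns; shorter gaps are treated as alignment indels and merged.
    Assumes `blocks` is non-empty.
    """
    ordered = sorted(blocks)
    exons = [list(ordered[0])]
    introns: List[Interval] = []
    for start, end in ordered[1:]:
        gap = start - exons[-1][1]
        if gap >= min_intron:
            introns.append((exons[-1][1], start))
            exons.append([start, end])
        else:
            # Short gap: extend the current exon instead of opening an intron.
            exons[-1][1] = max(exons[-1][1], end)
    return [(s, e) for s, e in exons], introns


def is_canonical(genome: str, intron: Interval) -> bool:
    """True if the intron begins with GT and ends with AG (plus strand only)."""
    start, end = intron
    return genome[start:start + 2] == "GT" and genome[end - 2:end] == "AG"


# Toy example: two exons separated by a 24-base GT...AG intron.
genome = "ATGAAA" + "GT" + "C" * 20 + "AG" + "TTTTAA"
exons, introns = build_gene_model([(0, 6), (30, 36)])
print(exons, introns)                              # [(0, 6), (30, 36)] [(6, 30)]
print([is_canonical(genome, i) for i in introns])  # [True]
```

Production spliced aligners also handle the minus strand, non-canonical introns such as GC...AG, and mismatches near block edges, all of which this sketch ignores.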
Results: Approximately 35% of the transcripts indicated that previously annotated genes needed modification, and 5% of the transcripts represented newly discovered genes. We also found that multiple transcription initiation sites appear to be much more common than previously recognized, and we report numerous cases of alternative mRNA splicing. We include a comparison of different alignment programs and an analysis of how the transcript data improved the previously published annotation.
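The Results sort transcripts into those that confirm an existing gene model, those that call for a modified model, and those that reveal new genes. The sketch below shows one simple way such a comparison could be made; it is a hypothetical illustration, not the method used in the paper. It assumes that a transcript "confirms" a model when their intron chains are identical, ignores strand, and ignores differences at the transcript ends. The names `intron_chain`, `overlaps`, `classify`, and the gene identifiers are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # assumed: 0-based, half-open genomic coordinates


def intron_chain(exons: List[Interval]) -> List[Interval]:
    """Introns implied by a sorted list of exons."""
    return [(a[1], b[0]) for a, b in zip(exons, exons[1:])]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify(transcript: List[Interval],
             annotated: Dict[str, List[Interval]]) -> Tuple[str, Optional[str]]:
    """Compare one mapped transcript (sorted exons) with annotated models.

    Returns ("new_gene", None) if no model overlaps the transcript,
    ("consistent", gene_id) if an overlapping model has the same intron
    chain, and ("needs_modification", gene_id) otherwise.
    """
    span = (transcript[0][0], transcript[-1][1])
    hits = [gid for gid, exons in annotated.items()
            if overlaps(span, (exons[0][0], exons[-1][1]))]
    if not hits:
        return "new_gene", None
    chain = intron_chain(transcript)
    for gid in hits:
        if intron_chain(annotated[gid]) == chain:
            return "consistent", gid
    return "needs_modification", hits[0]


annotated = {"geneA": [(100, 200), (300, 400)],
             "geneB": [(1000, 1100), (1200, 1300)]}
print(classify([(90, 200), (300, 420)], annotated))       # ('consistent', 'geneA')
print(classify([(1000, 1100), (1150, 1300)], annotated))  # ('needs_modification', 'geneB')
print(classify([(5000, 5100)], annotated))                # ('new_gene', None)
```

Because only intron chains are compared, a transcript that merely extends a model's ends counts as consistent here; a fuller comparison would also check the UTRs and the coding region.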
Conclusions: Our results demonstrate that sequencing large numbers of full-length transcripts, followed by computational mapping, greatly improves identification of the complete exon structures of eukaryotic genes. In addition, we found numerous introns in the untranslated regions of the genes.
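Introns in untranslated regions can only be seen when a transcript extends beyond the coding sequence, which is why full-length transcripts reveal them. As a hypothetical sketch, assuming a plus-strand transcript with known exons and coding-sequence (CDS) boundaries in 0-based, half-open coordinates, the introns lying entirely upstream or downstream of the CDS could be picked out as follows. The function name `utr_introns` and the dictionary keys are invented for illustration; on the minus strand the 5' and 3' labels would be swapped.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed: 0-based, half-open genomic coordinates


def utr_introns(exons: List[Interval], cds: Interval) -> Dict[str, List[Interval]]:
    """Report introns of a plus-strand transcript that fall in its UTRs.

    An intron is a 5' UTR intron if it ends at or before the CDS start, and
    a 3' UTR intron if it starts at or after the CDS end. Introns within
    the coding region are not reported.
    """
    cds_start, cds_end = cds
    found: Dict[str, List[Interval]] = {"five_prime": [], "three_prime": []}
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = (left_end, right_start)
        if right_start <= cds_start:
            found["five_prime"].append(intron)
        elif left_end >= cds_end:
            found["three_prime"].append(intron)
    return found


exons = [(0, 50), (100, 200), (300, 400), (500, 600)]
print(utr_introns(exons, cds=(120, 350)))
# {'five_prime': [(50, 100)], 'three_prime': [(400, 500)]}
```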
Similar articles
- Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol. 2006 Jan;60(1):69-85. doi: 10.1007/s11103-005-2564-9. PMID: 16463100
- Mining Arabidopsis thaliana RNA-seq data with Integrated Genome Browser reveals stress-induced alternative splicing of the putative splicing regulator SR45a. Am J Bot. 2012 Feb;99(2):219-31. doi: 10.3732/ajb.1100355. Epub 2012 Jan 30. PMID: 22291167
- Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003 Oct 1;31(19):5654-66. doi: 10.1093/nar/gkg770. PMID: 14500829
- Genome annotation: which tools do we have for it? Curr Opin Plant Biol. 1999 Apr;2(2):90-5. doi: 10.1016/S1369-5266(99)80019-3. PMID: 10322203 (review)
- EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol. 2006;7(Suppl 1):S2.1-31. doi: 10.1186/gb-2006-7-s1-s2. Epub 2006 Aug 7. PMID: 16925836 (review)
Cited by
- A RING-H2 zinc-finger protein gene RIE1 is essential for seed development in Arabidopsis. Plant Mol Biol. 2003 Sep;53(1-2):37-50. doi: 10.1023/b:plan.0000009256.01620.a6. PMID: 14756305
- Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis. Genome Res. 2005 Apr;15(4):487-95. doi: 10.1101/gr.3176505. PMID: 15805490
- Functional genomic analysis of Arabidopsis thaliana glycoside hydrolase family 1. Plant Mol Biol. 2004 May;55(3):343-67. doi: 10.1007/s11103-004-0790-1. PMID: 15604686
- Apollo: a sequence annotation editor. Genome Biol. 2002;3(12):RESEARCH0082. doi: 10.1186/gb-2002-3-12-research0082. Epub 2002 Dec 23. PMID: 12537571 (review)
- Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome. Genome Biol. 2002;3(12):RESEARCH0086. doi: 10.1186/gb-2002-3-12-research0086. Epub 2002 Dec 30. PMID: 12537575