Efficient plant gene identification based on interspecies mapping of full-length cDNAs
- PMID: 20668003
- PMCID: PMC2955710
- DOI: 10.1093/dnares/dsq017
Abstract
We present an annotation pipeline that accurately predicts exon-intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to genomes is of great value in fully utilizing FLcDNA resources whose availability is limited to several species. Because low sequence conservation at 5'- and 3'-ends of FLcDNAs between different species tends to result in truncated CDSs, we developed an improved algorithm to identify complete CDSs by the extension of both ends of truncated CDSs. Interspecies mapping of 71 801 monocot FLcDNAs to the Oryza sativa genome led to the detection of 22 142 protein-coding regions. Moreover, in comparing two mapping programs and three ab initio prediction programs, we found that our pipeline was more capable of identifying complete CDSs. As demonstrated by monocot interspecies mapping, in which nucleotide identity between FLcDNAs and the genome was ∼80%, the resultant inferred CDSs were sufficiently accurate. Finally, we applied both inter- and intraspecies mapping to 10 monocot and dicot genomes and identified genes in 210 551 loci. Interspecies mapping of FLcDNAs is expected to effectively predict genes and CDSs in newly sequenced genomes.
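The abstract states that truncated CDSs are completed by extending both ends, but it does not give the algorithm. The following is only a minimal sketch of one plausible way to do this, not the authors' implementation. It assumes a forward-strand, intron-free sequence and an in-frame truncated CDS given as a half-open interval. The function name `extend_cds` and the rule of choosing the most upstream in-frame ATG are illustrative assumptions.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def extend_cds(genome: str, start: int, end: int) -> Tuple[int, int]:
    """Extend a truncated, in-frame CDS [start, end) in both directions.

    Hypothetical helper for illustration: handles only the forward strand
    and ignores introns, which a real pipeline would have to account for.
    """
    if end <= start or (end - start) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 long")
    genome = genome.upper()

    # 5' extension: step upstream one codon at a time. Stop at the first
    # in-frame stop codon; remember the most upstream ATG seen before it.
    new_start = start
    pos = start - 3
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == START_CODON:
            new_start = pos
        pos -= 3

    # 3' extension: unless the CDS already ends in a stop codon, step
    # downstream to the first in-frame stop codon and include it.
    new_end = end
    if genome[end - 3:end] not in STOP_CODONS:
        for pos in range(end, len(genome) - 2, 3):
            if genome[pos:pos + 3] in STOP_CODONS:
                new_end = pos + 3
                break

    return new_start, new_end


# Toy sequence: TAG | ATG AAA CCC GGG TTT TGA | CC
toy = "TAG" + "ATG" + "AAA" + "CCC" + "GGG" + "TTT" + "TGA" + "CC"
print(extend_cds(toy, 9, 15))  # (3, 21)
```

In the toy call, the truncated CDS `CCCGGG` is extended upstream to the ATG at position 3 and downstream through the TGA ending at position 21. If no start or stop codon is found in frame, the corresponding boundary is left unchanged.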