Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2008 Jun 30:9:312.
doi: 10.1186/1471-2164-9-312.

High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Affiliations

High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Evandro Novaes et al. BMC Genomics. .

Abstract

Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effective the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation.

Results: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (theta) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous researches.

Conclusion: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition we demonstrated that SNPs sampled in large-scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Proportion of E. grandis unigenes with homology to gene models. Proportion of E. grandis unigenes (contigs + singlets) without (-) and with homology to the Arabidopsis (A), Populus (P), Oryza (O) gene models. (a) Effect of the sequence length on the proportion of homology to gene models (E value 10-5). (b) Proportion of E. grandis unigenes longer than 100 bp with and without homology to gene models at three different E values (10-5, 10-10 and 10-20).
Figure 2
Figure 2
Proportion of GO categories found in the E. grandis unigenes. Proportion of categories of each Gene Ontology (GO) sampled by the E. grandis unigene sequences compared with the proportions found in the Arabidopsis genome annotation.

References

    1. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. - PMC - PubMed
    1. Emrich SJ, Barbazuk WB, Li L, Schnable PS. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2007;17:69–73. doi: 10.1101/gr.5145806. - DOI - PMC - PubMed
    1. Ohtsu K, Smith MB, Emrich SJ, Borsuk LA, Zhou R, Chen T, Zhang X, Timmermans MC, Beck J, Buckner B, Janick-Buckner D, Nettleton D, Scanlon MJ, Schnable PS. Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.) Plant J. 2007;52:391–404. doi: 10.1111/j.1365-313X.2007.03244.x. - DOI - PMC - PubMed
    1. Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD. Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics. 2006;7:272. doi: 10.1186/1471-2164-7-272. - DOI - PMC - PubMed
    1. Jones-Rhoades MW, Borevitz JO, Preuss D. Genome-wide expression profiling of the Arabidopsis female gametophyte identifies families of small, secreted proteins. PLoS Genet. 2007;3:1848–1861. doi: 10.1371/journal.pgen.0030171. - DOI - PMC - PubMed

Publication types

Substances