BMC Genomics. 2010 Mar 16;11:180.
doi: 10.1186/1471-2164-11-180.

Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery

Thomas L Parchman et al. BMC Genomics.

Abstract

Background: Massively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole-genome sequencing is currently cost- and time-prohibitive. The large and complex genomes of pines (Pinus spp.) have hindered the development of genomic resources, despite the ecological and economic importance of the group. While most genomic studies have focused on a single species (P. taeda), genomic-level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta) is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe the sequencing, assembly, and annotation of expressed genes from P. contorta, and we evaluate their potential for molecular marker development to support population and association genetic studies.

Results: We obtained 586,732 sequencing reads (mean length: 306 base pairs) from a 454 GS XLR70 Titanium pyrosequencer. A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that retrotransposons are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats (SSRs) and single-nucleotide polymorphisms (SNPs) as potential molecular markers in our assembled and annotated sequences. High-quality PCR primers were designed for a substantial number of the SSR loci, and many of these were amplified successfully in initial screening.
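As an illustration of how perfect SSRs can be located in assembled contigs and singletons, the following Python function scans a sequence left to right for tandem repeats of 2 to 6 bp motifs. The minimum repeat counts in the hypothetical `MIN_REPEATS` table, and the `find_ssrs` and `SSR` names, are assumptions for illustration; they are not the search criteria or software used in this study.

```python
from typing import Dict, List, NamedTuple, Optional


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str    # repeat unit, e.g. "AG"
    repeats: int  # number of consecutive copies of the motif


# Hypothetical thresholds: minimum tandem copies required per motif length (bp).
MIN_REPEATS: Dict[int, int] = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Return perfect (uninterrupted) SSRs found by a greedy left-to-right scan."""
    seq = seq.upper()
    hits: List[SSR] = []
    i = 0
    while i < len(seq):
        best: Optional[SSR] = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs at the sequence end, ambiguous bases,
            # and mononucleotide runs such as "AAAA".
            if len(motif) < k or "N" in motif or len(set(motif)) == 1:
                continue
            # Count consecutive copies of the motif starting at position i.
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            # Keep the match spanning the most bases; ties go to the shorter motif.
            if n >= min_repeats[k] and (best is None or n * k > best.repeats * len(best.motif)):
                best = SSR(i, motif, n)
        if best is not None:
            hits.append(best)
            i += best.repeats * len(best.motif)  # jump past the whole repeat
        else:
            i += 1
    return hits
```

For example, `find_ssrs("GGC" + "AG" * 6 + "TTC")` reports one dinucleotide repeat, `SSR(start=3, motif='AG', repeats=6)`. A fuller search would also report imperfect (interrupted) repeats and flanking sequence for primer design.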

Conclusions: This sequence collection represents a major genomic resource for P. contorta, and the large number of genetic markers characterized should contribute to future research in this and other pines. Our results illustrate the utility of next-generation sequencing as a basis for marker development and population genomics in non-model species.

Figures

Figure 1
Frequency distribution of 454 sequencing read lengths. The frequency distribution of read lengths resulting from 454 GS XLR70 Titanium pyrosequencing.
Figure 2
Schematic of 454 EST analysis. The steps and sets of sequences involved in 454 EST sequencing, assembly of reads into contigs, annotation using protein databases, and genetic marker discovery and characterization.
Figure 3
Assembly characteristics of 454 ESTs. Histograms depicting the number of reads per contig and contig lengths obtained from a reference-guided assembly (left) and a de novo assembly of the remaining reads (right).
Figure 4
Contig length as a function of the number of sequences assembled into each contig. The marginal histograms depict the frequency distributions of the number of sequences assembled into contigs and the frequency distribution of contig length.
Figure 5
Gene ontology assignments for P. contorta and A. thaliana. Proportion of annotated contigs and singletons from P. contorta 454 ESTs and annotated A. thaliana proteins that matched various gene ontology (GO) categories.
Figure 6
Comparison of P. contorta contigs to orthologous A. thaliana coding sequences. A. The ratio of P. contorta contig length to A. thaliana ortholog length as a function of contig coverage depth. The dotted line marks a ratio of one; contigs at or above this line are as long as or longer than their BLAST-matched A. thaliana orthologs. The contour lines in both plots correspond to the density of points in each plot. B. Total percentage of each A. thaliana ortholog's coding sequence covered by all P. contorta contigs combined, as a function of the length of the A. thaliana coding sequence.
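Panel B reports, for each A. thaliana coding sequence, the percentage covered by all matching P. contorta contigs combined. A minimal sketch of that calculation, assuming each contig's BLAST hit has already been reduced to a (start, end) interval on the ortholog coding sequence and that contig and ortholog lengths are measured in the same units, merges overlapping intervals so that positions hit by several contigs are counted only once. The interval convention and the function names (`length_ratio`, `merge_intervals`, `percent_covered`) are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical convention: 1-based, inclusive (start, end) positions on the ortholog CDS.
Interval = Tuple[int, int]


def length_ratio(contig_len: int, ortholog_len: int) -> float:
    """Panel A style ratio: values >= 1 mean the contig is at least as long as the ortholog."""
    return contig_len / ortholog_len


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so each position is counted once."""
    merged: List[Interval] = []
    # Normalize reversed hits (start > end, e.g. minus-strand matches), then sort.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def percent_covered(intervals: Iterable[Interval], cds_len: int) -> float:
    """Percent of a CDS of length cds_len covered by the union of contig alignments."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return 100.0 * covered / cds_len
```

For example, contig hits spanning positions 1 to 300, 250 to 600, and 901 to 1000 of a 1,200 bp coding sequence merge into 700 covered positions, or about 58.3%. Merging first matters because simply summing alignment lengths would double-count the overlap between the first two hits.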
