BMC Genomics. 2013 Apr 18;14:263.
doi: 10.1186/1471-2164-14-263.

Transcriptome characterisation of Pinus tabuliformis and evolution of genes in the Pinus phylogeny

Shi-Hui Niu et al. BMC Genomics. 2013.

Abstract

Background: The Chinese pine (Pinus tabuliformis) is an indigenous conifer species in northern China, but it is relatively underdeveloped as a genomic resource, which limits gene discovery and breeding. Large-scale transcriptome data were obtained using a next-generation sequencing platform to compensate for the lack of P. tabuliformis genomic information.

Results: The increasing amount of transcriptome data on Pinus provides an excellent resource for multi-gene phylogenetic analysis and for studying how conserved genes and functions are maintained in the face of species divergence. The first P. tabuliformis transcriptome, from a normalised cDNA library of multiple tissues and individuals, was sequenced in a full 454 GS-FLX run, producing 911,302 sequencing reads. The high-quality overlapping expressed sequence tags (ESTs) were assembled into 46,584 putative transcripts, and more than 700 SSRs and 92,000 SNPs/InDels were characterised. Comparative analysis of the transcriptomes of six conifer species yielded 191 orthologues, from which we inferred a phylogenetic tree and evolutionary patterns and calculated rates of gene divergence. We also identified 938 fast-evolving sequences that may be useful for identifying genes that evolved in response to positive selection and that might be responsible for speciation in the Pinus lineage.

Conclusions: A large collection of high-quality ESTs was obtained, assembled de novo and characterised. It represents a dramatic expansion of the current transcript catalogues of P. tabuliformis and will gradually be applied in P. tabuliformis breeding programs. Furthermore, these data will facilitate future studies of the comparative genomics of P. tabuliformis and related species.

Figures

Figure 1
Overview of Pinus tabuliformis transcriptome sequencing and assembly. (a) Frequency distribution of 454 sequencing read lengths after filtering and trimming adapters. (b) Length distributions for isotigs and singlets following the de novo assembly process. The abscissa has been truncated at 2 kb. The longest isotig was 3,537 base pairs. An isotig is meant to be analogous to an individual transcript. (c) The average read-depth coverage for assembled unigenes. The y-axis label refers to the total length of all unigenes with the same read-depth coverage. Coverage values from 50 to 6,765 have been binned together. The size of the bubble is proportional to the average unigene length at the corresponding read-depth coverage. (d) A density scatter-plot showing the relationship between unigene length and coverage.
Figure 2
Summary and taxonomic source of BLASTx matches to unigenes. Number of unique best BLASTx matches of unigenes grouped by genus. The best matches of the unigenes to Pinaceae sequences accounted for 28.6% of the total.
Figure 3
Gene Ontology (GO) distributions for the Pinus tabuliformis transcriptome. Main functional categories in the biological process, cellular component and molecular functions found in the transcriptome relevant to plant physiology. The abscissa indicates the number of unigenes. Bars represent the numbers of assignments of Pinus tabuliformis proteins with BLASTx matches to each GO term. One unigene may be matched to multiple GO terms.
Figure 4
Distribution of simple sequence repeats (SSRs) in Pinus tabuliformis expressed sequence tags (ESTs). Di-, tri-, tetra-, penta- and hexa-nucleotide repeats were analysed and their frequencies plotted as a function of the repeat number. The upper right histogram shows the distribution of the total number of SSRs in different classes.
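The kind of SSR scan summarised in Figure 4 can be sketched with a regular-expression search for perfect tandem repeats. This is only a minimal illustration, not the tool or settings used in the study: the minimum repeat counts in `MIN_REPEATS` and the function names are assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (di- to hexa-nucleotide);
# the source does not state the thresholds that were actually used.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' or 'AAA'."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound(motif):
                continue  # already covered by a shorter motif class (or a homopolymer)
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


def class_counts(hits):
    """Number of SSRs per motif length (2 = di-, 3 = tri-, and so on)."""
    return Counter(len(motif) for _, motif, _ in hits)
```

Calling `class_counts(find_ssrs(sequence))` on each EST and summing the results would give the per-class totals of the kind shown in the inset histogram.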
Figure 5
Quality of single nucleotide polymorphisms (SNPs) and insertion/deletions (InDels) in Pinus tabuliformis isotigs. (a) Numbers of SNPs and InDels detected per transcript. (b) Frequencies of different substitution types of SNPs. (c) Frequencies of different insertion/deletion types of InDels. (d) Distributions of SNPs and InDels in total transcripts. The x-axis represents the percentage of one SNP/InDel allele in the population.
Figure 6
Functional annotation and divergence between homologs of five pine and one spruce species. The heat map is based on the 191 putatively orthologous transcripts of six species. The homologs were annotated with Gene Ontology (GO) terms. Colours indicate similarity from yellow (highly similar) to red (weakly similar). The “2× average” is an overall measure of how similar the different species are.
Figure 7
Phylogram of the five pine and one spruce species. Phylogram derived using pairwise non-synonymous substitution rates of orthologous transcripts as a distance metric (not from multiple sequence alignments) and the neighbour-joining method [66]. Branch lengths indicate the non-synonymous substitution rates between different species.
Figure 8
Distribution of Ks values of orthologous pairs for identifying speciation events. Data were grouped into bins of 0.02 Ks units for graphing. The upper right graph shows the Ks distribution of the 6,053 pairs of orthologues identified between P. tabuliformis and P. taeda. Given the rate of substitutions per synonymous site per year, the peaks (Picea glauca = 0.1, P. taeda and P. contorta = 0.03, P. pinaster and P. sylvestris < 0.01) indicate the speciation time between P. tabuliformis and these species.
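The binning and peak-to-time reasoning in the Figure 8 caption can be sketched as follows. The 0.02 bin width comes from the caption; the conversion T = Ks / (2r) is the usual relation for two lineages diverging from a common ancestor and is an assumption here, since the caption does not give the formula or the rate. All function names are illustrative.

```python
from collections import Counter

BIN_WIDTH = 0.02  # bin size stated in the Figure 8 caption


def bin_ks(ks_values, width=BIN_WIDTH):
    """Count Ks values per bin; bin i covers [i * width, (i + 1) * width)."""
    # The small offset guards against values sitting a rounding error below a bin edge.
    return Counter(int(ks / width + 1e-9) for ks in ks_values)


def peak_ks(ks_values, width=BIN_WIDTH):
    """Midpoint of the most populated bin (the lowest bin wins ties)."""
    counts = bin_ks(ks_values, width)
    peak_bin = max(counts, key=lambda b: (counts[b], -b))
    return (peak_bin + 0.5) * width


def divergence_time(ks, rate_per_site_per_year):
    """Assumed relation T = Ks / (2r); the rate r must be supplied by the user."""
    return ks / (2.0 * rate_per_site_per_year)
```

For example, with a purely hypothetical rate of 1e-9 substitutions per synonymous site per year, a Ks peak of 0.03 would correspond to roughly 15 million years under this relation.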
Figure 9
Ka/Ks distribution among 6,053 homolog pairs of Pinus tabuliformis and P. taeda. The mean Ka/Ks value was 0.63. The solid line shows the threshold of Ka/Ks = 1, whereas the dashed line marks the more conservative threshold of Ka/Ks = 2. Overall, 938 orthologous sequences fell above the light solid line and 207 sequences fell above the dashed line.
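The threshold counts in the Figure 9 caption amount to computing Ka/Ks per orthologous pair and tallying the pairs above each cut-off. A minimal sketch, assuming Ka and Ks have already been estimated for each pair by some other tool; the function name and the choice to skip pairs with Ks = 0 are assumptions made here.

```python
def summarise_ka_ks(pairs, thresholds=(1.0, 2.0)):
    """Return (mean Ka/Ks, {threshold: number of pairs above it}).

    pairs is an iterable of (ka, ks) tuples. Pairs with ks == 0 are skipped
    because the ratio is undefined for them (an assumption of this sketch).
    """
    ratios = [ka / ks for ka, ks in pairs if ks > 0]
    if not ratios:
        raise ValueError("no pairs with Ks > 0")
    mean_ratio = sum(ratios) / len(ratios)
    above = {t: sum(r > t for r in ratios) for t in thresholds}
    return mean_ratio, above
```

The default thresholds of 1 and 2 are the two values named in the caption.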

References

    1. Ahuja MR, Neale DB. Evolution of genome size in conifers. Silvae Genet. 2005;54(3):126–137.
    2. Fernandez-Pozo N, Canales J, Guerrero-Fernandez D, Villalobos DP, Diaz-Moreno SM, Bautista R, Flores-Monterroso A, Guevara MA, Perdiguero P, Collada C. EuroPineDB: a high-coverage web database for maritime pine transcriptome. BMC Genomics. 2011;12:366. doi: 10.1186/1471-2164-12-366.
    3. Cairney J, Zheng L, Cowels A, Hsiao J, Zismann V, Liu J, Ouyang S, Thibaud-Nissen F, Hamilton J, Childs K. Expressed sequence tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis. Plant Mol Biol. 2006;62(4–5):485–501.
    4. Li W, Wang X, Li Y. Stability in and correlation between factors influencing genetic quality of seed lots in seed orchard of Pinus tabuliformis Carr. over a 12-year span. PLoS One. 2011;6(8):e23544. doi: 10.1371/journal.pone.0023544.
    5. Chen K, Abbott RJ, Milne RI, Tian XM, Liu J. Phylogeography of Pinus tabulaeformis Carr. (Pinaceae), a dominant species of coniferous forest in northern China. Mol Ecol. 2008;17(19):4276–4288. doi: 10.1111/j.1365-294X.2008.03911.x.