Genetics. 2014 Mar;196(3):891-909.
doi: 10.1534/genetics.113.159996.

Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation

Jill L Wegrzyn et al. Genetics. 2014 Mar.

Abstract

The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20-40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In-depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.

Keywords: conifer; gene family; introns; repeats; retrotransposons.

Figures

Figure 1
Orthologous proteins derived from PLAZA and mapped to the loblolly pine genome (1.01) at various similarity scores. Also included are proteins based on Picea sitchensis sequence from GenBank, P. abies proteins from the Congenie Genome project, and proteins from the Amborella Genome project. These data were generated by examining the proteins for which at least 70% of the protein was included in the local alignment. For each species, we then generated four categories based on the ESS: s90 (90 < ESS < 100), s80 (80 < ESS < 90), s70 (70 < ESS < 80), and none (ESS < 70).
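The filtering and binning described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the percentage scale for alignment coverage, and the handling of boundary values (a score of exactly 90, 80, or 70 is placed in the higher bin) are assumptions, since the caption does not specify them.

```python
from typing import Optional


def ess_category(coverage_pct: float, ess: float,
                 min_coverage_pct: float = 70.0) -> Optional[str]:
    """Assign a protein alignment to one of the caption's four ESS bins.

    coverage_pct: percentage of the protein included in the local alignment.
    ess: the similarity score used in the caption (assumed to be on a 0-100 scale).
    Returns None when the alignment fails the 70% coverage filter.
    """
    # Proteins with less than 70% of their length aligned are not counted.
    if coverage_pct < min_coverage_pct:
        return None
    # Boundary handling is an assumption: ties go to the higher bin.
    if ess >= 90:
        return "s90"
    if ess >= 80:
        return "s80"
    if ess >= 70:
        return "s70"
    return "none"
```

Counting the returned labels per species would give the kind of per-category tallies the figure summarizes.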
Figure 2
Intron lengths were compared for 23 species, which include 21 species curated by the PLAZA project, A. trichopoda, and P. taeda. (A) Comparison of maximum intron lengths for the first four intron positions in the CDS. (B) Comparisons of median intron lengths for the same species for the first four intron positions in the CDS. Species codes are the following: Al (Arabidopsis lyrata), Am (A. trichopoda), At (A. thaliana), Bd (B. distachyon), Cp (Carica papaya), Fv (Fragaria vesca), Gm (G. max), Md (Malus domestica), Me (Manihot esculenta), Mt (Medicago truncatula), Oi (O. sativa ssp. indica), Oj (O. sativa ssp. japonica), Pi (P. taeda), Pp (P. patens), Pt (P. trichocarpa), Rc (R. communis), Sb (S. bicolor), Sm (S. moellendorffii), Tc (T. cacao), Vv (V. vinifera), and Zm (Z. mays).
Figure 3
Results of the TRIBE-MCL analysis that distinguishes orthologous protein groups. The Venn diagram depicts a comparison of protein family counts of five plant classifications: gymnosperms (P. abies, P. sitchensis, and P. taeda), monocots (O. sativa and Z. mays), mosses (P. patens and S. moellendorffii), dicots (A. thaliana, G. max, P. trichocarpa, R. communis, T. cacao, and V. vinifera), and a basal angiosperm (A. trichopoda).
Figure 4
Gene Ontology distribution normalized for molecular function. The orthologous groups defined are exclusive to the angiosperms and conifers, respectively. The angiosperm set includes A. trichopoda, A. thaliana, G. max, P. trichocarpa, P. patens, S. moellendorffii, R. communis, O. sativa, T. cacao, V. vinifera, and Z. mays. The conifer set includes P. abies, P. sitchensis, and P. taeda.
Figure 5
Parsimonious tree predicted by DOLLOP with protein families derived from the MCL analysis of size ≥5. The gains and losses of 13 species (A. thaliana, A. trichopoda, G. max, O. sativa, P. patens, P. trichocarpa, P. abies, P. taeda, R. communis, S. moellendorffii, T. cacao, V. vinifera, and Z. mays) are indicated on tree nodes and branches.
Figure 6
(A) Microsatellite density (loci per megabase) for three conifer genomes (green), one clubmoss genome (purple), and five angiosperm genomes (orange). (B) Microsatellite density (loci per megabase) of the coding sequence of the loblolly pine genome compared to the v1.0 genome and two other loblolly genomic data sets (BACs and fosmids).
Figure 7
(A) Repeat family coverage. Repeat families on the x-axis are ordered by coverage in descending order. Solid lines illustrate cumulative coverage as more families are considered. Dashed lines represent the total repetitive content for that data set. (B) Comparison of bin 1 repetitive content for both partial and full-length annotations. “Full-length + Partial” refers to all full-length and partial hits, and “Percentage of dataset” is a function of the total length annotated by each classification.
