BMC Genomics. 2013 Jul 24;14:502.
doi: 10.1186/1471-2164-14-502.

Transposon fingerprinting using low coverage whole genome shotgun sequencing in cacao (Theobroma cacao L.) and related species


Saemundur Sveinsson et al. BMC Genomics. 2013.

Abstract

Background: Transposable elements (TEs) and other repetitive elements are a large and dynamically evolving part of eukaryotic genomes, especially in plants where they can account for a significant proportion of genome size. Their dynamic nature gives them the potential for use in identifying and characterizing crop germplasm. However, their repetitive nature makes them challenging to study using conventional methods of molecular biology. Next generation sequencing and new computational tools have greatly facilitated the investigation of TE variation within species and among closely related species.

Results: (i) We generated low-coverage Illumina whole genome shotgun sequencing reads for multiple individuals of cacao (Theobroma cacao) and related species. These reads were analysed using both an alignment/mapping approach and a de novo (graph-based clustering) approach. (ii) A standard set of ultra-conserved orthologous sequences (UCOS) standardized TE data between samples and provided phylogenetic information on the relatedness of samples. (iii) The mapping approach proved highly effective within the reference species but underestimated TE abundance in interspecific comparisons relative to the de novo methods. (iv) Individual T. cacao accessions have unique patterns of TE abundance, indicating that the TE composition of the genome is evolving actively within this species. (v) LTR/Gypsy elements are the most abundant, comprising c. 10% of the genome. (vi) Within T. cacao the retroelement families show an order of magnitude greater sequence variability than the DNA transposon families. (vii) Theobroma grandiflorum has a similar TE composition to T. cacao, but the related genus Herrania is rather different, with LTRs making up a lower proportion of the genome, perhaps because of a massive presence (c. 20%) of distinctive low-complexity satellite-like repeats in this genome.

Conclusions: (i) Short read alignment/mapping to reference TE contigs provides a simple and effective method of investigating intraspecific differences in TE composition. It is not appropriate for comparing repetitive elements across species boundaries, for which de novo methods are better suited. (ii) Individual T. cacao accessions have unique spectra of TE composition, indicating active evolution of TE abundance within this species. TE patterns could potentially be used as a "fingerprint" to identify and characterize cacao accessions.


Figures

Figure 1
Phylogeny of Herrania balaensis, Theobroma grandiflorum and nine of the T. cacao varieties. The phylogenetic tree was constructed using partial sequence data of 97 ultra-conserved orthologous sequences (UCOS). Theobroma cacao cv. Scavina-6 was excluded from the phylogenetic analysis due to low sequencing coverage. Nodes marked with an asterisk have high bootstrap support (>90%).
Figure 2
Relative copy-number of transposable elements using reference-based mapping. Relative copy-numbers of the TE super-families in the three species are represented with bar plots. Relative copy-number was calculated by dividing the total coverage of each super-family, within a sample, by the sample’s mean UCOS coverage. The much lower recovery of transposable elements in the other species is apparently due to mapping failure, as the graph-based clustering indicates that TE copy numbers are comparable in all species. Error bars represent standard deviation and correspond to intraspecific variation.
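The normalization described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the dictionary layout and every number are hypothetical, and "total coverage" is taken here to mean a single summed depth value per super-family.

```python
from statistics import mean


def relative_copy_number(te_coverage, ucos_coverage):
    """Divide each super-family's total coverage by the sample's mean UCOS coverage.

    te_coverage: dict mapping super-family name -> total coverage in one sample
    ucos_coverage: list of per-locus coverage values for the UCOS loci in that sample
    """
    if not ucos_coverage:
        raise ValueError("need at least one UCOS coverage value")
    baseline = mean(ucos_coverage)
    if baseline <= 0:
        raise ValueError("mean UCOS coverage must be positive")
    return {family: total / baseline for family, total in te_coverage.items()}


# Hypothetical values for one sample, for illustration only (not data from the paper).
sample_te_coverage = {"LTR/Gypsy": 1200.0, "LTR/Copia": 450.0, "DNA/hAT": 30.0}
sample_ucos_coverage = [1.5, 2.0, 2.5]  # mean of these is 2.0

relative = relative_copy_number(sample_te_coverage, sample_ucos_coverage)
```

Dividing by the single-copy UCOS baseline is what makes samples sequenced to different depths comparable: a sample with twice the sequencing depth has roughly twice the TE coverage and twice the UCOS coverage, so the ratio is unchanged.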
Figure 3
Graph-based clustering analysis of repetitive elements in the three species. Graph layouts of the four largest clusters of repetitive elements detected in the graph-based clustering analysis. Herrania balaensis is shown on the left, T. grandiflorum in the middle and T. cacao cv. Criollo on the right. Clusters are ordered by size, with the largest at the top and the fourth largest at the bottom. Below each graph layout is the class of the repetitive element, the genome percentage of each cluster and, in parentheses, the number of paired reads belonging to it. Coloured regions in some of the graphs represent conserved domains identified by RepeatExplorer. A total of 11,243,224 reads were used in the graph-based clustering.
Figure 4
PCA of the transposable element composition in the Theobroma cacao genotypes. A biplot from a principal component analysis (PCA) using the standardized abundance of each TE super-family as explanatory variables. The percentage of variance explained by each component is shown in parentheses in the x- and y-axis labels.
Figure 5
Nucleotide variability of transposable elements in Theobroma cacao. Box plot showing the nucleotide diversity across the super-families in T. cacao. This shows that DNA transposons have less variation at the superfamily level (see Discussion). Analyses were performed on standardized data sets (Methods) and values are presented transformed to a log10 scale.
Figure 6
Nucleotide diversity of LTR/Copia and LTR/Gypsy elements in Theobroma cacao. (A) Schematic diagram of the structure of the two most common LTR retrotransposon super-families in the T. cacao genome. (B) Partitioning of nucleotide variation is shown as percentage values next to each of the retrotransposon components. The white arrows on a black background represent the long terminal repeats (LTRs), black lines represent the regions between open reading frames (ORFs) and LTRs, and grey boxes represent the following open reading frames: reverse transcriptase (RT), integrase (IT), capsid protein (GAG), aspartic proteinase (AP) and RNase H (RH).


