Nat Commun. 2014 Jun 23:5:4104.
doi: 10.1038/ncomms5104.

Ancestral repeats have shaped epigenome and genome composition for millions of years in Arabidopsis thaliana

Free PMC article

Florian Maumus et al. Nat Commun.

Abstract

Little is known about the evolution of repeated sequences over long periods of time. Using two independent approaches, we show that the majority of the repeats found in the Arabidopsis thaliana genome are ancient and likely to derive from the retention of fragments deposited during ancestral bursts that occurred early in Brassicaceae evolution. We determine that the majority of young repeats are found in pericentromeric domains, while older copies are frequent in the gene-rich regions. Our results further suggest that the DNA methylation of repeats through small RNA-mediated pathways can last over prolonged periods of time. We also illustrate the way repeated sequences are composted by mutations towards genomic dark matter over time, probably driven by the deamination of methylcytosines, which also has an impact on epigenomic landscapes. Overall, we show that the ancient proliferation of repeat families has long-term consequences on A. thaliana biology and genome composition.

Figures

Figure 1. Identification and distribution of divergent repeats in A. thaliana.
(a) Distribution in 1% bins of the identity values between genomic copies and consensus sequences in A. thaliana and A. lyrata. (b) Plot (red dots) and smoothed curve (blue line) of the identities between genomic copies and consensus sequences along A. thaliana chromosome 1 (30.4 Mbp). Grey shading indicates the centromere. (c) Repeat coverage (per cent per 500-kb bins) along the A. thaliana chromosome 1 drawn to scale with b. Grey shading indicates the centromere. (d) Plot (red dots) and smoothed curve (blue line) of the identities between genomic copies and consensus sequences along A. lyrata chromosome 1. Grey shading indicates that centromeres are not assembled for this species.
Figure 2. Ancestral repeats in the A. thaliana genome.
(a) We performed a competitive annotation of the A. thaliana genome with the Brassicaceae library: each copy being detected by the most similar consensus sequence (best score) and resulting in A. thaliana copies being attributed to different Brassicaceae species. For each species, we plot the distribution of identity values between genomic copies and consensus sequences in 1% bins (‘At_other’ represents the pool of Ler-1, Kro-0, Bur-0 and C24 accessions). (b) Distribution along the Col-0 chromosomes of the contributions of the annotations attributed to consensus sequences from different species and ecotypes (‘At_other’ represents the pool of Ler-1, Kro-0, Bur-0 and C24). Grey shading indicates the centromere.
Figure 3. Occurrence of sRNAs in old repeats.
(a) Distribution in 1% bins of the identity values between A. thaliana genomic copies and consensus sequences according to whether copies overlap at least one 24-nt sRNA (24-nt sRNA+) or not (24-nt sRNA−). (b) Read density for all sRNA classes addressed in young versus old copies that overlap with at least one read of the respective class. Copy numbers are as follows: N(24-nt sRNA+ young copies)=8,342; N(24-nt sRNA+ old copies)=12,841; N(23-nt sRNA+ young copies)=7,906; N(23-nt sRNA+ old copies)=10,526; N(22-nt sRNA+ young copies)=6,909; N(22-nt sRNA+ old copies)=7,252; N(21-nt sRNA+ young copies)=6,694; N(21-nt sRNA+ old copies)=6,842. Error bars are defined as s.e.m. ***Statistically supported differences (MWU P-value<0.0001). (c) Distribution in 1% bins of the identity values between A. thaliana genomic copies and consensus sequences for each class of sRNA addressed.
Figure 4. Gene expression analysis.
(a) Expression levels of the A. thaliana genes with respect to the presence and location of repeats. ***Statistically supported differences (MWU P-value<0.0001). (b) Expression levels of the A. thaliana genes with respect to the presence, location and age of repeats. ***Each of the two sets of genes with flanking repeats is significantly different (MWU P-value<0.0001) from each of the two sets of genes with repeats within. (c) Expression levels of the A. thaliana genes with flanking repeats with respect to the age of repeats and overlap with 24-nt sRNA. (d) Expression levels of the A. thaliana genes carrying repeats with respect to the age of repeats and overlap with 24-nt sRNA. ***Statistically supported differences (MWU P-value<0.0001). For all panels, error bars are defined as s.e.m.
Figure 5. Composition of aging repeats.
(a) Plot showing the G+C content along the concatenated repeat copies detected on chromosome 1 using the Col-0 library. Gaps indicate values below 30%. (b) Histogram showing average G+C content in the copies distributed in bins of 5% identity with respective consensus sequence. ***Statistically supported differences (MWU P-value<0.0001). Statistical tests were run for A. thaliana only.
Figure 6. G+C content in repeats over time.
(a) Distribution in 1% bins of the identity values between A. thaliana genomic copies and consensus sequences for different classes of repeats. (b) G+C content in copies from different repeat families distributed in bins of 5% identity with respective consensus sequence. Error bars are defined as s.e.m. ***Statistically supported differences (MWU P-value<0.0001) for all repeat families.
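
The binning behind Figures 5b and 6b (average G+C content of repeat copies grouped into 5% identity bins) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_content` and `mean_gc_by_identity_bin`, the input layout (pairs of identity percentage and sequence) and the choice to fold copies at exactly 100% identity into the top bin are all assumptions made for illustration.

```python
from collections import defaultdict


def gc_content(seq):
    """Return the G+C percentage of a sequence, counting only A, C, G and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases (assumed convention)
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def mean_gc_by_identity_bin(copies, bin_width=5):
    """Average G+C content per identity bin.

    copies: iterable of (identity_percent, sequence) pairs (hypothetical layout).
    Returns a dict mapping each bin's lower bound to the mean G+C percentage.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for identity, seq in copies:
        # Lower bound of the bin; 100% identity is folded into the top bin.
        start = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        sums[start] += gc_content(seq)
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Toy input, invented purely to show the call shape (not data from the paper).
example = [(97.0, "GGCC"), (96.0, "ATAT"), (72.5, "ATGC"), (100.0, "GCGC")]
result = mean_gc_by_identity_bin(example)
```

With the toy input, the copies at 96%, 97% and 100% identity land in the bin starting at 95 and the copy at 72.5% lands in the bin starting at 70. Comparing bins statistically, as the captions do with the MWU test, would be a separate step on the per-copy values.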
