Nat Ecol Evol. 2021 Oct;5(10):1367-1381.
doi: 10.1038/s41559-021-01525-w. Epub 2021 Aug 19.

Gradual evolution of allopolyploidy in Arabidopsis suecica

Robin Burns et al. Nat Ecol Evol. 2021 Oct.

Abstract

Most diploid organisms have polyploid ancestors. The evolutionary process of polyploidization is poorly understood but has frequently been conjectured to involve some form of 'genome shock', such as genome reorganization and subgenome expression dominance. Here we study polyploidization in Arabidopsis suecica, a post-glacial allopolyploid species formed via hybridization of Arabidopsis thaliana and Arabidopsis arenosa. We generated a chromosome-level genome assembly of A. suecica and complemented it with polymorphism and transcriptome data from all species. Despite a divergence around 6 million years ago (Ma) between the ancestral species and differences in their genome composition, we see no evidence of a genome shock: the A. suecica genome is colinear with the ancestral genomes; there is no subgenome dominance in expression; and transposon dynamics appear stable. However, we find changes suggesting gradual adaptation to polyploidy. In particular, the A. thaliana subgenome shows upregulation of meiosis-related genes, possibly to prevent aneuploidy and undesirable homeologous exchanges that are observed in synthetic A. suecica, and the A. arenosa subgenome shows upregulation of cyto-nuclear processes, possibly in response to the new cytoplasmic environment of A. suecica, with plastids maternally inherited from A. thaliana. These changes are not seen in synthetic hybrids, and thus are likely to represent subsequent evolution.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The genome of A. suecica is largely colinear with the ancestral genomes.
a, Schematic depicting the origin of A. suecica and its current distribution in relation to the ice cover at the Last Glacial Maximum. ka, thousand years ago. Ice cover data are from Natural Resource Canada (https://open.canada.ca/data/en/dataset/a384bada-a787-5b49-9799-f5d589e97bd3). b, Chromosome-level assembly of the A. suecica genome with inner links depicting syntenic blocks between the A. thaliana and A. arenosa subgenomes of A. suecica. Histograms show the distribution of TEs (in blue) and protein-coding genes (in green) along the chromosomes. c, Synteny of the A. thaliana subgenome of A. suecica to the A. thaliana TAIR10 reference. In total 13 colinear synteny blocks were found. d, Synteny of the A. arenosa subgenome to A. lyrata. In total 40 synteny blocks were found, 33 of which were colinear. Of the remaining seven blocks, five represent inversions in the A. arenosa subgenome of A. suecica relative to A. lyrata, one is a translocation and one corresponds to a previously reported misassembly in the A. lyrata genome. Orange bars show the density of missing regions (‘N’ bases) in the A. lyrata genome.
Fig. 2. Expression and copy number variation of 45S rDNA in A. suecica.
a, The relationship between expression levels (log2(CPM)) and copy number of 45S rDNA shows extensive variation of 45S rDNA copy number and varying direction of ‘nucleolar dominance’. Grey lines connect subgenomes of the same accession. Values above the dashed line are taken as evidence for expression of a particular 45S rDNA allele, as this is above the maximum level of mis-mapping seen in the ancestral species (see Extended Data Fig. 3). b,c, FISH results of a natural A. suecica accession AS90a that has largely lost the rDNA cluster of the A. thaliana subgenome (8 copies calculated for the A. thaliana 45S rDNA and 159 copies of the A. arenosa 45S rDNA). d,e, FISH results of a natural accession ASS3 that has maintained both ancestral rDNA loci (174 copies calculated for the A. thaliana 45S rDNA and 104 copies of the A. arenosa 45S rDNA). Scale bars, 10 μm (b, d).
Fig. 3. TE dynamics in A. suecica reveal no evidence for abnormal transposon activity.
a, Median TE insertions per genome. As the A. arenosa population is an autotetraploid outcrosser, four randomly chosen haploid A. arenosa subgenomes of A. suecica were combined to make a 4n A. suecica. A. suecica does not show an increase in private TE insertions compared with the ancestral species for either subgenome, and shared TEs constitute a higher fraction of TEs in A. suecica, reflecting the strong population bottleneck at its origin. b,c, Site-frequency spectra of non-synonymous SNPs, synonymous SNPs and TEs in the A. thaliana (b) and A. arenosa (c) subgenomes of A. suecica suggest that TEs are under purifying selection on both subgenomes. d, Three-dimensional histogram of a joint TE frequency spectrum for A. thaliana on the x axis and the A. thaliana subgenome of A. suecica on the y axis. e, Three-dimensional histogram of a joint TE frequency spectrum for A. arenosa on the x axis and the A. arenosa subgenome of A. suecica on the y axis. d and e show stable dynamics of private TEs in A. suecica and a bottleneck effect on the ancestral TEs (shared) at the origin of the A. suecica species.
Fig. 4. Patterns of gene expression between the subgenomes of A. suecica in rosettes and floral buds.
a, Violin plots of the mean log fold change (logFC) between the subgenomes for the 15 natural A. suecica accessions and 2 synthetic lines for whole rosettes. The centre line is the 50th percentile or median. The box limits represent the interquartile range. The whiskers represent the largest and smallest value within 1.5 times the interquartile range above and below the 75th and 25th percentile, respectively. Mean log fold change is also shown for the two accessions (ASS3 and AS530) for which transcriptome data for both whole rosettes and flower buds were available. All the distributions are centred around zero, suggesting even subgenome expression. b, Violin plots for the mean log fold change between the subgenomes for gene pairs with tissue-specific expression in at least one member of the pair.
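The box and whisker definitions in the Fig. 4 caption can be made concrete with a short sketch. It is only an illustration of the stated rule (whiskers at the most extreme values within 1.5 times the interquartile range of the quartiles); the function name, the quartile interpolation method and the example values are assumptions, not the authors' plotting code.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Return quartiles and whisker ends for a Tukey-style box plot."""
    data = sorted(values)
    # "inclusive" interpolation is an assumption; plotting libraries differ slightly.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }


# Hypothetical per-gene logFC values between subgenomes (not data from the paper).
example_logfc = [-3.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6]
stats = tukey_box_stats(example_logfc)
print(stats)
```

In this made-up example the value -3.0 lies outside the lower fence, so it would be drawn as an outlier rather than extending the whisker. A median near zero is what the caption describes as even subgenome expression.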
Fig. 5. Differential gene expression analysis in A. suecica.
Patterns of differential gene expression in A. suecica support adaptation to the whole-genome duplication for the A. thaliana subgenome and adaptation to the new plastid environment for the A. arenosa subgenome. a, PCA for A. thaliana and the A. thaliana subgenome of natural and synthetic A. suecica lines. Principal component 1 (PC1) separates natural A. suecica from the ancestral species and the synthetic lines. b, PCA for A. arenosa and the A. arenosa subgenome of natural and synthetic A. suecica lines. PC1 separates natural A. suecica from the ancestral species and the synthetic lines, whereas PC2 identifies outlier accessions discussed further below (see Fig. 6). c,d, Heat map of DEGs for the A. thaliana (c) and the A. arenosa (d) subgenome of A. suecica. Positive numbers (red colour) indicate higher expression. Genes and individuals have been clustered on the basis of similarity in expression, resulting in the clusters that are discussed in the text. e, GO enrichment for each cluster in c and d. Categories discussed in the text are highlighted. RNA Pol II, RNA polymerase II.
Fig. 6. Homeologous exchange contributes to expression variance within A. suecica.
a, Cluster 2 of Fig. 5d explains the outlier accession AS530, which is not expressing a cluster of genes on the A. arenosa subgenome. b, Homeologous genes of this cluster on the A. thaliana subgenome of A. suecica show the opposite pattern and are more highly expressed in AS530 compared to the rest of the population. c, Of the 122 genes from cluster 3, 97 are close to each other on the reference genome but appear to be deleted in AS530 on the basis of sequencing coverage. d, The A. thaliana subgenome homeologues have twice the DNA coverage, suggesting that they are duplicated. e,f, Hi-C data show (spurious) interchromosomal contacts at 25 kb resolution between chromosome 1 and chromosome 6 around the break point of the cluster of 97 genes in AS530 (f) but not in the reference accession ASS3 (e).
Extended Data Fig. 1. Measuring genome sizes of Arabidopsis species using flow cytometry.
a, FACS sorting of Solanum lycopersicum cells from 3-week-old leaf tissue for two replicates. G1 represents the peak denoting the G1 phase of the cell cycle. Cells in the G1 phase have 2C DNA content (that is, a 2N genome). b, A. thaliana ‘CVI’ accession. c, A. lyrata ‘MN47’ (the reference accession). d, A. suecica ‘ASS3’ (the reference accession). e, Autopolyploid A. arenosa accession ‘Aa4’. f, Bar chart showing calculated genome sizes (rounded to the nearest whole number) for each species using Solanum lycopersicum as the standard.
Extended Data Fig. 2. Hi-C and a genetic map analysis for the A. suecica genome.
a, Hi-C contact map for the genome of A. suecica. b, Mixing of A. thaliana and A. arenosa Hi-C reads suggests that interchromosomal contacts between homeologous chromosomes are a result of mis-mapping of Hi-C reads. c, Accession ‘AS530’ with the region of HE highlighted with an arrow (Fig. 6); no other rearrangements were observed. d, Hi-C of synthetic A. suecica (third selfed generation). e,f, Physical distance (Mb) versus genetic distance (cM) plotted for the A. thaliana (e) and A. arenosa (f) subgenomes. Chromosome 2 is not plotted as there are too few SNPs on this chromosome in our cross, due to the recent bottleneck in A. suecica.
Extended Data Fig. 3. Genome composition and analysis of orthologues and the rDNA.
a, Genome composition of the A. suecica subgenomes and the ancestral genomes of A. thaliana and A. lyrata (here a substitute reference for A. arenosa because it is annotated). b, Counts of orthologous genes between the subgenomes of the reference A. suecica genome and the reference A. thaliana and A. lyrata genomes. c, Copy number of A. thaliana and A. arenosa rDNA in natural A. suecica, ancestral species and synthetic lines. Blue triangles represent the A. thaliana and A. arenosa parent lines of the synthetic A. suecica cross. AT represents results when mapping to the A. thaliana consensus sequence and AA to the A. arenosa consensus sequence for the 45S rRNA. d, Expression (log2(CPM)) of A. thaliana and A. arenosa rDNA in natural A. suecica, ancestral species and synthetic lines. A log2(CPM) of ≥15 was taken as evidence for expression of the A. thaliana and A. arenosa 45S rRNA in A. suecica, as this value was above the maximum level of mis-mapping observed in the ancestral species (A. thaliana mapping to the A. arenosa 45S rRNA).
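As a rough illustration of the expression threshold in this caption, the sketch below converts a read count to log2(CPM) and compares it with the cut-off of 15. The plain counts-per-million formula, the function names and the example numbers are assumptions; the caption does not state how the authors normalized counts, and tools such as edgeR apply additional adjustments.

```python
import math

# Threshold quoted in the caption: log2(CPM) of at least 15.
EXPRESSION_THRESHOLD_LOG2_CPM = 15.0


def log2_cpm(read_count, library_size):
    """Plain log2 counts-per-million, with no prior count or normalization factor."""
    cpm = read_count * 1_000_000 / library_size
    if cpm == 0:
        return float("-inf")  # no reads mapped to this sequence
    return math.log2(cpm)


def is_expressed(read_count, library_size):
    """True when log2(CPM) reaches the caption's threshold."""
    return log2_cpm(read_count, library_size) >= EXPRESSION_THRESHOLD_LOG2_CPM


# Hypothetical counts for a library of one million mapped reads.
print(log2_cpm(32768, 1_000_000))
print(is_expressed(32768, 1_000_000))
print(is_expressed(1000, 1_000_000))
```

Under this simple formula, a log2(CPM) of 15 corresponds to 2^15 = 32,768 reads per million mapped reads.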
Extended Data Fig. 4. Population frequency and genomic location of transposon polymorphisms.
Shared TE SFS for the a, A. thaliana and b, A. arenosa subgenome. Private TE SFS for the c, A. thaliana and d, A. arenosa subgenome. e, TEs ancestrally from A. arenosa that are present in the A. thaliana subgenome of A. suecica and TEs ancestrally from A. thaliana that are present in the A. arenosa subgenome of A. suecica. f, Shared TEs in the population between A. thaliana and the A. thaliana subgenome of A. suecica. Shared TEs are likely older than private TEs and are enriched around the pericentromeric regions in the A. thaliana subgenome. Private TEs are enriched in the chromosomal arms for both species, where protein-coding gene density is higher (Fig. 1b). g, As in f, but examining TEs in the population of A. arenosa and the A. arenosa part of A. suecica. Note that the region between 5 and 10 on chromosome 2 was not included in the analysis, as this region shows synteny with an unplaced contig.
Extended Data Fig. 5. Transposable element expression analysis.
Patterns of TE expression in natural and synthetic A. suecica show that allopolyploidy is not accompanied by an overall upregulation in TE expression as predicted by the ‘genome shock’ hypothesis. a, Heat map of TE expression for the A. thaliana subgenome of A. suecica (dark green), synthetic A. suecica (cyan) and A. thaliana (light green). b, Heat map of TE expression for the A. arenosa subgenome of A. suecica (dark purple), synthetic A. suecica (pink) and A. arenosa (light purple). c,d, Breakdown of TE families expressed in each cluster, with helitrons being the most abundant class on the A. thaliana subgenome and TEs of an unknown family being the most abundant in the A. arenosa subgenome.
Extended Data Fig. 6. Cross-mapping of RNA-seq short reads.
a, Box plots of cross-mapping RNA short reads. This was examined by mixing reads in silico between A. thaliana and A. arenosa. On average ~6% of A. arenosa reads map to the A. thaliana subgenome instead of the A. arenosa subgenome, and ~1% vice versa. Mapping these reads to the combined reference genomes of A. thaliana and A. lyrata (box plot 4 in a) shows that reads map more precisely to the A. suecica reference and that cross-mapping is not due to unreported HE. b, LogFC of log2(CPM) read counts for A. arenosa (CPM of A. arenosa subgenome genes when reads are mapped only to the A. arenosa subgenome of A. suecica/CPM of A. arenosa subgenome genes when reads are mapped to the full genome) shows only a small effect of mapping strategy on estimates of gene expression on the A. arenosa subgenome. c, Pairwise percentage differences (π) for each group measured for the exons of the 14,041 genes in the expression analysis. The distribution of high π values within A. arenosa overlaps with the distribution of π between A. thaliana and A. arenosa. This explains why there is more cross-mapping for A. arenosa than for A. thaliana in a. Importantly, lower π within A. suecica for both subgenomes means that measurements for subgenome dominance are not biased by cross-mapping, as we expect less cross-mapping since the distribution of π overlaps less with π between A. thaliana and A. arenosa.
Extended Data Fig. 7. Expression differences between subgenomes in natural and synthetic A. suecica.
a, The distribution of expression differences across homeologous gene pairs in natural and synthetic A. suecica. b, A heat map of expression for genes in the top 5% biased toward the A. arenosa subgenome. The gene must be in the 5% quantile for at least 1 accession. c, The same as in b but for the A. thaliana subgenome. d,e, Correlations of log fold change for genes in the tails of the distribution (top 5% quantile) for the A. arenosa subgenome (d) and the A. thaliana subgenome (e).
Extended Data Fig. 8. Comparison of genetic and expression distance.
a, PCA plot of biallelic SNPs in the population of A. thaliana and A. suecica for the A. thaliana subgenome of A. suecica (N=345,075 biallelic SNPs), of the analysed 13,647 genes in gene expression in addition to 500 bp up and downstream of each gene sequence. b, Correlation of π (pairwise genetic differences) and expression distance (that is, Euclidean distance) for 14,041 genes (*=Bootstrapped 1,000 times). c, PCA plot of biallelic SNPs in the population of A. arenosa (N.B. we had DNA sequencing for only 3 of the 4 accessions used in the expression analysis) and A. suecica for the A. arenosa subgenome of A. suecica (N=1,761,708 biallelic SNPs), of the analysed 14,041 genes in gene expression in addition to 500 bp up and downstream of each gene sequence. d, Correlation of π (pairwise genetic differences for mapped genomic regions) and expression distance (that is, Euclidean distance) for 14,041 genes (*=Bootstrapped 1,000 times). A. arenosa had too few samples to give reliable correlations and is therefore NA. Grey bars represent the 95% confidence intervals.
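The two quantities correlated in this caption, pairwise genetic differences (π) and Euclidean expression distance, can be sketched in a few lines. This is a minimal illustration on made-up inputs, assuming aligned sequences of equal length and expression vectors of equal length; the function names are hypothetical and this is not the authors' pipeline.

```python
import math
from itertools import combinations


def pairwise_diversity(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    total = 0.0
    for seq_a, seq_b in pairs:
        differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
        total += differences / len(seq_a)
    return total / len(pairs)


def euclidean_distance(expr_a, expr_b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(expr_a, expr_b)))


# Hypothetical aligned sequences and expression values (not data from the paper).
pi = pairwise_diversity(["ACGT", "ACGA", "ACGT"])
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(pi, distance)
```

A bootstrapped correlation, as mentioned in the caption, would repeat such calculations over resampled genes to obtain confidence intervals.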
Extended Data Fig. 9. Aneuploidy is frequently observed in synthetic A. suecica.
a, Comparison of FISH analyses of the reference natural A. suecica ‘ASS3’ and synthetic A. suecica. Synthetic A. suecica shows aneuploidy in both subgenomes in the F2 generation (gain of one chromosome on the A. thaliana subgenome (N=11) and loss of one chromosome on the A. arenosa subgenome (N=15)). Natural A. suecica shows a stable karyotype. b, DNA-sequencing coverage in the reference natural A. suecica accession ‘ASS3’. c,d, DNA-sequencing coverage in siblings of F1 synthetic A. suecica shows different cases of aneuploidy (indicated with arrow) in synthetic A. suecica: chromosome 4 in c and chromosome 11 in d. e, Overlap of genes involved in cell division from Fig. 5e and genes previously shown to play a role in the adaptation to autopolyploidy in A. arenosa. The small overlap in genes between A. suecica and A. arenosa highlights that successful meiosis in polyploids is likely a complex trait.
Extended Data Fig. 10. Evidence of HE in A. suecica.
a,b, Reads mapped to the beginning of the HE event in chromosome 6 (~15.9 Mb) in ‘AS530’. Arrows point to the direction of the break. Discordant reads map to the A. arenosa subgenome on chromosome 6 in a, and their read pairs map to the homeologous chromosome 1 on the A. thaliana subgenome (~5 Mb) in b. c,d, The end of the HE event in chromosome 6 (~18.4 Mb). Discordant reads map to the A. arenosa subgenome in c, and their read pairs map to chromosome 1 (~2.8 Mb) on the A. thaliana subgenome in d. e, Gene counts between the syntenic regions. 431 have a 1:1 relationship, 108 genes are specific to the A. arenosa subgenome and 105 genes are specific to the A. thaliana subgenome. f, Composition of the syntenic regions between the two subgenomes. g, The top 5% quantiles (N=702) for variation in gene expression for the A. thaliana subgenome show, in cluster 7 (N=111), that the two outlier accessions (AS150 and ASÖ5) are expressing genes differently to the rest of the population. h, Homeologous genes of this cluster on the A. thaliana subgenome of A. suecica show that these genes are not expressed in these two accessions while i shows they are upregulated in ‘AS150’ and ‘ASÖ5’. j,k, 101/111 genes in cluster 7 are located on chromosome 4 in close proximity to each other on the A. thaliana subgenome of the A. suecica reference genome and appear to be deleted in ASÖ5 and AS150. The A. arenosa subgenome homeologues (located on chromosome 11) have twice the DNA coverage, suggesting they are duplicated, in agreement with expectations for an HE event.

