Proc Natl Acad Sci U S A. 2016 Jul 12;113(28):E4052-60.
doi: 10.1073/pnas.1607532113. Epub 2016 Jun 27.

Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms


Luis Zapata et al. Proc Natl Acad Sci U S A. 2016.

Abstract

Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the full complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions do not differ from colinear regions in local divergence, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes that were present only in the reference sequence or only in the Ler assembly, and 334 single-copy orthologs that had an additional copy in only one of the two genomes. To our knowledge, this work gives the first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.

Keywords: Arabidopsis; PacBio sequencing; de novo assembly; gene absence/presence polymorphisms; inversions.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Chromosome-level sequence assembly reveals the karyotype and the arrangement of structural features of the A. thaliana Ler genome. The ideogram of the Ler genome shows idealized chromosomes at pachytene and chromosomal landmarks revealed by cytogenetics, including heterochromatic clusters (dark gray), clusters of centromeric repeats (red), and rDNA (blue) (modified from ref. 56). On the right is an illustration of the assembly, comprising five chromosomal sequences. The colored bars next to the chromosomes indicate the locations of sequence similarity to the telomeric repeat sequence (green), the centromeric repeat (red), and rDNA (blue), as well as major gaps in the sequence (dark gray). The blue star marks a Ler-specific rDNA cluster that was previously identified by cytogenetics (4) and was also recovered in the assembly.
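Fig. 1 marks where the assembly shows similarity to the telomeric repeat. As a minimal sketch of how such a track could be derived, the Python below scans a sequence for tandem runs of the plant-type telomeric unit TTTAGGG and its reverse complement CCCTAAA. It assumes exact matches and a hypothetical threshold of three consecutive copies (`min_copies`); the function name is illustrative, and the similarity search behind the figure may have used a different method.

```python
import re

# Assumption: plant-type telomeric repeat unit (A. thaliana) and its reverse complement.
TELO_FWD = "TTTAGGG"
TELO_REV = "CCCTAAA"


def find_telomeric_runs(seq: str, min_copies: int = 3) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) tuples for tandem runs of the telomeric unit.

    Exact matching only; min_copies is a hypothetical threshold, not a published value.
    """
    seq = seq.upper()
    hits = []
    for unit, strand in ((TELO_FWD, "+"), (TELO_REV, "-")):
        # Builds e.g. "(?:TTTAGGG){3,}": three or more consecutive copies of the unit
        pattern = re.compile(f"(?:{unit}){{{min_copies},}}")
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)
```

For example, `find_telomeric_runs("CCCTAAA" * 3 + "ACGT" * 5 + "TTTAGGG" * 4)` returns `[(0, 21, '-'), (41, 69, '+')]`, i.e. one run on each strand, as expected at the two ends of a chromosome-length sequence.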
Fig. 2.
Higher-order sequence variation. (A) Schematic of local (Upper) and higher-order (Lower) sequence variation as revealed by a whole-genome alignment. Local sequence divergence includes not only small-scale variation such as SNPs and small indels, but also structural variation such as large indels and HDRs. Higher-order variation includes transpositions and inversions, which are not located in the orthologous region of the other genome. Both colinear (allelic) and rearranged (nonallelic) regions can harbor local variation. (B) Amount of aligned and nonaligned sequence in a nonredundant whole-genome alignment of Col-0 and Ler. Aligned regions can be separated into colinear (gray) and rearranged regions [inversions and transpositions (transpos.); red]. Nonaligned regions, which typically reside in the breaks between allelic and nonallelic regions, are shown separately for Col-0 and Ler, including the amount of putatively duplicated sequence. (C) Location of transpositions and inversions. (D) Genomic space involved in different types of local sequence variation, shown separately for allelic and nonallelic regions. (E) Sequence divergence in allelic and nonallelic alignments. (F) Schematic examples of the consequences of meiotic recombination (crossover, CO) events in transposed (Upper) and inverted (Lower) regions. Chromosome arm exchange in nonallelic regions can lead to extreme chromosomal rearrangements. (G) Distribution of 362 CO events with respect to their location in allelic (gray), nonaligned (green), and nonallelic (red) regions, contrasted with the genomic fractions of these regions; shown are the complete genome (Upper) and chromosome arms only (Lower).
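Panels A to C distinguish colinear from rearranged alignments. The sketch below illustrates one simple way to make that split for alignment blocks between a single pair of homologous chromosomes. It assumes a hypothetical `Block` record (start in each genome, length, strand), calls every reverse-strand block an inversion, and takes the heaviest chain of forward blocks that increases in both genomes as the colinear backbone, labeling all remaining forward blocks as transpositions. This is a deliberately simplified model, not the procedure used in the paper; a real pipeline must also handle inversions containing their own colinear chain, duplications, and interchromosomal events.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical alignment block between one Col-0 and one Ler chromosome."""
    ref_start: int  # start in the reference (e.g. Col-0)
    qry_start: int  # start in the query (e.g. Ler)
    length: int     # aligned length, used as the chaining weight
    strand: str     # "+" or "-"


def classify_blocks(blocks: list[Block]) -> list[str]:
    """Label each block 'colinear', 'inverted' or 'transposed' (simplified model)."""
    # Assumption: every reverse-strand block is an inversion.
    labels = ["inverted" if b.strand == "-" else "" for b in blocks]
    fwd = sorted((i for i, b in enumerate(blocks) if b.strand == "+"),
                 key=lambda i: blocks[i].ref_start)
    # best[k]: heaviest chain (summed length) ending at fwd[k]; prev[k]: predecessor in fwd
    best = [blocks[i].length for i in fwd]
    prev = [-1] * len(fwd)
    for k in range(len(fwd)):
        cur = blocks[fwd[k]]
        for j in range(k):
            # A predecessor must also come earlier in the query genome.
            if blocks[fwd[j]].qry_start < cur.qry_start and best[j] + cur.length > best[k]:
                best[k] = best[j] + cur.length
                prev[k] = j
    # Trace back the heaviest chain: this is taken as the colinear backbone.
    backbone = set()
    if fwd:
        k = max(range(len(fwd)), key=best.__getitem__)
        while k != -1:
            backbone.add(fwd[k])
            k = prev[k]
    for i in fwd:
        labels[i] = "colinear" if i in backbone else "transposed"
    return labels
```

The quadratic chaining loop keeps the logic easy to follow; with many thousands of blocks per chromosome, an O(n log n) heaviest-increasing-subsequence variant would be the more practical choice.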
Fig. 3.
Impact of large-scale inversions on meiotic recombination and haplotype diversity in a worldwide collection of A. thaliana accessions. (A) Male meiotic recombination frequencies across chromosomes 3 and 4, contrasted with the locations of the two large-scale inversions (dark gray boxes) and the pericentromeric regions (light gray boxes) [recombination data generated by Giraut et al. (26)]. Recombination frequency was measured between markers spaced on average 316 kb apart. Both inversions coincide with locally reduced recombination frequencies. The interval harboring the inversion on chromosome 3, however, showed residual recombination activity; this does not imply recombination within the inverted region but might arise from recombination in the 111 kb of noninverted sequence in this interval. (B) Names of 409 accessions colored by the inferred chromosome 4 inversion allele (blue, Ler allele; red, Col-0 allele), as assessed at the left and right breakpoints of the inversion. Accessions are ordered according to their position in the haplotype clustering shown in C. (C) Haplotype clustering based on 9,198 SNPs located within the chromosome 4 inversion, revealing two distinct clusters that perfectly match the two chromosome 4 inversion alleles. (D) Geographic distribution of accession origins in central Europe, colored by their chromosome 4 inversion allele. (E) Haplotype diversity within the accessions carrying a Col-0–like (red) or Ler-like (blue) allele of the chromosome 4 inversion. (F) Population differentiation (Fst) between these two groups of accessions. The inversion and the pericentromere are shown as dark and light gray boxes, respectively.
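Panels E and F summarize haplotype diversity and Fst for the two inversion-allele groups. The estimators behind the figure are not stated in the caption, so the sketch below uses two textbook choices as assumptions: Nei's haplotype diversity, H = n/(n-1) * (1 - sum of squared haplotype frequencies), and a per-SNP Fst = (H_T - H_S)/H_T computed from allele frequencies in two equally weighted groups. Treating each accession as a single haplotype assumes largely homozygous accessions; the function names are illustrative.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies).

    Assumption: each accession contributes one haplotype string (e.g. its alleles
    at the SNPs in a window), as expected for largely homozygous accessions.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def fst_nei(p1: float, p2: float) -> float:
    """Per-SNP Fst = (H_T - H_S) / H_T for two equally weighted groups.

    p1, p2: frequency of the same allele in group 1 and group 2.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-group heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

For instance, `fst_nei(1.0, 0.0)` returns 1.0 (a SNP fixed for different alleles in the two groups), and `haplotype_diversity(["AAT", "AAT", "AGT", "CGT"])` returns about 0.833. In practice both would be evaluated in windows along the chromosome, with the accessions carrying each inversion allele as the two groups.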
Fig. 4.
Gene absence/presence polymorphisms between Col-0 and Ler. (A) Number of single-copy, polymorphic genes in Col-0 and Ler. The genes were separated by the presence (Left) or absence (Right) of an ortholog in the related species A. lyrata. (B) Number of single-copy genes with one additional copy in Col-0 and Ler. Cases were separated by the presence of one or two orthologs in the genome of A. lyrata, as in A. (C) Dot plot (57) of an example of a local duplication event copying multiple genes in the Ler genome. Identically colored arrows indicate similarity between the underlying gene loci. (D) Number of local or dispersed gene copies, shown separately for copy loss and copy gain events as defined in B. (E) Sequence identity between gene copies, shown separately for copy loss and copy gain events as defined in B.
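Panels A and B split cases by the number of A. lyrata orthologs, and panels D and E refer to copy loss and gain events. The function below is a hypothetical sketch of that bookkeeping for one orthogroup, given gene counts in Col-0, Ler and A. lyrata. Polarizing events this way (one A. lyrata copy suggests a gain in the genome carrying two copies; two A. lyrata copies suggest a loss in the genome carrying one) is an assumption about how the categories were defined, not a description of the authors' pipeline.

```python
def classify_orthogroup(col: int, ler: int, lyr: int) -> str:
    """Hypothetical labels for one orthogroup from gene counts in Col-0, Ler and A. lyrata."""
    pair = sorted((col, ler))
    if pair == [0, 1]:
        # Single-copy gene present in only one of the two A. thaliana genomes.
        genome = "Col-0" if col == 1 else "Ler"
        outgroup = "with A. lyrata ortholog" if lyr > 0 else "no A. lyrata ortholog"
        return f"present only in {genome} ({outgroup})"
    if pair == [1, 2]:
        # Single-copy gene with one additional copy in one genome.
        two_copy = "Col-0" if col == 2 else "Ler"
        one_copy = "Ler" if two_copy == "Col-0" else "Col-0"
        if lyr == 1:
            return f"copy gain in {two_copy}"   # assumption: one copy is ancestral
        if lyr == 2:
            return f"copy loss in {one_copy}"   # assumption: two copies are ancestral
        return f"extra copy in {two_copy} (unpolarized)"
    return "other"
```

Orthogroups with any other count pattern fall into "other", which keeps the sketch restricted to the single-copy cases the figure describes.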
Fig. 5.
SNPs between six Ler genomes from different laboratories. Location and type of SNPs distinguishing six genomes published as the genome of Ler. Genome-wide visualization revealed large blocks of C→T and G→A mutations specific to two Ler lines.
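Fig. 5 highlights blocks enriched for C→T and G→A changes. A G→A change on one strand is a C→T change on the other, so the sketch below pools the two and counts them per fixed-size window along a chromosome. The input format (position, reference base, alternative base), the window size, and the function names are illustrative assumptions.

```python
from typing import Iterable


def snp_class(ref: str, alt: str) -> str:
    """Pool strand-complementary changes: G>A is C>T read on the other strand."""
    ref, alt = ref.upper(), alt.upper()
    if (ref, alt) in {("C", "T"), ("G", "A")}:
        return "C>T/G>A"
    return f"{ref}>{alt}"


def ct_ga_counts_by_window(snps: Iterable[tuple[int, str, str]],
                           window: int) -> dict[int, tuple[int, int]]:
    """Return {window_start: (n_C>T/G>A, n_total)} for (position, ref, alt) SNP records."""
    counts: dict[int, tuple[int, int]] = {}
    for pos, ref, alt in snps:
        start = (pos // window) * window
        n_ct, n_total = counts.get(start, (0, 0))
        is_ct = snp_class(ref, alt) == "C>T/G>A"
        counts[start] = (n_ct + int(is_ct), n_total + 1)
    return dict(sorted(counts.items()))
```

A window in which nearly all SNPs fall into the pooled class would stand out against the genome-wide background, which is the pattern the caption describes for two of the Ler lines.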

References

    1. Alcázar R, et al. Analysis of a plant complex resistance gene locus underlying immune-related hybrid incompatibility and its occurrence in nature. PLoS Genet. 2014;10(12):e1004848.
    2. Rédei GP. Single loci heterosis. Z Vererbungsl. 1962;93(1):164–170.
    3. Rédei GP. A heuristic glance at the past of Arabidopsis genetics. In: Koncz C, Chua NH, Schell J, editors. Methods in Arabidopsis Research. World Scientific; Singapore: 1992. pp. 1–15.
    4. Fransz P, et al. Cytogenetics for the model system Arabidopsis thaliana. Plant J. 1998;13(6):867–876.
    5. Fransz PF, et al. Integrated cytogenetic map of chromosome arm 4S of A. thaliana: Structural organization of heterochromatic knob and centromere region. Cell. 2000;100(3):367–376.
