Cell. 2016 Jul 14;166(2):492-505.
doi: 10.1016/j.cell.2016.06.044.

Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions

Taiji Kawakatsu et al. Cell. 2016.

Abstract

The epigenome orchestrates genome accessibility, functionality, and three-dimensional structure. Because epigenetic variation can impact transcription and thus phenotypes, it may contribute to adaptation. Here, we report 1,107 high-quality single-base resolution methylomes and 1,203 transcriptomes from the 1001 Genomes collection of Arabidopsis thaliana. Although the genetic basis of methylation variation is highly complex, geographic origin is a major predictor of genome-wide DNA methylation levels and of altered gene expression caused by epialleles. Comparison to cistrome and epicistrome datasets identifies associations between transcription factor binding sites, methylation, nucleotide variation, and co-expression modules. Physical maps for nine of the most diverse genomes reveal how transposons and other structural variants shape the epigenome, with dramatic effects on immunity genes. The 1001 Epigenomes Project provides a comprehensive resource for understanding how variation in DNA methylation contributes to molecular and non-molecular phenotypes in natural populations of the most studied model plant.

Figures

Figure 1. Origins of 1,028 accessions included in the 1001 Epigenomes project methylomes and transcriptomes
(A) Overlap between accessions used in the 1001 genomes, methylomes and transcriptomes projects. All are included in the initial selection of 1,227 accessions. (B) Overlap with published population methylome studies (Dubin et al., 2015; Hagmann et al., 2015; Schmitz et al., 2013). (C) Sample types for the 1,028 accessions. Plants were grown and sequenced at the Salk, GMI or MPI. Since more than one sample type was analyzed for some accessions, there were 1,107 methylomes from 1,028 accessions and 1,203 transcriptomes from 998 accessions. Transcriptomes were sequenced mainly on the Illumina platform and partly on the SOLiD platform (CS). Growth temperatures are in parentheses. a.t: ambient temperature, 22°C. (D) Original collection locations of accessions in the 1001 Epigenomes project. Colors correspond to (B). Dotted lines indicate longitude and latitude grids at 30° intervals. See also Figure S1, Table S2 and S3.
Figure 2. DNA methylation patterns within gene bodies are associated with expression
(A) Correlation between the number of gene body methylated (gbM) genes (x-axis) and their average CG methylation levels (y-axis). Each point is one accession, colored by data source in Fig. 1C. Cvi-0 and UKID116 are the most hypomethylated accessions, while Dör-10 is the most hypermethylated. (B) A snapshot of the 1001 Epigenomes AnnoJ browser (http://neomorph.salk.edu/1001.php) for an example region on chromosome 1, showing hyper-, average and hypo- gene body methylation in Dör-10, Col-0 and Cvi-0. The top track is the gene model, and yellow ticks in the bottom three tracks indicate CG methylation levels at each cytosine. (C) Geographical distribution of hyper- and hypo- gbM accessions. (D) Population-wide relation between epiallele and gene expression levels. Expression levels are shown as log2 (FPKM + 1). UM: unmethylated genes. gbM: gene body methylated genes. teM: TE-like methylated genes. (E) Comparison of pairwise correlations for mCG within gene bodies (x-axis) and mRNA abundance across all accessions (y-axis), indicating positions for hypomethylated Cvi-0 vs. hypermethylated Bak-5, Cvi-0 vs. average methylated Col-0 and Col-0 vs. Bak-5. (F) Transcript abundance (left) of hypermethylated (Bak-5), average (Col-0) and hypomethylated (Cvi-0, UKID116) accessions and mCG within gene bodies (right). Genes were sorted by average expression level. (G) AnnoJ browser snapshots for representative poly-epiallelic (PE) genes AT1G10190 and AT2G07680 that show gbM (mainly mCG) or teM (all contexts) in selected accessions. (H) Venn diagram for the numbers of gbM genes, teM genes and their overlap (PE genes), based on Salk-grown samples. (I) Binning of PE genes based on gbM frequency (the fraction of accessions with gbM epiallele among Salk-grown accessions) and teM frequency. Each tile on the heatmap indicates the number of PE genes in the corresponding bin. (J) Density distribution of teM singletons in relict and non-relict accessions. (K) Enrichment of PE genes for major effect mutations. (L) Enrichment of PE genes for GO terms related to immunity and phosphorylation. (M) Association of epiallele state and gene expression level at MAF3. (N) Heatmap of CHH methylation around PE genes that have a teM epiallele but do not contain TEs within their gene bodies or within 500 bp up-/downstream in Col-0. TSS: transcription start site, TTS: transcription termination site.
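The expression scale used in panel (D), log2 (FPKM + 1), can be illustrated with a minimal sketch. The function name and the FPKM values below are hypothetical and serve only to show the transform; they are not taken from the dataset.

```python
import math


def log2_fpkm(fpkm: float) -> float:
    """Return log2(FPKM + 1); the +1 pseudocount maps FPKM = 0 to 0."""
    if fpkm < 0:
        raise ValueError("FPKM must be non-negative")
    return math.log2(fpkm + 1.0)


# Hypothetical FPKM values, chosen so the results are exact powers of two.
example_fpkm = [0.0, 1.0, 3.0, 1023.0]
transformed = [log2_fpkm(value) for value in example_fpkm]
print(transformed)  # [0.0, 1.0, 2.0, 10.0]
```

The pseudocount keeps unexpressed genes on the plot (at 0) instead of at negative infinity, while compressing the range of highly expressed genes.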
Figure 3. Global patterns of methylation variation
(A) Average CHH methylation levels of CMT2-targeted TEs (x-axis) and RdDM-targeted TEs (y-axis) in worldwide accessions and mutants. (B) Geographic distribution of Salk-grown accessions with hypermethylated TEs and hypomethylated CMT2/RdDM-targeted TEs. (C) Heatmap of kinship-corrected correlations between the genome-wide methylation level for a particular methylation context (columns) and environmental/geographic variables (rows). Rows and columns were ordered by clustering on similarity in correlation. Pre.: Precipitation. Temp.: Temperature. (D) The fraction of variation in genome-wide methylation (all contexts) across accessions that can be explained by genome-wide kinship, i.e., SNP heritability. See also Extended Experimental Procedures.
Figure 4. Genome-wide association study on methylation levels
(A-C) Manhattan plots of GWAS results for genome-wide average methylation phenotypes: (A) CHH methylation of RdDM-targeted TEs; (B) CHH methylation of CMT2-targeted TEs; (C) CG gbM. Highlights indicate peaks containing strong a priori candidates. Horizontal gray solid and dashed lines indicate the genome-wide threshold p=0.05 with Bonferroni correction and FDR 20% defined by enrichment analysis, respectively. Only SNPs with minor allele frequency (MAF) over 5% are included. (D-F) Enrichment and FDR corresponding to (A-C) (based on enrichment of a priori candidates, see Extended Experimental Procedures). The horizontal dashed lines at 0.2 correspond to FDR 20%. (G-H) Close-up of chromosome 5 peak around AGO9 corresponding to (A-B). Green dots show non-reference SNPs with MAF > 5%, and gray dots show rare SNPs (MAF 1 - 5%). See also Figure S2 and S3.
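The two filters named in this caption, a MAF cutoff of 5% and a Bonferroni-corrected threshold at p=0.05, can be sketched as follows. This is only an illustration under stated assumptions: the SNP records are invented, and taking the number of tests to be the number of SNPs that pass the MAF filter is an assumption, since the caption does not state how the correction was computed.

```python
def minor_allele_frequency(alt_count: int, total: int) -> float:
    """Frequency of the rarer allele at a biallelic site."""
    freq = alt_count / total
    return min(freq, 1.0 - freq)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that controls family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical SNPs: (id, non-reference allele count, accessions, p-value).
snps = [
    ("snp_a", 500, 1000, 1e-9),   # MAF 50%
    ("snp_b", 30, 1000, 1e-12),   # MAF 3%: rare, excluded before testing
    ("snp_c", 900, 1000, 0.04),   # MAF 10%
]

# Keep only SNPs with MAF over 5%, as in the caption.
common = [s for s in snps if minor_allele_frequency(s[1], s[2]) > 0.05]

# Assumption: one test per retained SNP.
threshold = bonferroni_threshold(0.05, len(common))
significant = [s[0] for s in common if s[3] < threshold]
print(threshold, significant)  # 0.025 ['snp_a']
```

Note that snp_b has the smallest p-value but never reaches the test, because rare variants are removed first; snp_c passes the MAF filter but misses the corrected threshold.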
Figure 5. Differentially expressed genes among accessions and co-expression networks
(A) Histogram of number of expressed genes in the accessions. (B) Differentially expressed genes (DEG) between relict and non-relict groups (“R vs. NR”) were a subset of DEGs between all admixture groups (“All groups”). (C) Heatmap of −log10 enrichment p-values for the ten most enriched GO terms (rows) in top 5% varied genes and DEGs (columns). The row dendrogram was obtained by hierarchical clustering. (D) Overlap of co-expression gene modules between relict and non-relict accessions. P-values from Fisher’s exact test. (E) Shared and divergent functions between relict and non-relict modules. (F-H) Heatmaps of −log10 enrichment p-values for the ten most enriched GO terms in relict modules M4, M5 and non-relict modules M4, M5 (F), relict module M1 and non-relict modules M2, M3 (G), and relict module M2 and non-relict modules M1, M7 (H). Row dendrograms were generated as in (C). (I) Non-relict modules were enriched for binding sites from distinct TF families. See also Figure S4 and Table S4.
Figure 6. Relationship between eQTL, eQTLepi and TFBSs
(A) Distribution of distances from cis-eQTL and cis-eQTLepi to TSS (within 100kb), where epi is CG-, CH-, and C-DMB. (B, C) Overlap of CH-DMB (B) and C-DMB (C) with Col-0 cistrome and epicistrome. (D, E) Enrichment/depletion of TFBS at eQTL and eQTLCH-DMB identified three TF groups. (F) TF methylation sensitivities (x-axis) were correlated with enrichment of binding sites (y-axis) at eQTLCH-DMB (left) but not at eQTL (right). See also Figure S5 and Table S5.
Figure 7. Genome structure is linked to differential methylation and transcription
(A) Summary of genome maps created using images of nick-labeled ultra-long DNA molecules for nine Arabidopsis accessions, including the reference accession Col-0. Columns are (from left): Accession ID, country of origin, total alignment length of optical maps against TAIR10 in Mb and percentage, counts for combined insertions and deletions (indels) per Mb of TAIR10, insertions per Mb, deletions per Mb, genes and TEs within indels, insertions with hyper-, hypo- or mixed DMRs. (B) Boxplot for the length distribution of insertions (red) and deletions (blue) for all eight accessions in kb. (C) Graphical representation of optical contigs aligned to chromosome 5 (green boxed arrows). Black boxes show TAIR10 mis-assemblies. Arrows in magenta represent regions not present in TAIR10 (insertion), and blue represents regions absent in that accession (deletion). (D) Overview of Yeg-8 chromosome 4 optical contig alignments (blue) against TAIR10 (grey). Crossing green and red lines identify two inversions. Red and yellow lines depict insertions and deletions against TAIR10. The dashed line represents 1.2 Mb of rDNA/nucleolar organizer. Labels show size in Mb. (E) Alignments were used to call insertions (red) and deletions (blue) relative to the TAIR10 reference. A large portion of SVs is shared amongst accessions. (F) RRS1-RPS4 NLR locus on chromosome 5, comparing Erg2-6 and IP-Cum-1 to Col-0. TAIR10 annotations are shown on top as non-NLR genes (grey), NLR genes (black), TEs (orange) and F-box gene (green; see 7G). Both methylated cytosines (mC) and WGS read coverage (read) tracks are shown per accession. Grey bars show mapping-free regions that overlap with predicted SV loci (dashed lines), and size differences are indicated. (G) Transcript expression levels of three genes in accessions where the gene overlaps with deletion (Del), reference (Ref), and insertion (In) loci. Y-axis shows normalized RNA-seq read counts. See also Figure S6, Table S6 and S7.
