Nat Commun. 2023 Jun 5;14(1):3248.
doi: 10.1038/s41467-023-38954-z.

Long-read direct RNA sequencing reveals epigenetic regulation of chimeric gene-transposon transcripts in Arabidopsis thaliana


Jérémy Berthelier et al. Nat Commun. 2023.

Abstract

Transposable elements (TEs) accumulate in both intergenic and intragenic regions in plant genomes. Intragenic TEs often act as regulatory elements of associated genes and are also co-transcribed with genes, generating chimeric TE-gene transcripts. Despite the potential impact on mRNA regulation and gene function, the prevalence and transcriptional regulation of TE-gene transcripts are poorly understood. Using long-read direct RNA sequencing and a dedicated bioinformatics pipeline, ParasiTE, we investigated the transcription and RNA processing of TE-gene transcripts in Arabidopsis thaliana. We identified a global production of TE-gene transcripts in thousands of A. thaliana gene loci, with TE sequences often being associated with alternative transcription start sites or transcription termination sites. The epigenetic state of intragenic TEs affects RNAPII elongation and usage of alternative poly(A) signals within TE sequences, regulating alternative TE-gene isoform production. Co-transcription and inclusion of TE-derived sequences into gene transcripts impact regulation of RNA stability and environmental responses of some loci. Our study provides insights into TE-gene interactions that contribute to mRNA regulation, transcriptome diversity, and environmental responses in plants.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Detection of TE-gene transcripts (TE-G transcripts) and alternative TE-gene isoforms (ATE-G isoforms) in DRS-AtRTD3 of the Arabidopsis transcriptome.
a Representative TE-G transcripts associated with the RPP4-ATCOPIA4 locus detected in DRS-AtRTD3. b The number of TE-G transcripts and ATE-G isoforms detected by ParasiTE. c RNA processing events associated with ATE-G isoforms (TE-AS and TE-ATP). TE-AFE and TE-ALE are included in TE-ATSS and TE-ATTS, respectively. AS; Alternative Splicing, ATP; Alternative Transcript Production, IR; Intron Retention, ES; Exon Skipping, A5SS; Alternative 5′ Splicing Sites, A3SS; Alternative 3′ Splicing Sites, ATSS; Alternative Transcription Start Sites, ATTS; Alternative Transcription Termination Sites, AFE; Alternative First Exon, ALE; Alternative Last Exon. d The number of genes (left; 39,998 gene models based on DRS-AtRTD3 annotation; 27,628 genes were associated with Arabidopsis Genome Initiative codes) and TEs (right; n = 18,881 TE annotations) associated with TE-G transcripts and ATE-G isoforms identified in DRS-AtRTD3. e Enriched Gene Ontology terms of genes associated with TE-G transcripts (top) and ATE-G isoforms in DRS-AtRTD3 (bottom). f The number of TEs associated with ATE-G isoform events. Some TEs were included in several ATE-G isoform categories. g Representative loci associated with TE-AS and TE-ATP events detected in DRS-AtRTD3. Red, TE annotation; Green, AtRTD3 annotation (collapsed); Blue, DRS-AtRTD3 annotation (extended). Source data are provided as a Source Data file.
Fig. 2. TE superfamilies associated with ATE-G isoforms.
a Proportion of TE superfamilies in the Arabidopsis thaliana genome, TE-G transcripts, and ATE-G isoforms. b Comparison of the length of all TEs, intergenic TEs, intronic TEs, and TEs associated with TE-G transcripts, and with ATE-G isoforms. The centerline represents the median. The borders of the boxplots are the first and third quartiles (Q1 and Q3). Whiskers represent data range, bounded to 1.5 * (Q3-Q1). c DNA methylation (CG, CHG, and CHH contexts) of all TEs, intergenic TEs, intronic TEs, and TEs associated with TE-G transcripts, and with ATE-G isoforms. P-values were obtained by the Mann-Whitney U test; *, p < 0.05; **, p < 0.01; ***, p < 0.001. The centerline represents the median. The borders of the boxplots are the first and third quartiles (Q1 and Q3). Whiskers represent data range, bounded to 1.5 * (Q3-Q1). d Top: the number of TEs associated with ATE-G isoforms (left, TE-AS; right, TE-ATP). The proportion of the TE superfamilies in the genome is also displayed as a reference. Bottom: Fold-enrichment of TE superfamilies significantly enriched in TE-AS or TE-ATP. P-values were obtained by the hypergeometric test. Only TE superfamilies with fold-enrichment of p < 0.05 are shown. e Enrichment of nucleotides at splicing donor and splicing acceptor sites associated with TE-IR for all TEs, and TE superfamilies TIR/Mutator and TIR/CACTA with fold-enrichment of p < 0.05. Fold-enrichment and p values (hypergeometric test) are indicated. Source data are provided as a Source Data file.
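The hypergeometric enrichment test named in this caption can be illustrated with a minimal sketch. It assumes the usual over-representation setup: a genome of `population` TEs, of which `successes` belong to one superfamily, and a subset of `draws` TEs (for example, those in TE-AS events) containing `observed` members of that superfamily. The function name, parameter names, and the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment(population, successes, draws, observed):
    """Return (fold_enrichment, p_value) for over-representation.

    population: total number of TEs considered (hypothetical input)
    successes:  TEs of the superfamily of interest in the population
    draws:      TEs in the tested subset
    observed:   TEs of the superfamily found in the subset
    """
    # Fold-enrichment: observed count relative to the count expected by chance.
    expected = draws * successes / population
    fold = observed / expected

    # One-sided p-value: P(X >= observed) under the hypergeometric distribution.
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    p_value = tail / comb(population, draws)
    return fold, p_value


# Toy counts for illustration only (not data from the paper).
fold, p = hypergeom_enrichment(population=20, successes=5, draws=4, observed=3)
print(f"fold-enrichment = {fold:.2f}, p = {p:.4f}")
```

Exact integer binomial coefficients keep the tail sum free of rounding error until the final division, which is convenient for small examples; a statistics library would be the more practical choice for genome-scale counts.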
Fig. 3. Epigenetic regulation of ATE-G isoforms.
a Top: Number of genes in each mutant showing isoform switching of ATE-G isoforms detected in DRS-AtRTD3. Bottom: Number of genes in each mutant showing a change in isoform usage of ATE-G isoforms detected in the mutant-DRS dataset. b Representative genome loci showing epigenetically regulated ATE-G isoform (Epi-ATE-G isoform) production with TE-ATSS events. Tracks (from top to bottom): CAGE-seq (reads per million; only forward or reverse strands are shown); Col-0 ChIP-seq of H3K9me2 (bin per million); methylation level of each mutant in CG, CHG, and CHH contexts (0–100%); DRS read alignments of Col-0 and indicated mutants; TE and AtRTD3 transcript annotations, de novo assembly of transcripts in mutants, and the orientation of genes and TEs. Red arrows on the top indicate cryptic TSSs detected in epigenetic mutants. c Epi-ATE-G isoform production and isoform switching detected at the AT2G40960 locus in ddm1. Top: ddm1-DRS transcripts aligned to the AT2G40960 locus. Middle: Isoform usage (IF) of ATE-G isoforms. Benjamini–Hochberg false discovery rate (FDR)-corrected p-values (q-values; *, q < 0.05) for isoform switching. Bottom: Expression levels of isoforms. Error bars indicate 95% confidence intervals. Adjusted p-value from DESeq2 (*, padj < 0.05).
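Two quantities in this caption lend themselves to a short illustration: isoform usage (IF) and Benjamini–Hochberg q-values. The sketch below assumes the common definition of IF as an isoform's expression divided by the summed expression of all isoforms of the same gene; the caption does not spell out the definition, so treat that as an assumption. The isoform labels and values are hypothetical.

```python
def isoform_fractions(expression):
    """Map each isoform to its share of the gene's total expression.

    Assumes IF = isoform expression / sum over all isoforms of the gene.
    """
    total = sum(expression.values())
    return {name: value / total for name, value in expression.items()}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical isoform expression values and p-values, for illustration only.
print(isoform_fractions({"isoform_A": 30.0, "isoform_B": 10.0}))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With these toy inputs, a test would be called significant at the caption's threshold when its q-value falls below 0.05.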
Fig. 4. Epigenetic regulation of TE-ATTS.
a Representative genome loci showing Epi-ATE-G isoform production with TE-ATTS events. Tracks (from top to bottom): ChIP-seq data for RNA Pol II phosphorylated at Ser5/Ser2 in CTD repeats (bins per million); ChIP-seq data for IBM2 and EDM2 localization (bins per million); Col-0 ChIP-seq of H3K9me2 (reads per million); methylation levels of Col-0 in CG, CHG, and CHH contexts (0–100%); poly(A) sites obtained from the PlantAPA database; DRS read alignments of Col-0 and indicated mutants; TE and transcript annotations of AtRTD3 and DRS-AtRTD3 in this study and the orientation of genes and TEs. b Metaplots for ChIP-seq signals of Pol II (Ser2P), IBM2, EDM2, and H3K9me2 over TEs with ATTS (isoform switching with q < 0.05 detected at least once among mutants; Supplementary Data 5; n = 223) or randomly selected TEs (n = 223). c Changes in Pol II (Ser2P) ChIP-seq signals in TEs with ATTS (n = 223) between the wild-type and mutants. The centerline represents the median. The borders of the boxplots are the first and third quartiles (Q1 and Q3). Whiskers represent data range, bounded to 1.5 * (Q3-Q1). Source data are provided as a Source Data file.
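The boxplot convention described in this caption (median centerline, Q1 and Q3 box borders, whiskers bounded to 1.5 * (Q3-Q1)) can be sketched as follows. The quartile interpolation method is an assumption, since plotting tools differ on it; here the standard library's `inclusive` method is used. The function name and the sample values are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return the median, quartiles, whisker ends, and outliers of a sample."""
    data = sorted(values)
    # Quartiles by linear interpolation over the sample (an assumed convention).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up values for illustration only (not ChIP-seq signals from the paper).
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In this toy sample the value 100 lies beyond Q3 + 1.5 * (Q3-Q1), so the upper whisker stops at 9 and 100 would be drawn as an outlier.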
Fig. 5. Epigenetic regulation of ATE-G isoforms in the GER5 locus and the impact on environmental responses and RNA stability.
a GER5-ATCOPIA78/ONSEN locus. TEs, Araport11 gene annotation, and DRS-AtRTD3 transcript isoforms are shown. * indicates isoforms examined by RT-qPCR in c. b, c Relative expression of transcripts corresponding to GER5 protein coding sequence (CDS; AT5G13200.1 and AT5G13200.2) and ATE-G isoform (MSTRG.3521.2, 4, 6) under mock and ABA stress conditions in indicated genotypes. Bars represent the means of four biological replicates ± standard error of the mean (SEM). *, p < 0.05 by t-test for comparison between Col-0 and mutants under mock conditions, and +, p < 0.05 by t-test under ABA stress conditions. d Relative expression of transcripts corresponding to GER5 CDS (AT5G13200.1) in the A. thaliana ecotypes with or without ATCOPIA78/ONSEN insertion in the 3′-UTR. Bars represent the means of four biological replicates ± SEM. *, p < 0.05 by t-test for comparison between Col-0 and mutants. e Relative transcript levels of GER5 (AT5G13200.2) at 0, 30, 60, 90, and 120 min after cordycepin treatment in Col-0, ibm2, edm2, suvh456, and ecotypes without ATCOPIA78/ONSEN insertion (Ler-0 and Sha). Expression levels at 0 min are set as 1. Bars represent the means of four biological replicates ± SEM. *, p < 0.05 by t-test. Source data are provided as a Source Data file.
Fig. 6. Epigenetic regulation of ATE-G isoforms in the RPP4 locus and the impact on environmental responses and RNA stability.
a RPP4-ATCOPIA4 locus. TEs, Araport11 gene annotation, DRS-AtRTD3 transcript isoforms, and primers for RT-qPCR are shown. b Relative expression of RPP4 transcripts detected by RT-qPCR with primers indicated in a. Bars represent the means of four biological replicates ± SEM. *, p < 0.05 by t-test. c Relative transcript levels of RPP4 at 0, 30, 60, 90, and 120 min after cordycepin treatment in Col-0, ibm2, edm2, and suvh456. Expression levels at 0 min are set as 1. Bars represent the means of four biological replicates ± SEM. *, p < 0.05 by t-test. d Incompatibility of A. thaliana ecotypes and mutants against Hyaloperonospora arabidopsidis infection. NFA-10 and Kas-2 are ecotypes without the RPP4 locus and were used as controls. Class I (white), hypersensitive response surrounding conidia penetration sites; class II (light green), presence of trailing necrosis in ≤50% leaf area; class III (dark green), presence of trailing necrosis in ≤75% leaf area; class IV (black), compromised ETI immunity, presence of pathogen hyphae not targeted by HR and conidiophores. Statistically significant differences in the frequency distribution of the classes between lines and Col-0 were determined by Pearson’s chi-squared test; *, p < 0.05; **, p < 0.01; ***, p < 0.001. 70–130 leaves were analyzed per line across three separate experimental replicates. Source data are provided as a Source Data file.
Fig. 7. Regulation of Epi-ATE-G isoform candidates under various stress conditions.
Heatmaps showing statistically significant differential isoform usage (dIF; left) and differential expression fold-change (right) with TE-ATTS under stress conditions or in epigenetic mutants. Transcriptome data for the stress treatments are from public RNA-seq datasets. The Epi-ATE-G isoform candidate AT1G58848, reported by others, was also included. Red dotted lines highlight predicted isoform switching at GER5 (AT5G13200) and F-box gene (AT1G11270) loci in Col-0 under heat stress conditions. The names of the transcripts from the consensus DRS transcriptome as well as one corresponding to the reference transcript are indicated (more than one reference transcript from DRS-AtRTD3 or mutant-DRS applies to some transcripts). Source data are provided as a Source Data file.
