Stress-induced transcriptional readthrough into neighboring genes is linked to intron retention

Shani Hadar¹, Anatoly Meller¹, Naseeb Saida¹, Reut Shalgi¹

PMID: 36505935
PMCID: PMC9732411
DOI: 10.1016/j.isci.2022.105543

iScience. 2022 Nov 9;25(12):105543. doi: 10.1016/j.isci.2022.105543. eCollection 2022 Dec 22.

Affiliation

¹ Department of Biochemistry, Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel.

Abstract

Exposure to certain stresses leads to readthrough transcription. Using polyA-selected RNA-seq in mouse fibroblasts subjected to heat shock, oxidative, or osmotic stress, we found that readthrough transcription can proceed into proximal downstream genes, in a phenomenon previously termed "read-in." We found that read-in genes share distinctive genomic characteristics; they are GC-rich and extremely short, with genomic features conserved in human. Using ribosome profiling, we found that read-in genes show significantly reduced translation. Strikingly, read-in genes demonstrate marked intron retention, mostly in their first introns, which could not be explained solely by their short introns and GC-richness, features often associated with intron retention. Finally, we revealed H3K36me3 enrichment upstream of read-in genes. Moreover, demarcation of exon-intron junctions by H3K36me3 was absent in read-in first introns. Our data portray a relationship between read-in and intron retention, suggesting they may have co-evolved to facilitate reduced translation of read-in genes during stress.

Keywords: Biological sciences; Molecular Genetics; Molecular biology; Molecular interaction.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Stress leads to readthrough into downstream neighboring genes
(A–C) Read density plots (gray) for three examples of read-in genes, shown using IGV plots (Integrative Genomics Viewer 2.12.2⁵⁰), in control and osmotic stress (KCl, 2 h) for expression (RNA-seq) and translation (Ribo-seq). Data are strand specific; strand is indicated. On the bottom, gene annotation tracks (in blue) are shown with the gene names. Regions of interest are highlighted with colors: the DoG region is highlighted in green, the readthrough gene in purple, and the read-in gene in orange. RT-PCR primer locations used in (E) are indicated in red for Tnfrsf12a-Thoc6 (A).
(D) Read-in gene sets are common between stresses. UpSet plot, visualizing the sizes of set intersections, of the overlap of the four stresses with the largest number of identified read-in genes, shows significant overlaps between them, with 59 read-in genes shared by all four (p = 3.33e-268, exact test of multiset intersection²⁴). See Figure S2E for overlap with additional conditions.
(E) RT-PCR gels performed for the predicted osmotic stress readthrough-read-in chimeras Furin-Fes and Tnfrsf12a-Thoc6 and the non-chimeras Nectin2-Tomm40 and Tmem62-Ccndbp1. The four pairs were PCR amplified from cDNA of osmotic stress (2 h) or control cells, in replicates, using primers spanning the intergenic region. Primer locations for Tnfrsf12a-Thoc6 are indicated in red in panel (A) and for the other pairs in Figures S3A–S3C (see STAR Methods and Table S5). Positive bands are shown for both predicted readthrough-read-in chimeras in the stressed cells, whereas both negative non-chimeras did not show any amplification, and no bands were observed in the no-RT controls (RNA samples without reverse transcriptase), validating the lack of genomic DNA contamination. Expected amplicon sizes: 1864 and 1344 for Tnfrsf12a-Thoc6 and Furin-Fes, respectively, and 1598 and 986 for Nectin2-Tomm40 and Tmem62-Ccndbp1, respectively; ladder sizes are indicated on the left.

**Figure 2**
Read-in genes are significantly short, with fewer, shorter introns, and GC-rich
(A–C) Distribution of feature lengths (log2 kbp) of read-in (red), non-read-in (green), and all expressed genes (blue) presented as cumulative distribution function (CDF) plots. p values were calculated using Wilcoxon rank-sum test, between either read-in or non-read-in distributions versus all expressed genes; ∗∗∗p < 0.001, p indicated when significant (p < 0.05). Shown are the length of the entire gene (A): p (read-in versus expressed) = 3.23e-73, p (non-read-in versus expressed) = 2.33e-8, p (read-in versus non-read-in) = 1.47e-23; the number of introns per gene (B): p (read-in versus expressed) = 2.16e-9, p (non-read-in versus expressed) = 4.2e-1, p (read-in versus non-read-in) = 1.37e-6; and the intron length distributions (C): p (read-in versus expressed) = 1.87e-294, p (non-read-in versus expressed) = 3.37e-73, p (read-in versus non-read-in) = 1.32e-49.
(D) GC content was significantly higher in the 1 kb upstream regions of read-in genes; p values were calculated using Wilcoxon rank-sum test; ∗∗∗p < 0.001, p (read-in versus expressed) = 2.49e-10, p (non-read-in versus expressed) = 7e-1, p (read-in versus non-read-in) = 1.87e-5.
(E) PolyA signals were significantly depleted in read-in regions (1 kb upstream regions), compared to the corresponding regions upstream of all expressed genes. p (read-in versus expressed) = 1.05e-9, p (non-read-in versus expressed) = 4.51e-1, p (read-in versus non-read-in) = 8.77e-7. Additional variants of the polyA signal showed similar trends (Figure S5J). p values were calculated using Wilcoxon rank-sum test; ∗∗∗p < 0.001.
(F) The frequencies of the canonical polyA signal AAUAAA (left panel) versus non-canonical polyA signals (right panel), within the 3′ UTRs of genes in each group, showed higher frequencies of non-canonical polyA signals in 3′ UTRs of DoG-producing genes, as well as non-read-in and read-in genes (p < 0.05, chi-square test) compared with all expressed genes (see Table S2 for all p values).
(D–F) Data are presented as mean +/− SEM of regions of interest of all genes within respective groups.
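
The group comparisons in this caption rely on the Wilcoxon rank-sum test. As a rough illustration of what that test computes, the following minimal sketch uses the normal approximation and omits the tie correction to the variance. The function names and the toy gene lengths are hypothetical and are not taken from the paper; the authors' actual implementation is not described here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test (normal approximation, no tie correction).

    Returns (z, p). A negative z means values in x tend to be smaller.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                     # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2         # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail
    return z, p


# Hypothetical toy data: log2 gene lengths for two small groups.
toy_read_in = [1.0, 2.0, 3.0]
toy_expressed = [4.0, 5.0, 6.0]
z_toy, p_toy = rank_sum_test(toy_read_in, toy_expressed)
```

With real data, a library routine that handles ties and small samples would normally be preferable to this approximation.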

**Figure 3**
Read-in genes tend to be lowly translated
(A) Expression versus translation levels (log2 TPM values of RNA-seq on the x axis and Ribo-seq on the y axis) of genes during osmotic stress (KCl, 2 h). Read-in genes were stratified by their read-in estimation values (cyan, brown, and magenta) corresponding to 33% quantiles of read-in estimation values (see STAR Methods, Table S3). All expressed genes are shown in gray. Read-in genes with high read-in estimation values tend to have lower levels of translation given their expression levels. Similar trends were also found in other conditions (Figure S7).
(B) CDF plot of the translation levels (Ribo-seq TPM, in log2) of genes with different read-in estimation groups in osmotic stress (2 h) showed an inverse correlation between the degree of read-in and the level of translation. p values were calculated using Wilcoxon rank-sum test, ∗∗∗p < 0.001, ∗∗p < 0.01 (see Table S2 for exact p values). Similar trends were also found in other conditions (Figure S8).
(C) Random sampling analysis of translation efficiency (translation normalized to expression), where randomly sampled groups were matched for levels of expression with their corresponding read-in estimation group (high, medium, and low, color-coded, see STAR Methods) in osmotic stress (2 h). Dashed lines represent the mean translation efficiency value for each read-in group. The analysis demonstrated a significantly lower translation efficiency for high and medium read-in estimation genes (∗∗∗p < 0.001 for both, see Table S2 for exact p values) compared to the expected translation levels given their expression levels (solid colored distributions). Similar trends were also found in other conditions (Figure S9).

**Figure 4**
Read-in genes show marked intron retention
(A) Violin plots demonstrate significantly higher degrees of intron retention for all groups compared with all expressed genes, which increase with the extent of read-in estimation. Each gene is represented by the maximal value of intron retention among all its introns in osmotic stress (2 h). Boxes indicate median, 25th and 75th percentiles for each of the groups. Wilcoxon rank-sum test p value calculated for each group versus all expressed genes, ∗∗∗p < 0.001, ∗p < 0.05, see Table S2 for exact p values. Similar trends were also observed in other conditions (Figure S11).
(B) Scatterplot of translation levels (log2 Ribo-seq TPM, y axis) versus maximal intron retention (log2 IR value, x axis) for read-in genes in osmotic stress (2 h) exhibits a significant negative correlation (R = −0.407, p(R) = 2.4e-08). The color axis shows the read-in estimation values (in log2), further demonstrating that the more the intron is retained, the higher the read-in estimation tends to be (R = 0.475, p(R) = 3.55e-11). Similar trends were also observed in other conditions (Figure S14). The relationship between translation levels and intron retention is shown in Figure S13.

**Figure 5**
Read-in genes intron retention levels are much higher than expected given their genomic characteristics
(A) A thousand randomly sampled comparison groups, matched for GC content and intron length distributions as in osmotic stress (2 h) read-in genes, were generated from the set of all expressed genes (see STAR Methods). For each randomly sampled comparison group, the mean value of the maximum intron retention (log2 IR) per gene was calculated and plotted as a histogram (gray). Confidence intervals (+/−99.8%, corresponding to 3∗STD) are presented as dashed black lines. The mean value of the maximum intron retention (log2 IR) of read-in genes (red) is significantly higher than the distribution of the mean intron retention values, even when controlling for GC content and intron lengths (∗∗∗p < 0.001); however, those of other groups were either no different from the background (non-read-in genes in green and genes upstream of read-in genes in olive) or even lower than the read-in genes-matched controls (all DoG-producing genes in black). Similar trends were also observed in other conditions (Figure S16).
(B) CDF plots of 5′ (left) and 3′ (right) splice site strengths, calculated as MaxEnt scores (see STAR Methods), showed significantly weaker splice site strength distributions for first introns in read-in genes (red) compared with all expressed genes (blue), whereas non-read-in gene splice site strengths (green) were similar to those of all expressed genes. p values were calculated using Wilcoxon rank-sum test; ∗∗∗p < 0.001, ∗∗p < 0.01. For 5′ splice sites: p (read-in versus all expressed) = 4.78e-3, p (non-read-in versus all expressed) = 0.12; for 3′ splice sites: p (read-in versus all expressed) = 7.8e-7, p (non-read-in versus all expressed) = 0.77.
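
Panel (A) describes a matched random-sampling test. One way such a test could be sketched is shown below, assuming that genes are first assigned to discrete covariate bins (for example, GC content and intron length classes) and that control groups are drawn with replacement from the same bins. The binning scheme, function names, and toy values are all hypothetical; the paper's actual matching procedure is given in its STAR Methods and is not reproduced here.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def matched_null_means(target_bins, background, n_samples=1000, seed=0):
    """Build a null distribution of group means from bin-matched controls.

    target_bins: covariate bin label of each gene in the group of interest.
    background: (bin_label, value) pairs for all candidate control genes.
    Every bin in target_bins must contain at least one background gene.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for bin_label, value in background:
        by_bin[bin_label].append(value)
    needed = Counter(target_bins)  # how many genes to draw from each bin
    null_means = []
    for _ in range(n_samples):
        sample = []
        for bin_label, count in needed.items():
            # Assumption: sampling with replacement, so small bins suffice.
            sample.extend(rng.choices(by_bin[bin_label], k=count))
        null_means.append(mean(sample))
    return null_means


def empirical_p(observed, null_values):
    """One-sided empirical p value with the usual +1 pseudocount."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical toy data: (covariate bin, max intron retention) per gene.
toy_background = [("low", 0.1), ("low", 0.2), ("high", 0.3), ("high", 0.4)]
toy_target_bins = ["low", "high", "high"]
toy_null = matched_null_means(toy_target_bins, toy_background)
toy_p = empirical_p(0.9, toy_null)
```

Because the pseudocount keeps the p value above zero, a thousand samples can resolve p values only down to about 0.001, which is consistent with reporting p < 0.001 for such a test.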

**Figure 6**
H3K36me3 profiles show enrichment upstream of read-in genes and diminished demarcation of exon-intron junctions in read-in genes first introns
(A–D) Peak density profiles (mean and STE of H3K36me3 peaks across all genes within a group) of H3K36me3 histone modifications in G1E cell line (using ENCODE ChIP-seq data) are shown within (A) read-in regions, (B) first introns, (C) last introns, and (D) entire gene body, demonstrating significant enrichment of H3K36me3 in read-in regions of read-in genes compared with the respective regions of all other control groups (see STAR Methods). Each region was normalized to the same length and plotted in the center, with the addition of flanking regions of 1 kb on each side. See Figure S20 for additional cell lines showing the same trends. Bottom panels show FDR-corrected Wilcoxon rank-sum p values for the differences along the positions between read-in genes profile and each of the other groups.
(E) H3K36me3 peak density profiles of first, last, and all introns of all expressed genes, demonstrating a basin-like shape, with peaks around 5′ and 3′ exon-intron junctions which sharply decrease toward the intron body.

**Figure 7**
Conservation of read-in genes features in human
(A) Ten thousand randomly sampled comparison groups, matched for intergenic distances to the distribution of readthrough-read-in genes intergenic distances, were selected, and the median of the differences between mouse and human intergenic distances was calculated to generate the background distribution (gray). +/−95% confidence intervals (CIs) are indicated by dashed lines. The median intergenic distance difference between mouse and human is significantly more conserved, i.e., closer to zero, for read-in genes (red line, p = 0.0045), whereas that of non-read-in genes (green line) is no different from the background.
(B) As in Figure 2C, intron lengths of the human orthologs of the set of read-in genes defined in mouse are significantly shorter than both all expressed genes orthologs (p = 2.52e-40) and non-read-in genes orthologs (p = 3.21e-9). ∗∗∗p < 0.001, Wilcoxon rank-sum test.
(C) Conservation analysis of intron lengths of read-in genes showed that read-in genes intron lengths (mean intron length per gene, red line) were significantly more conserved (the difference between mouse and human is closer to zero) compared with the background distribution (p < 0.0001, using random sampling test; gray shows the background distribution as in A, +/−95% CIs are indicated by dashed lines), whereas that of non-read-in genes (green line) is no different from the background.
