Nat Commun. 2021 Apr 16;12(1):2300.
doi: 10.1038/s41467-021-22549-7.

Conserved long-range base pairings are associated with pre-mRNA processing of human genes

Svetlana Kalmykova et al. Nat Commun.

Abstract

The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge of functional RNA structures is focused on locally occurring base pairs. However, crosslinking and proximity ligation experiments have demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete catalog to date of pairs of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. The double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown, suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3'-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Pairs of conserved complementary regions (PCCRs).
A PCCRs are identified in conserved intronic regions (CIRs) that are <10,000 nts apart from each other. B PrePH computes the dynamic programming matrix based on the precomputed helix energies for all k-mers (inset) and energies of short internal loops and bulges (see Supplementary Methods for details). C The distribution of PCCR energies consists of four energy groups: group I (−20 < ΔG ≤ −15 kcal/mol), group II (−25 < ΔG ≤ −20 kcal/mol), group III (−30 < ΔG ≤ −25 kcal/mol), and group IV (ΔG ≤ −30 kcal/mol). D The distribution of p, the relative position of a PCCR in the gene. E Multiple independent compensatory substitutions support long-range RNA structure in the phosphatidylinositol glycan anchor biosynthesis class L (PIGL) gene. F PCCRs with significant nucleotide covariations (E value < 0.05, n = 3204) are on average less spread and more stable than PCCRs with insignificant nucleotide covariations (E value ≥ 0.05, n = 905,942); two-sided Mann–Whitney test; *** denotes a statistically discernible difference at the 0.1% significance level.
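
The energy groups in panel C can be read as a simple binning rule on ΔG. The sketch below is only an illustration of those thresholds; the function name `energy_group` and the choice to return `None` for ΔG above −15 kcal/mol are assumptions, not part of the paper or of PrePH.

```python
from typing import Optional


def energy_group(delta_g: float) -> Optional[str]:
    """Map a free energy in kcal/mol to the energy group from Fig. 1C.

    Hypothetical helper: returns None when delta_g is above -15 kcal/mol,
    because the caption defines no group for that range.
    """
    if delta_g <= -30:
        return "IV"   # dG <= -30
    if delta_g <= -25:
        return "III"  # -30 < dG <= -25
    if delta_g <= -20:
        return "II"   # -25 < dG <= -20
    if delta_g <= -15:
        return "I"    # -20 < dG <= -15
    return None


if __name__ == "__main__":
    for dg in (-15.0, -20.0, -27.5, -42.0, -10.0):
        print(dg, energy_group(dg))
```

Each upper bound is inclusive, so a value that sits exactly on a boundary (for example −20 kcal/mol) falls into the more stable of the two neighbouring groups, as the inequalities in the caption require.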
Fig. 2. Validation and false discovery rate (FDR).
A The difference between icSHAPE reactivity of nucleotides within CCRs and the average reactivity of nearby nucleotides in energy groups I–IV (color code as in Fig. 1C). The linear model Δreactivity = β0 + β1 × (ΔG group) is represented by the slanted line; the estimated slope is β1 = 0.03 ± 0.01. B Venn diagram for the number of common nucleotides (left), number of common base pairs (middle), and the number of common base pairs among common nucleotides (right) for the predictions of PrePH and IRBIS. C Estimation of the false discovery rate (FDR) by re-wiring, that is, creating a control set that consists of chimeric non-cognate sequences sampled from different genes. D FDR as a function of energy cutoff ΔG (top left), maximum distance between CIRs (top right), E value (bottom left), and GC content (bottom right). Solid lines represent the fitted average over n = 16 randomizations; shaded areas represent 95% confidence intervals obtained by the locally estimated scatterplot smoothing (LOESS) regression.
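
The re-wiring control in panel C can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tuple layout `(gene, left region, right region)`, the helper names `rewire` and `estimate_fdr`, and the estimator itself (hits in the re-wired control divided by hits in the real set) are all hypothetical, since the caption does not give the exact formula.

```python
import random
from typing import List, Tuple

# Hypothetical record: (gene name, left region sequence, right region sequence)
Pair = Tuple[str, str, str]


def rewire(pairs: List[Pair], rng: random.Random) -> List[Tuple[str, str]]:
    """Pair each left region with a right region drawn from a different gene."""
    rewired = []
    for gene, left, _ in pairs:
        # Candidate partners are right regions that belong to other genes.
        candidates = [right for other, _, right in pairs if other != gene]
        if candidates:
            rewired.append((left, rng.choice(candidates)))
    return rewired


def estimate_fdr(n_control_hits: int, n_real_hits: int) -> float:
    """Assumed estimator: control hits / real hits, capped at 1."""
    if n_real_hits <= 0:
        raise ValueError("n_real_hits must be positive")
    return min(1.0, n_control_hits / n_real_hits)


if __name__ == "__main__":
    pairs = [("G1", "AAA", "UUU"), ("G2", "CCC", "GGG"), ("G3", "ACG", "CGU")]
    print(rewire(pairs, random.Random(0)))
    print(estimate_fdr(5, 50))
```

In practice the same structure predictor would be run on both the real and the re-wired pairs, and the two hit counts passed to `estimate_fdr`; repeating the re-wiring with different seeds gives a spread of estimates, comparable in spirit to the n = 16 randomizations in panel D.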
Fig. 3. Splicing.
A Control procedures. In the random shift control, a PCCR is shifted within the gene. In the random gene control, a pseudo-PCCR is created in the same relative position of a different gene chosen at random. The numbers of PCCRs inside, outside, and crossing the reference set of intervals (e.g., introns) are counted. B PCCRs are enriched inside introns and depleted in outside and crossing configurations. C PCCRs looping out exons are depleted. D The cumulative distribution of the average exon inclusion rate (Ψ) in HepG2 cell line for exons looped out by PCCRs of the four energy groups vs. exons not looped out by PCCRs (Ctrl). KS denotes the two-sample Kolmogorov–Smirnov test. Sample sizes for energy groups I–IV and control are n = 73,366, 13,974, 1877, 374, and 161,947, respectively. E The distribution of distances from intronic PCCRs to intron ends (bin size 75 nts). Group I PCCRs are enriched, while group IV PCCRs are depleted in 75-nt windows immediately adjacent to splice sites. In all panels, boxplots (represented by the median, upper and lower quartiles, upper and lower fences; outliers are not shown) correspond to n = 40 randomizations; * and *** denote a statistically discernible difference at the 5% and 0.1% significance level, respectively (in panels B and C for a two-tailed Wilcoxon’s test with H0: enrichment = 1).
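
The inside, outside, and crossing counts in panel A amount to an interval comparison. The sketch below shows one plausible reading and should be treated as an assumption: the caption does not define the three configurations, so here "inside" means the PCCR span lies within the reference interval, "outside" means the PCCR span encloses the interval (the loop-out case), and "crossing" means a partial overlap. The function name `configuration` and the half-open coordinate convention are also hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open coordinates [start, end)


def configuration(pccr: Interval, ref: Interval) -> str:
    """Classify a PCCR span against one reference interval (e.g. an intron).

    Assumed definitions (not taken from the paper):
      inside   - PCCR span is contained in the reference interval
      outside  - PCCR span contains the reference interval
      crossing - the two overlap only partially
      disjoint - no overlap at all
    """
    p_start, p_end = pccr
    r_start, r_end = ref
    if p_end <= r_start or r_end <= p_start:
        return "disjoint"
    if r_start <= p_start and p_end <= r_end:
        return "inside"
    if p_start <= r_start and r_end <= p_end:
        return "outside"
    return "crossing"


if __name__ == "__main__":
    intron = (1000, 5000)
    for span in [(1200, 4800), (500, 6000), (4000, 7000), (6000, 7000)]:
        print(span, configuration(span, intron))
```

Counting these labels for the real PCCRs and again for each shifted or re-assigned control set gives the quantities that an enrichment ratio would compare.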
Fig. 4. Splicing, RNA editing, and end processing.
A CCRs are depleted around actively expressed splice sites and enriched around inactive and cryptic splice sites. B PCCRs are enriched outside back-spliced introns (circular RNAs from TCSD) and depleted in inside and crossing configurations. C CCRs are enriched with A-to-I RNA-editing sites (RADAR, REDIportal); OR denotes the odds ratio (see “Methods”); error bars represent the 95% confidence intervals. D CCRs are enriched with 5′ and 3′ ends of transcripts annotated in GENCODE database (including all aberrant and incomplete transcripts). That is, transcript ends frequently occur in double-stranded parts of PCCRs. E PCCRs are also strongly enriched with 5′ and 3′ ends of transcripts, that is, the annotated transcript ends frequently occur in the loop between double-stranded parts of PCCRs. In all panels, boxplots (represented by the median, upper and lower quartiles, upper and lower fences; outliers are not shown) correspond to n = 40 randomizations; *, **, and *** denote a statistically discernible difference at the 5%, 1%, and 0.1% significance level, respectively, for a two-tailed Wilcoxon’s test with H0: enrichment = 1.
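
For readers unfamiliar with the odds ratio reported in panel C, the sketch below computes an OR and a 95% confidence interval from a 2×2 table. It uses the standard Wald interval on the log odds ratio, which is an assumption here: the caption defers to the Methods and does not say which interval the authors used. The function name `odds_ratio_ci` and the example counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Illustrative layout (an assumption, not the paper's definition):
      a = positions in CCRs with an editing site
      b = positions in CCRs without an editing site
      c = positions outside CCRs with an editing site
      d = positions outside CCRs without an editing site
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that lies entirely above 1 indicates enrichment, and one entirely below 1 indicates depletion, which is how error bars of this kind are usually read.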
Fig. 5. RNA-binding proteins (RBP).
A According to eCLIP profiles, CCRs are enriched within binding sites of some RBPs (top 20 RBPs are shown). The RBPs that show depletion of CCRs are listed in Fig. S12. Boxplots represent n = 40 random shifts of CCRs within CIRs. B The odds ratios (OR) of an RBP binding near both CCRs of a PCCR, given that the RBP binds near at least one of them, indicate that PCCRs are enriched with forked eCLIP peaks. Error bars represent the 95% confidence intervals. C The change of inclusion rate (ΔΨ) of exons following short introns (n = 2844, 2650, and 4032 for 1 μg/mL, 2 μg/mL, and R749H mutant, respectively) vs. exons following long introns (n = 2931, 2762, and 3807 for 1 μg/mL, 2 μg/mL, and R749H mutant, respectively) in response to RNA Pol II slowdown with α-amanitin and in the slow RNA Pol II mutant R749H. D The difference between the inclusion rate change of exons following introns with a PCCR (ΔΨPCCR) and the inclusion rate change of exons following introns of the same length, but without PCCRs (ΔΨnoPCCR) in response to RNA Pol II slowdown (n = 191, 184, and 156 for 1 μg/mL, 2 μg/mL, and R749H mutant, respectively). In all panels, boxplots are represented by the median, upper and lower quartiles, upper and lower fences without outliers; *, **, and *** denote a statistically discernible difference at the 5%, 1%, and 0.1% significance level, respectively (two-tailed Mann–Whitney and Wilcoxon’s tests, in panel D with respect to H0: ΔΨPCCR − ΔΨnoPCCR = 0).
Fig. 6. Case studies.
A An RNA bridge in ENAH gene brings a distant RBFOX2 binding site into proximity of the regulated cassette exon. The exon inclusion rate substantially decreases under RBFOX2 depletion (ΔΨ = −0.43). B The predicted RNA bridge in RALGAPA1 brings distant binding sites of RBFOX2 and QKI to the regulated exon. The exon significantly responds to the depletion of these two factors (ΔΨ = −0.28 and ΔΨ = −0.75, respectively). C A cassette exon in GPR126 is looped out by a PCCR overlapping an eCLIP peak of RBFOX2 and significantly responds to RBFOX2 depletion (ΔΨ = −0.56). D An alternative terminal exon in FGFR1OP2 is looped out by a PCCR overlapping an eCLIP peak of QKI and significantly responds to QKI depletion (ΔΨ = −0.48). In all panels, exon inclusion rate changes are statistically significant (q value < 0.01).
Fig. 7. RNA folding and splicing could mediate co-transcriptional suppression of premature cleavage and polyadenylation (a hypothesis).
A The cleavage and polyadenylation of a structured pre-mRNA is rescued by the co-transcriptional splicing of the intron, while RNA structure stabilizes the molecule through intramolecular base pairings. B In the absence of RNA structure, such a rescue would not happen when splicing has a delay relative to cleavage and polyadenylation. Switching between (A) and (B) depends on the rates of splicing, folding, and RNA Pol II elongation.
