eLife. 2021 Jan 21;10:e63088.
doi: 10.7554/eLife.63088.

The RNA-binding protein SFPQ preserves long-intron splicing and regulates circRNA biogenesis in mammals

Lotte Victoria Winther Stagsted et al. eLife.

Abstract

Circular RNAs (circRNAs) represent an abundant and conserved entity of non-coding RNAs; however, the principles of biogenesis are currently not fully understood. Here, we identify two factors, splicing factor proline/glutamine rich (SFPQ) and non-POU domain-containing octamer-binding protein (NONO), to be enriched around circRNA loci. We observe a subclass of circRNAs, coined DALI circRNAs, with distal inverted Alu elements and long flanking introns to be highly deregulated upon SFPQ knockdown. Moreover, SFPQ depletion leads to increased intron retention with concomitant induction of cryptic splicing, premature transcription termination, and polyadenylation, particularly prevalent for long introns. Aberrant splicing in the upstream and downstream regions of circRNA-producing exons is critical for shaping the circRNAome, and specifically, we identify missplicing in the immediate upstream region to be a conserved driver of circRNA biogenesis. Collectively, our data show that SFPQ plays an important role in maintaining intron integrity by ensuring accurate splicing of long introns, and disclose novel features governing Alu-independent circRNA production.

Keywords: SFPQ; alternative splicing; chromosomes; circular RNA; gene expression; human; mouse; premature termination.

Conflict of interest statement

LS, EO, KE, TH: No competing interests declared.

Figures

Figure 1. Characteristics of DALI-circRNA.
(A) Schematics showing the flanking intron length (red) defined by the sum of annotated flanking introns and inverted Alu element (IAE) distance (blue) defined by the sum of distances to the most proximal IAE. (B–C) Density plot for the distribution of flanking intron lengths (B) and IAE distance (C) for the top 1000 expressed circRNAs in HepG2 (upper facet) and K562 (lower facet). The vertical line represents the median. (D) Contingency table showing the 4-way distribution of circRNAs with long and short flanking introns (with respect to the median) and proximal and distal IAEs (also with respect to the median, see B and C) for HepG2 (left facet) and K562 (right facet). The contingency table is color-coded by circRNA subgroup: DALI (distal Alu, long flanking introns, in red), PASI (proximal Alu, short flanking introns, in blue) and ‘Other’ (unclassified, in gray) circRNAs. The p-values are from Fisher's exact test of independence. (E) As in D, but for the subset of circRNAs with conserved expression in mouse.
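The median-split grouping described in panel D can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `classify_circrnas`, the tuple layout, the toy values, and the handling of ties (values equal to the median count as short or proximal) are all assumptions made here.

```python
from collections import Counter
from statistics import median


def classify_circrnas(records):
    """Assign each circRNA to DALI, PASI, or Other by median split.

    records: list of (name, flanking_intron_length, iae_distance) tuples.
    Hypothetical helper; tie handling is an assumption of this sketch.
    """
    intron_median = median(r[1] for r in records)
    iae_median = median(r[2] for r in records)
    labels = {}
    for name, intron_len, iae_dist in records:
        long_introns = intron_len > intron_median  # ties count as short
        distal_alu = iae_dist > iae_median         # ties count as proximal
        if long_introns and distal_alu:
            labels[name] = "DALI"   # distal Alu, long flanking introns
        elif not long_introns and not distal_alu:
            labels[name] = "PASI"   # proximal Alu, short flanking introns
        else:
            labels[name] = "Other"  # mixed combinations stay unclassified
    return labels


# Invented toy values, for illustration only.
toy = [
    ("circA", 90000, 12000),
    ("circB", 1500, 300),
    ("circC", 80000, 200),
    ("circD", 2000, 15000),
]
labels = classify_circrnas(toy)
counts = Counter(labels.values())
print(labels)
print(counts)
```

The counts per group correspond to the cells of the contingency table; a test of independence such as Fisher's exact test would then be run on that table with a statistics library.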
Figure 1—figure supplement 1. circRNAome in HepG2 and K562 from ENCODE.
(A) Boxplot showing expression distribution of the top 1000 expressed circRNAs as measured by back-splice junction (BSJ) spanning reads for DALI, PASI and other circRNAs in HepG2 and K562 cells. (B) The fraction of DALI, PASI, and other circRNAs comprising the previously characterized subset of conserved circRNAs, the AUG circRNAs (Stagsted et al., 2019). (C–D) The distribution of genomic lengths, that is, the genomic distance between the SD and SA involved in backsplicing (C), and the mature length, that is, the predicted length of the fully spliced circRNAs (D), stratified by subgroup as denoted.
Figure 2. SFPQ and NONO show enriched binding in the flanking regions of DALI circRNAs.
(A–B) Barplot showing enrichment/depletion of eCLIP signal (see Supplementary file 2) in the vicinity of circRNAs (+/- 2000 nt) compared to host exons (+/- 2000 nt) as determined by Wilcoxon rank-sum tests for HepG2 (A) and K562 (B) eCLIP samples. (C–D) Cumulative plots of SFPQ (C) and NONO (D) eCLIP read distribution upstream and downstream of circRNA subgroups and host exons as denoted. (E) Schematic showing localization of primers (+/- 2000 nt) for targeting either upstream (up) or downstream (down) intronic regions of splice sites with respect to circRNA exons or host exon. (F) Western blotting of immunoprecipitated (IP), endogenous SFPQ or NONO from nuclear fractions of HepG2 cells with Histone H3 as a loading control. (G–H) RT-qPCR of intronic regions flanking a downstream host gene exon (left facet) or flanking the circRNA-producing exon(s) (right facet) of CDYL (G) and ZKSCAN1 (H) upon RNA IP of endogenous SFPQ or NONO from nuclear fractions of HepG2 cells. The relative expression of immunoprecipitate (IP)/input is plotted. Data for three biological replicates are shown.
Figure 2—figure supplement 1. SFPQ and NONO enriched on circRNA flanking introns.
(A–D) For HepG2 (A and B) and K562 (C and D), boxplots showing the distribution of flanking intron length (A and C) or linear spliced reads (B and D) for DALI circRNAs (red), PASI circRNAs (blue), other circRNAs (gray), host exons, that is all other annotated exons from the circRNA-producing loci (orange), and DALI-like circRNAs, that is exon-pairs from annotated genes sampled to resemble DALI circRNAs based on flanking intron lengths and linear spliced reads (purple). (E–H) Boxplots of reads from SFPQ eCLIP rep1 (F), SFPQ eCLIP rep2 (G), NONO eCLIP rep1 (H), and NONO eCLIP rep2 associated with each subgroup in HepG2 cells (F–G) and K562 cells (H–I) stratified by upstream (upper facets) and downstream (lower facets) aligned reads. p-Values are calculated using Wilcoxon rank-sum tests.
Figure 2—figure supplement 2. RNA immunoprecipitation of SFPQ and NONO confirms enrichment.
(A–B) As in Figure 2G–H, RT-qPCR on denoted intronic regions in ARHGAP5 (A) and NEIL3 (B) transcripts upon RNA IP of endogenous SFPQ or NONO from nuclear fractions of HepG2 cells. (C–D) Western blotting of endogenous immunoprecipitated (IP) SFPQ (C) or NONO (D) from nuclear fractions of HEK293T cells with Histone H3 as a loading control. Asterisks denote bands derived from the IP antibody. (E–H) As in A–B but using HEK293T cells and with RT-qPCR on CDYL (E), ZKSCAN1 (F), EYA (G), and NEIL3 (H). Data for three biological replicates are shown.
Figure 3. Knockdown of SFPQ affects DALI circRNAs.
(A) Western blotting of proteins from HEK293T (upper panel) and HepG2 (lower panel) cells transfected with either CTRL siRNAs, siRNAs targeting NONO mRNA, or siRNAs targeting SFPQ mRNA using antibodies against SFPQ, NONO, and β-tubulin (loading control) as denoted. (B–C) Volcano plot showing deregulated circRNAs upon NONO (left facet) and SFPQ (right facet) depletion in HEK293T cells (B) or HepG2 cells (C) color-coded by circRNA subgroup: DALI circRNAs (red), PASI circRNAs (blue) and ‘other’ circRNAs (gray). (D–E) Boxplot showing overall changes in expression (log2Foldchange) of the three circRNA subgroups upon NONO and SFPQ depletion in HEK293T (D) and HepG2 (E) cells. p-Values are calculated using two-sided Wilcoxon rank-sum tests. (F) Genome screen dump of the circCDYL-expressing locus with BSJ-spanning reads visualized as a junction track in the IGV browser. (G) RT-qPCR quantification of circCDYL and linear CDYL expression upon SFPQ and NONO depletion in HepG2 cells relative to GAPDH mRNA using two different siRNA designs for each target. Data for four biological replicates are shown. p-Values are calculated using Student’s two-tailed t-test. (H–I) As in F and G, but for the PASI circRNA, circZKSCAN1. (J) Boxplot showing eCLIP enrichment for SFPQ either immediately upstream or downstream (within 2000 nucleotides from the circRNA splice sites) of expressed circRNAs stratified either by circRNA subgroup or by deregulation upon SFPQ depletion in HepG2 cells. p-Values are calculated using two-sided Wilcoxon rank-sum tests.
Figure 3—figure supplement 1. SFPQ/NONO-depletion in HEK293T and HepG2 cells.
(A) Schematic showing the siRNA-knockdown protocol in HEK293T and HepG2 cells. For each condition (CTRL, NONO-KD, and SFPQ-KD), two different siRNA designs were used to reduce off-targeting effects, and for each siRNA, the experiment was performed in biological duplicates. (B–E) Western blotting (2nd replicate, B and D) and RT-qPCR (C and E) validation of knockdown in HEK293T (B–C) and HepG2 (D–E) cells. Data for four biological replicates are shown comprising in all three cases two replicates with two different siRNA designs represented by triangles and circles. p-Values are calculated using Student’s two-tailed t-test. (F–I) PCA analysis of the top 500 most variable mRNAs (F and H) and circRNAs (G and I) as measured across samples in HEK293T cells (F and G) and HepG2 cells (H and I) subjected to SFPQ and NONO depletion. The individual samples are color-coded by the knockdown target as denoted. (J–K) Distributions of flanking intron lengths (J) and inverted Alu distances (K) for circRNAs detected in HEK293T (upper facet) and HepG2 (lower facet) cells. The vertical line and the corresponding value represent the median. (L) Contingency table for circRNAs stratified by flanking intron lengths and inverted Alu distances in HEK293T (left facet) and HepG2 (right facet) cells. The table is color-coded by circRNA subgroups: DALI (red), PASI (blue) and the others (gray). (M) Boxplot showing the number of BSJ-spanning reads for the top 1000 circRNAs stratified by subgroup as denoted. p-Values are calculated using Wilcoxon rank-sum tests.
Figure 3—figure supplement 2. Expression profiles for selected circRNAs.
(A–F) Genomic exon-intron structures of selected circRNA-producing genes with screen dumps showing circRNA backsplicing reads obtained from RNAseq and visualized using the IGV genome browser from HepG2 (A–B) and HEK293T (C–F). Primers used for RT-qPCR are depicted schematically as divergent arrows. Below, RT-qPCR validation in an independent experiment using BSJ-spanning primers (circRNA expression) and flanking linear-splicing primers (host-gene expression) relative to GAPDH mRNA. Data for four biological replicates are shown. p-Values are calculated using Student’s two-tailed t-test.
Figure 3—figure supplement 3. CircRNAome analysis of SFPQ knockout mouse brain data (GSE60246).
(A) Two-by-two contingency table of circRNAs stratified by intron length and inverted Alu distance. p-Value calculated by Fisher’s exact test. (B) Boxplot of the distribution of BSJ-spanning reads for each circRNA subgroup. p-Value calculated using Wilcoxon rank-sum test. (C) PCA analysis of wild-type (CTRL) and SFPQ knockout (SFPQ-KO) samples based on circRNA expression. (D) Volcano plot showing deregulated circRNA expression comparing WT (CTRL) and SFPQ-KO mouse color-coded by circRNA subgroup as denoted. (E) Quantile plot showing 0.25, 0.5 (median), and 0.75 quantiles of the log2foldchange distribution between WT and SFPQ-KO for each circRNA subgroup (n/s; not significant, Wilcoxon rank-sum test). (F) Barplot showing the fraction of circRNAs from each subgroup showing significant deregulation upon SFPQ knockout. p-Value calculated using Fisher’s exact test. (G) Boxplot of SFPQ eCLIP enrichment (as in Figure 3J) in the 2000 nt upstream (left) and downstream (right) flanking regions stratified by circRNA subgroup (DALI, PASI, or other circRNAs) or by circRNA deregulation (upregulated, unchanged, or downregulated). p-Values are calculated by Wilcoxon rank-sum tests.
Figure 4. SFPQ ensures long-gene expression and suppresses cryptic splicing.
(A) Volcano plot depicting differential expression of annotated genes upon NONO or SFPQ KD compared to CTRL in HepG2 cells, stratified by median gene length into ‘long’ and ‘short’ genes as denoted. (B) Boxplot showing binned expression of clustered genes. Each gene is sliced into 20 equally sized bins, and the differential expression of each bin is determined and subgrouped into five k-means clusters (kc) (see Materials and methods). (C) Boxplot showing gene length distribution (0.25, 0.5 and 0.75 quantiles) stratified by clusters obtained in B. (D) Schematic representation of alternative splicing, where canonical (gray) denotes the most abundant splicing from the splice donor in question. Inclusion (green) and skipping (red) denote an alternative splicing event shorter or longer than canonical, respectively. (E) Scatter plot showing alternative splicing in NONO- and SFPQ-depleted samples as a function of canonical intron length and color-coded by type of splicing, either inclusion or skipping (see schematic in D). (F) Barplot with the number of unique alternative splicing events showing significant deregulation upon NONO and SFPQ depletion stratified by inclusion (green) and skipping (red), and whether the alternative SA site is annotated (transparent) or not (opaque). (G) Scatter plot showing effects on intron retention (IR) upon SFPQ and NONO depletion as a function of intron length, color-coded by significance (adjusted p-value<0.05) as denoted. (H) Scatterplot showing for each detectable intron the correlation between changes in exon inclusion/skipping (green/red) and intron retention upon SFPQ depletion. (I) Boxplot showing the IP/Input enrichment of SFPQ eCLIP reads in introns harboring an exon inclusion or an intron retention event color-coded by whether the event is up or down (red or blue, respectively) or not significant (n/s, gray). (J) Schematic showing coordinates and full genic locus of DENND1A (top panel) and exons 8 and 9 with an alternative, unannotated exon in between (green, middle panel). Merged intron-spanning reads (lower panel) from CTRL, NONO-KD, and SFPQ-KD samples (HepG2) are shown and color-coded by splicing type: canonical (gray), inclusion (green), and skipping (red), see D. (K–M) RT-qPCR analysis of alternative splicing event (K), upstream expression (L) and downstream expression (M) relative to GAPDH mRNA using two different siRNA designs for each target. Data for four biological replicates are shown. p-Values are calculated using Student’s two-tailed t-test.
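The gene-binning step in panel B (20 equally sized bins per gene, then a per-bin expression change) can be illustrated with a simplified sketch. The per-bin change here is a plain log2 ratio of summed coverage with a pseudocount, which is an assumption of this example and not the differential expression method used in the paper; the names `bin_edges` and `binned_log2fc` and the toy coverage are hypothetical. The resulting 20-value profiles are what a k-means step would then cluster.

```python
import math


def bin_edges(start, end, n_bins=20):
    """Return n_bins half-open (lo, hi) intervals that tile [start, end)."""
    length = end - start
    return [
        (start + (i * length) // n_bins, start + ((i + 1) * length) // n_bins)
        for i in range(n_bins)
    ]


def binned_log2fc(ctrl_cov, kd_cov, n_bins=20, pseudocount=1.0):
    """Per-bin log2(KD / CTRL) from per-base coverage listed 5' to 3'.

    Simplified stand-in for a proper differential expression test;
    the pseudocount avoids division by zero in empty bins.
    """
    if len(ctrl_cov) != len(kd_cov):
        raise ValueError("coverage vectors must have equal length")
    profile = []
    for lo, hi in bin_edges(0, len(ctrl_cov), n_bins):
        ctrl_sum = sum(ctrl_cov[lo:hi]) + pseudocount
        kd_sum = sum(kd_cov[lo:hi]) + pseudocount
        profile.append(math.log2(kd_sum / ctrl_sum))
    return profile


# Invented toy gene of 200 bases: knockdown coverage halves in the 3' half.
ctrl = [10] * 200
kd = [10] * 100 + [5] * 100
profile = binned_log2fc(ctrl, kd)
print([round(x, 2) for x in profile])
```

In this toy case the 5' bins stay at zero and the 3' bins turn negative, the kind of profile that would be expected for a gene whose expression drops toward its 3' end.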
Figure 4—figure supplement 1. Genic expression profile for selected long genes.
(A–D) Read coverage from HepG2 cells with either NONO- or SFPQ-depletion on DENND1A (A), GMDS (B), ATXN1 (C), and BAZ2B (D). The tracks are all depicted in the 5’–3’ direction and are composed of merged and normalized expression from all replicates/siRNA-designs.
Figure 4—figure supplement 2. SFPQ ensures long-gene expression (HEK293T + MOUSE).
(A) Volcano plot stratified by genes longer or shorter than the median gene length, where length is the annotated distance from promoter to terminator. (B) Boxplot showing binned expression of clustered genes in SFPQ-depleted samples relative to CTRL. Each gene is sliced into 20 bins, and the differential expression of each bin is determined and subgrouped into five k-means clusters (see Materials and methods). (C) Boxplot showing gene length distribution stratified by clusters obtained in B. (D) Scatter plot showing alternative splicing in NONO- and SFPQ-depleted samples as a function of canonical intron length and color-coded by type of splicing (either inclusion or skipping, see schematic in Figure 4D). (E) Scatter plot showing effects on intron retention upon SFPQ and NONO depletion as a function of intron length. (F–J) Analyses as in A–E on mouse brain SFPQ knockout samples (GSE60246, see Supplementary file 5).
Figure 4—figure supplement 3. SFPQ co-expression rescues cryptic splicing.
(A) Western blot on HEK293T cells subjected to SFPQ knockdown combined with either empty vector (EV) or wild-type SFPQ (SFPQ WT) overexpression. The blot shows expression of myc-tagged ectopic SFPQ (upper panel), endogenous+ectopic SFPQ (middle panel) and β-tubulin as loading control (lower panel). (B–E) RT-qPCR on SFPQ mRNA (B) and three DENND1A loci (C–E, as in Figure 4K–M). The error bars represent standard deviation from technical triplicates.
Figure 5. SFPQ depletion activates intronic polyA signal and premature termination.
(A) Volcano plot showing deregulated PAS usage as measured by quantseq upon NONO and SFPQ depletion in HEK293T cells. PAS signals are color-coded by their genic origin; intronic (dark blue), exonic (light blue), or ambiguous (gray). (B) Plot showing the cumulative fraction of PASs as a function of relative genic position stratified by genic origin (ambiguous, exonic or intronic, vertical facets) and color-coded by whether the PAS is significantly up (red) or downregulated (blue) upon SFPQ knockdown. (C) Schematic representation of the DENND1A exon 8–9 locus with alternative exon (green) and putative PAS element (purple). Below, merged quantseq coverage from each experiment. (D) RT-qPCR on input and oligo-dT purified RNA from control and SFPQ-depleted HEK293T cells using amplicons specific for GAPDH mRNA (positive control), circZKSCAN1 (negative control), and the alternative SFPQ-activated exon. Values reflect ratios between oligo-dT purified and input quantities. Data for two biological replicates are shown. (E) Venn diagrams showing the number of unique introns with co-occurring upregulation of PAS and upregulated alternative splicing. The number of expressed introns without any evidence of enriched PASs or alternative splicing is denoted below the diagram. P-values are calculated by Fisher’s exact test. (F–G) Schematic showing the outline of the analysis (upper panel): For each circRNA, the locus spanning from the promoter to the circRNA splice donor was interrogated for the presence of quantseq PASs (F) or exon inclusion (G). Barplot (lower panel) showing the fraction of upregulated and downregulated circRNAs upon SFPQ depletion in HEK293T cells with evidence of a concomitant upregulated upstream PAS (F) or an upstream exon inclusion event (G). Numbers indicate the total number of circRNAs in each group. p-Values are calculated by Fisher’s exact test.
Figure 5—figure supplement 1. Quantseq analysis.
(A) Schematics depicting the quantseq workflow. (B) Top: characterization of the fraction of PAS-containing peaks, where PAS is defined as AAUAAA or AUUAAA, as a function of the longest oligo-A stretch identified in the peak +/- 50 nt flanking region. Bottom: Total number of peaks identified with (green) or without (orange) PAS as a function of longest A-stretch. (C) Venn diagrams (as in Figure 5E) showing overlapping quantseq PASs and cryptic splicing but stratified into the five k-means clusters. (D) Relative quantseq PAS position within annotated genes (as in Figure 5B) but stratified by k-means clusters. Numbers denote the number of peaks in each group and the fraction of genes with significantly deregulated peaks in parentheses.
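The two per-peak quantities in panel B, presence of a PAS hexamer (AAUAAA or AUUAAA) and the longest A-stretch in the peak region, can be computed with a short sketch like the one below. The function names `has_pas` and `longest_a_stretch` and the example sequences are hypothetical, and the code assumes the peak plus its flanking sequence has already been extracted as a string; DNA input is converted to RNA letters before matching.

```python
import re

# The two hexamers named in the caption.
PAS_MOTIFS = ("AAUAAA", "AUUAAA")


def has_pas(seq):
    """Return True if the sequence contains AAUAAA or AUUAAA."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA letters
    return any(motif in rna for motif in PAS_MOTIFS)


def longest_a_stretch(seq):
    """Return the length of the longest run of consecutive A residues."""
    runs = re.findall(r"A+", seq.upper())
    return max((len(run) for run in runs), default=0)


# Invented example sequences, for illustration only.
print(has_pas("GGCAATAAAGCC"))           # contains AAUAAA after T->U conversion
print(has_pas("GGCCGGCC"))               # no hexamer present
print(longest_a_stretch("CCAAAAAGAAT"))  # runs of 5 and 2 A residues
```

Tabulating `has_pas` against `longest_a_stretch` over all peaks would give the kind of summary the panel describes.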
Figure 5—figure supplement 2. U1 snRNA abundance upon SFPQ knockdown.
(A) PAGE northern blot on U1 levels (upper panel) upon CTRL or SFPQ depletion in HepG2 cells using two different siRNA designs as denoted. 7SK (lower panel) is used as a loading control. (B) Abundance of U1 snRNA measured by RT-qPCR relative to GAPDH. Triangles and circles, as denoted, represent the two different siRNA designs. (C–D) As in A–B, but using HEK293T cells.
Figure 5—figure supplement 3. circRNAs in k-means clusters.
(A–C) For each k-means cluster, boxplots showing the log2FoldChange of circRNA expression upon SFPQ depletion in HepG2 (A), HEK293T (B) cells, and mouse brain (C) stratified by circRNA subgroup. (D–F) Barplot of numbers and fraction of circRNAs in each k-means cluster in HepG2 (D), HEK293T (E) cells, and mouse brain (F). The fraction is determined by the number of genes hosting circRNAs relative to the total number of genes in each cluster. (G–I) Scatterplot relating the circRNA deregulation (log2FC) to the deregulation of host-gene linear splicing for HepG2 (G), HEK293T (H) cells, and mouse brain (I) color-coded by circRNA subgroup. The diagonal line represents perfect correlation.
Figure 6. Multiple features contribute to circRNA regulation by SFPQ.
(A) Schematic representation of features used in analysis. (B) Heatmap showing the feature coefficients from modeling circRNA deregulation (log2FoldChange) upon SFPQ depletion in HepG2 cells. The numbers within the heatmap are the associated p-values. (C) Boxplot showing the centered and scaled feature values for significant up (red), significant down (blue), and unchanged (gray) circRNAs in HepG2. (D–E) As in B and C using mouse brain data. (F) Schematic depicting the SFPQ-mediated regulation of circRNA expression. Upon SFPQ knockdown, usage of cryptic splice acceptor sites (cSA) is induced, particularly within long introns. For upstream cSA inclusion (left scenario), the adjacent circRNA is upregulated, possibly due to reduced competition with backsplicing, whereas for downstream cSA inclusion (right scenario), the circRNA is repressed due to increased competition with backsplicing.
Figure 6—figure supplement 1. HepG2 features.
Bottom-left: Correlation matrix showing for each pair of features the correlation between standardized (centered and scaled) values. The points are color-coded by circRNA regulation: significant up (red), significant down (blue), or unchanged (gray). Top-right: the correlation values (based on Pearson correlation) and corresponding p-values are shown. The tiles are color-coded by the correlation values.
Figure 6—figure supplement 2. Mouse brain features.
As in Figure 6—figure supplement 1, but with features from mouse brain data.
Figure 6—figure supplement 3. HEK293T features and GLM model performance.
(A) As in Figure 6, heatmap showing feature coefficients. (B) Boxplot showing the standardized feature values for up, down and unchanged circRNAs. (C) Model prediction compared to observed log2FC on the 20% test-set. (D) Correlation matrix as in Figure 6—figure supplement 1.
Figure 6—figure supplement 4. GLM model performance.
(A–B) Scatterplot showing the correlation between observed and predicted log2foldchange values on the test set using the GLM model in HepG2 (A) and mouse brain (B). The Pearson correlation and corresponding p-value are denoted in the top-left corner. (C) Scatterplot showing the correlation between GLM coefficients obtained in HepG2 and mouse brain regression analyses.
