Life Sci Alliance. 2023 Feb 27;6(5):e202201793.
doi: 10.26508/lsa.202201793. Print 2023 May.

Assessing the impacts of various factors on circular RNA reliability

Trees-Juen Chuang et al. Life Sci Alliance.

Abstract

Circular RNAs (circRNAs) are non-polyadenylated RNAs with a continuous loop structure characterized by a non-colinear back-splice junction (BSJ). Although millions of circRNA candidates have been identified, determining circRNA reliability remains a major challenge because of various types of false positives. Here, we systematically assess the impacts of numerous factors related to circRNA identification, conservation, biogenesis, and function on circRNA reliability by comparing circRNA expression between mock-treated datasets and the corresponding colinear/polyadenylated RNA-depleted datasets generated with three different RNA treatment approaches. Eight important indicators of circRNA reliability are identified. Analyses of the relative contribution to variability explained (RCVE) reveal that, in descending order of relative importance, the factors affecting circRNA reliability are the conservation level of the circRNA, evidence of full-length circular sequences, the supporting BSJ read count, BSJ donor and acceptor splice sites both located in the same colinear transcript isoform, BSJ donor and acceptor splice sites both located at annotated exon boundaries, detection of the BSJ by multiple tools, supporting functional features, and BSJ donor and acceptor splice sites both undergoing alternative splicing. This study thus provides a useful guideline and an important resource for selecting high-confidence circRNAs for further investigation.


Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1. The assessment of the impacts of various features on circRNA reliability.
(A) Flowchart of the overall analyses. (B) Distribution of the 580,654 extracted circAtlas circRNA (BSJ) candidates by the number of circRNA detection tools (1, 2, 3, or 4) that identified them. (C) Proportion of the 580,654 candidates derived from potential alignment ambiguity (with an alternative colinear explanation or multiple hits). (D) Comparison of the percentages of circRNA candidates derived from potential alignment ambiguity among candidates detected by 1, 2, 3, or 4 tools. (E) Comparison of the normalized numbers of circRNA candidates with a supporting BSJ read count of 1 or ≥ 2 in all extracted mock samples of the 19 mock-treated sample pairs. For each sample, the percentage of circRNA candidates supported by a single BSJ read is shown.
Figure S1. Examples of BSJ candidates (BSJ-1, BSJ-2, and BSJ-3) with an alternative colinear explanation.
For BSJ-1 (Exon4–Exon3 of FAM76A), the concatenated sequence has an alternative colinear explanation (Exon4–Intron6 of FAM76A). For BSJ-2 (Exon3–Exon2 of ZNF92), the concatenated sequence has an alternative colinear explanation (Exon3 of ZNF273–Exon3 of ZNF92). For BSJ-3 (Exon8–Exon2 of ADH1B), the concatenated sequence has an alternative colinear explanation (Exon2 of ADH1B–Exon2 of ADH1A).
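The notion of an alternative colinear explanation can be illustrated with a small sketch. It assumes that the BSJ sequence is the 3′ end of the downstream (donor) exon joined to the 5′ start of the upstream (acceptor) exon, and it uses exact substring matching against colinear transcript sequences as a stand-in for the alignment-based checks a real pipeline would perform. The function names, the flank length, and the toy sequences are hypothetical.

```python
from typing import Dict, List


def bsj_junction_sequence(donor_exon: str, acceptor_exon: str, flank: int) -> str:
    """Join the last `flank` nt of the downstream (donor) exon to the first
    `flank` nt of the upstream (acceptor) exon, mimicking a back-splice junction."""
    if flank <= 0 or flank > min(len(donor_exon), len(acceptor_exon)):
        raise ValueError("flank must be positive and no longer than either exon")
    return donor_exon[-flank:] + acceptor_exon[:flank]


def colinear_explanations(junction: str, colinear_transcripts: Dict[str, str]) -> List[str]:
    """Return IDs of colinear transcripts that contain the junction sequence verbatim.

    A non-empty result means reads spanning this 'back-splice' could equally come
    from a colinear (linear) transcript, so the BSJ call is ambiguous.
    """
    return [tid for tid, seq in colinear_transcripts.items() if junction in seq]


if __name__ == "__main__":
    # Toy sequences (hypothetical), not real gene sequences.
    exon3 = "ATGGCTTACG"
    exon4 = "GGTCCATAGC"
    junction = bsj_junction_sequence(donor_exon=exon4, acceptor_exon=exon3, flank=5)
    transcripts = {
        "tx_A": "TTTTATAGCATGGCAAAA",  # contains the junction: alternative explanation
        "tx_B": "GGTCCATAGCTTTT",      # does not contain it
    }
    print(junction, colinear_explanations(junction, transcripts))
```

Real detection tools usually tolerate mismatches and also flag reads with multiple hits, which this exact-match sketch ignores.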
Figure 2. Impact of factors related to circRNA identification on circRNA reliability.
(A) Supporting BSJ read count, (B) number of detection tools, and (C) evidence of full-length circular sequence. For (B, C), the odds ratios, determined using two-tailed Fisher’s exact test (FET), represent the occurrences of (B) circRNAs detected by multiple tools or (C) circRNAs with evidence of full-length circular sequence among non-depleted circRNAs relative to those among depleted circRNAs. Dashed lines indicate an odds ratio of 1. The bottom panels of (A, B) show the correlations between the percentages of non-depleted circRNAs and (A) the supporting BSJ read count or (B) the number of detection tools. For (C), the evidence of full-length circular sequence was supported by CIRI-full (a short read-based approach, top) or circFL-seq (a long read-based approach, bottom). P-values were determined using the Wilcoxon rank-sum test (WRST; greater or less; (A)) or two-tailed FET (B, C) and FDR-adjusted across the 19 mock-treated sample pairs for each examined factor using the Benjamini–Hochberg correction. For (A), the Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) is shown in curly brackets.
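The per-pair testing scheme described above (a two-tailed Fisher's exact test on a 2x2 table for each mock-treated pair, then Benjamini–Hochberg adjustment across the 19 pairs, then counting pairs with FDR < 0.05) can be sketched in plain Python as follows. This is an illustration, not the authors' code: the table layout (rows non-depleted/depleted, columns factor present/absent) and the example counts are assumptions, and the odds ratio here is the sample odds ratio ad/bc, whereas R's fisher.test reports a conditional maximum-likelihood estimate, so the two can differ slightly.

```python
from math import comb, inf, nan
from typing import List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = non-depleted / depleted circRNAs,
    columns = factor present / absent. Returns (a*d / (b*c), P-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are no more probable than
    # the observed one; the small tolerance guards against floating-point ties.
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs * (1 + 1e-7))
    if b * c == 0:
        odds = nan if a * d == 0 else inf
    else:
        odds = (a * d) / (b * c)
    return odds, min(p, 1.0)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts for three sample pairs (not data from the study).
    tables = [(120, 80, 300, 500), (90, 110, 280, 520), (60, 140, 250, 550)]
    pvals = [fisher_exact_two_sided(*t)[1] for t in tables]
    fdr = benjamini_hochberg(pvals)
    print(sum(q < 0.05 for q in fdr), "pairs with FDR < 0.05")
```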
Figure 3. Impact of factors related to circRNA conservation on circRNA reliability.
(A, B, C, D) Impact of conservation factors at the (A) species, (B, C) tissue, and (D) individual (or sample) levels on circRNA reliability. The bottom panels of (A, B, D) show the correlations between the percentages of non-depleted circRNAs and (A) the number of conserved species, (B) the number of conserved tissues, or (D) the number of conserved samples. P-values were determined using WRST (greater or less) and FDR-adjusted across the 19 mock-treated sample pairs for each examined factor using the Benjamini–Hochberg correction. The Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) is shown in curly brackets.
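To make the one-sided WRST and the plotted effect size concrete, here is a minimal sketch, assuming the normal approximation without tie or continuity correction and the common effect-size definition r = |Z| / sqrt(N). The exact variant used in the study (exact vs. approximate P-values, tie handling, bootstrap confidence intervals) is not stated here, so treat this as an illustration only. In the setting of this figure, x might hold, for example, the number of conserved species for non-depleted circRNAs and y the same values for depleted circRNAs.

```python
from math import erf, sqrt
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float],
                  alternative: str = "greater") -> Tuple[float, float, float]:
    """Wilcoxon rank-sum test via the normal approximation (no tie/continuity correction).

    Returns (z, one-sided P-value, effect size r = |z| / sqrt(N)).
    alternative='greater' tests whether x tends to exceed y; 'less' tests the reverse.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # Mann-Whitney U statistic for x
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    cdf = 0.5 * (1 + erf(z / sqrt(2)))            # standard normal CDF at z
    if alternative == "greater":
        p = 1 - cdf
    elif alternative == "less":
        p = cdf
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, p, abs(z) / sqrt(n1 + n2)
```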
Figure S2. Impact of evolutionary rates on circRNA reliability.
(A, B) Impact of the evolutionary rates, determined by (A) phyloP or (B) phastCons, of the sequences around the BSJs on circRNA reliability. For each BSJ, four regions around the BSJ were considered: positions +1 to +10 relative to the acceptor site (acceptor; in exon); positions −10 to −1 relative to the acceptor site (acceptor; in intron); positions −10 to −1 relative to the donor site (donor; in exon); and positions +1 to +10 relative to the donor site (donor; in intron). The evolutionary rate of each region was measured as the average phyloP (or phastCons) score of the 10 nucleotides within the region (see the Materials and Methods section). P-values were determined using the Wilcoxon rank-sum test (WRST; greater or less) and FDR-adjusted across the 19 mock-treated sample pairs for each examined feature using the Benjamini–Hochberg correction. The Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) is shown in curly brackets.
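The four 10-nt windows and their average scores can be written down directly. This sketch assumes a per-base score track (phyloP or phastCons values) already extracted for a plus-strand locus, 0-based indexing, `acceptor_pos` as the first exonic base at the BSJ acceptor, and `donor_pos` as the last exonic base at the BSJ donor; minus-strand loci would need the track reversed first. The function name and coordinate conventions are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def bsj_region_means(scores: Sequence[float], acceptor_pos: int, donor_pos: int,
                     window: int = 10) -> Dict[str, float]:
    """Average per-base conservation scores in the four BSJ-flanking windows.

    Assumes plus-strand, 0-based coordinates on `scores`: `acceptor_pos` is the
    first exonic base at the BSJ acceptor, `donor_pos` the last exonic base at
    the BSJ donor. Window bounds are half-open [start, end).
    """
    regions = {
        "acceptor_exon": (acceptor_pos, acceptor_pos + window),     # +1..+10
        "acceptor_intron": (acceptor_pos - window, acceptor_pos),   # -10..-1
        "donor_exon": (donor_pos - window + 1, donor_pos + 1),      # -10..-1
        "donor_intron": (donor_pos + 1, donor_pos + 1 + window),    # +1..+10
    }
    out = {}
    for name, (start, end) in regions.items():
        if start < 0 or end > len(scores):
            raise ValueError(f"{name} window falls outside the score track")
        out[name] = mean(scores[start:end])
    return out
```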
Figure 4. Impact of factors related to circRNA biogenesis on circRNA reliability.
(A) Both BSJ sites at annotated exon boundaries, (B) both BSJ sites in the same isoform, (C) both BSJ sites undergoing alternative splicing (AS), (D) BSJs with #RCSacross > 0, (E) BSJs with #RCSacross − #RCSwithin > 0, (F) BSJs with RNA-binding proteins (RBPs) binding to the flanking regions, (G) BSJs with #RCSacross > 0 or RBPs binding to the flanking regions, and (H) minimum distance of RBP-binding sites to BSJs. For (A–G), the odds ratios represent the occurrence of each examined factor among non-depleted circRNAs relative to that among depleted circRNAs. Dashed lines indicate an odds ratio of 1. Odds ratios and P-values were determined using two-tailed FET, and P-values were FDR-adjusted across the 19 mock-treated sample pairs for each examined factor using the Benjamini–Hochberg correction. For (H), P-values were determined using WRST (greater or less) and FDR-adjusted in the same way, and the Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) is shown in curly brackets.
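The RCS-based factors in (D), (E), and (G) can be illustrated with a simplified counter. This sketch assumes that RCSs are exact reverse-complementary k-mers (k = 4 by default, an arbitrary choice), that #RCSacross counts such pairs between the upstream and downstream flanking introns, and that #RCSwithin counts non-overlapping pairs inside each flanking intron. Analyses of this kind typically detect RCSs by sequence alignment with longer matches and some mismatches allowed, so the counts from this sketch are not comparable to the study's; all names here are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase ACGT)."""
    return seq.translate(_COMPLEMENT)[::-1]


def rcs_across(upstream: str, downstream: str, k: int) -> int:
    """Count k-mer pairs (one in each flanking intron) that are exact reverse complements."""
    up = [upstream[i:i + k] for i in range(len(upstream) - k + 1)]
    down = [downstream[j:j + k] for j in range(len(downstream) - k + 1)]
    return sum(1 for a in up for b in down if b == revcomp(a))


def rcs_within(intron: str, k: int) -> int:
    """Count non-overlapping k-mer pairs inside one intron that are exact reverse complements."""
    kmers = [intron[i:i + k] for i in range(len(intron) - k + 1)]
    return sum(
        1
        for i in range(len(kmers))
        for j in range(i + k, len(kmers))  # j >= i + k keeps the two k-mers from overlapping
        if kmers[j] == revcomp(kmers[i])
    )


def rcs_balance(upstream: str, downstream: str, k: int = 4) -> Tuple[int, int, int]:
    """Return (#RCSacross, #RCSwithin, #RCSacross - #RCSwithin)."""
    across = rcs_across(upstream, downstream, k)
    within = rcs_within(upstream, k) + rcs_within(downstream, k)
    return across, within, across - within
```

A positive third value corresponds to the condition in (E): pairing across the flanking introns outnumbers competing pairing within them.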
Figure S3. Impact of factors related to circRNA biogenesis on circRNA reliability.
(A) BSJ acceptor (top) or donor (bottom) splice sites that agree with well-annotated exon boundaries of colinear transcripts, (B) splice site strength of BSJs, and (C) BSJ acceptor (top) or donor (bottom) splice sites that are subject to alternative splicing (AS) based on previously annotated colinear transcripts (Ensembl annotation). The splice site strength of BSJs was estimated using MaxEntScan based on the maximum entropy model (MEM), first-order Markov model (FMM), and weight matrix model (WMM). For (A, C), the odds ratios, determined using two-tailed Fisher’s exact test (FET), represent the occurrences of (A) BSJ acceptor/donor splice sites that agree with well-annotated exon boundaries of colinear transcripts or (C) BSJ acceptor/donor splice sites that are subject to AS among non-depleted circRNAs relative to those among depleted circRNAs. Dashed lines indicate an odds ratio of 1. P-values were determined using two-tailed FET (A, C) or WRST (greater or less; B) and FDR-adjusted across the 19 mock-treated sample pairs for each examined factor using the Benjamini–Hochberg correction. For (B), the Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) is shown in curly brackets.
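Of the three MaxEntScan models, the weight matrix model is the simplest: it scores each position independently as a log-odds ratio against a uniform background and sums the terms. The sketch below shows that scoring rule on a hypothetical 3-position matrix; it does not reproduce MaxEntScan's trained parameters (MaxEntScan scores 9-mer donor and 23-mer acceptor sequences), and the matrix values are invented for illustration.

```python
from math import log2
from typing import Dict, List

# Hypothetical 3-position probability matrix for illustration only; it is NOT
# one of MaxEntScan's trained models.
TOY_MATRIX: List[Dict[str, float]] = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
]


def wmm_score(seq: str, matrix: List[Dict[str, float]], background: float = 0.25) -> float:
    """Weight matrix model score: sum over positions of log2(P(base | site) / P(base | background)).

    Positions are treated as independent; the first-order Markov and maximum
    entropy models differ precisely in modelling dependencies between positions.
    """
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(log2(col[base] / background) for col, base in zip(matrix, seq.upper()))
```

Higher scores indicate a site that looks more like the consensus encoded by the matrix, which is the sense in which splice site strength is compared between non-depleted and depleted circRNAs in (B).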
Figure 5. Impact of functional features on circRNA reliability.
Nine types of functional features of circRNAs (see the Materials and Methods section) are examined. The bottom panel shows the correlation between the percentage of non-depleted circRNAs and the number of supporting functional features. P-values were determined using WRST (greater or less) and FDR-adjusted across the 19 mock-treated sample pairs for each examined feature using the Benjamini–Hochberg correction. The Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) is shown in curly brackets.
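The bottom-panel correlation can be computed in two steps: group circRNAs by their number of supporting functional features, compute the percentage that are non-depleted in each group, and then correlate the two series. The legend does not state which correlation coefficient was used, so Pearson's r is shown here only as one option; the input format of (number of features, non-depleted flag) pairs is an assumption.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, Sequence, Tuple


def percent_non_depleted(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each number of supporting features to the % of circRNAs that were not depleted."""
    totals: Dict[int, int] = defaultdict(int)
    kept: Dict[int, int] = defaultdict(int)
    for n_features, non_depleted in records:
        totals[n_features] += 1
        kept[n_features] += int(non_depleted)
    return {n: 100.0 * kept[n] / totals[n] for n in sorted(totals)}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy records, not data from the study.
    pct = percent_non_depleted([(0, False), (0, True), (1, True), (2, True)])
    print(pct, pearson(list(pct), list(pct.values())))
```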
Figure 6. Assessment of the relative influence of each individual factor on circRNA reliability.
(A) RCVE scores of the examined factors for each mock-treated sample pair. The length of each colored segment in each bar represents the RCVE score of the corresponding factor. (B) Ranking (top) and average ranking (bottom; see also the numbers in parentheses) of the RCVE scores of the examined factors. (C) Correlations between the percentages of non-depleted circRNAs and the number of samples in which the circRNAs were observed, for HeLa-based (left and right) or K562-based (middle) mock-treated sample pairs. Of note, the HeLa_1 and HeLa_2 mock-treated sample pairs were generated by the same research group in different studies. (D) Comparisons of the percentages of non-depleted circRNAs for HeLa-specific circRNAs (left) or K562-specific circRNAs (right) detected in a single replicate only or in multiple replicates. All P-values were determined using two-tailed FET.
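The ranking in (B) can be sketched as follows. The legend does not define RCVE itself, so the `rcve` helper assumes one common formulation: the fraction of the full model's explained variability (for example R² or explained deviance) that is lost when a single factor is dropped. The ranking step then orders factors within each sample pair (rank 1 = largest RCVE) and averages the ranks across pairs. Factor names and scores in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def rcve(r2_full: float, r2_reduced: float) -> float:
    """Assumed RCVE formulation: the fraction of the full model's explained
    variability that is lost when one factor is removed from the model."""
    if r2_full <= 0:
        raise ValueError("the full model must explain some variability")
    return (r2_full - r2_reduced) / r2_full


def average_rankings(rcve_by_pair: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Rank factors within each sample pair (1 = largest RCVE), then average the ranks.

    Ties are broken alphabetically for simplicity; the result is sorted by average rank.
    """
    rank_sum: Dict[str, float] = defaultdict(float)
    n_pairs: Dict[str, int] = defaultdict(int)
    for scores in rcve_by_pair.values():
        ordered = sorted(scores, key=lambda f: (-scores[f], f))
        for rank, factor in enumerate(ordered, start=1):
            rank_sum[factor] += rank
            n_pairs[factor] += 1
    averages = {f: rank_sum[f] / n_pairs[f] for f in rank_sum}
    return dict(sorted(averages.items(), key=lambda kv: (kv[1], kv[0])))
```

Averaging ranks rather than raw RCVE scores keeps a single sample pair with unusually large scores from dominating the overall ordering.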
Figure S4. Percentages of non-depleted circRNAs with or without removing G-quadruplexes across BSJs.
No significant differences between circRNAs with and without removing G-quadruplexes across BSJs were observed (all P-values > 0.05). P-values were determined using two-tailed Fisher’s exact test. ns, not significant.
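A minimal way to flag G-quadruplexes across BSJs is to search the back-spliced sequence for a G4 motif that has bases on both sides of the junction. This sketch assumes the widely used motif of four runs of at least three Gs separated by loops of 1 to 7 nt; the study's exact G4 criterion is not given in this legend. Because `re.finditer` reports only non-overlapping matches, the check is an approximation. The function names and the (id, donor tail, acceptor head) tuple format are hypothetical.

```python
import re
from typing import List, Tuple

# Common G4 motif (assumption): four G-runs of length >= 3 with 1-7 nt loops.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3,}G{3,}")


def g4_spans_bsj(donor_tail: str, acceptor_head: str) -> bool:
    """True if a G4 motif in the back-spliced sequence covers bases on both sides of the junction."""
    seq = (donor_tail + acceptor_head).upper()
    junction = len(donor_tail)  # index of the first acceptor-side base
    return any(m.start() < junction < m.end() for m in G4_PATTERN.finditer(seq))


def drop_g4_bsjs(candidates: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Keep only (bsj_id, donor_tail, acceptor_head) candidates without a G4 across the BSJ."""
    return [c for c in candidates if not g4_spans_bsj(c[1], c[2])]
```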
Figure 7. Robustness analyses of the approaches based on comparisons of paired mock and treated samples for assessing circRNA reliability.
(A, B, C) Comparisons of the percentages of (A) RT-independent circRNAs, (B) RT-/non-RT–validated circRNAs, and (C) circAtlas-specific circRNAs among the non-depleted and depleted circRNAs for all mock-treated sample pairs examined. RT-independent and RT-/non-RT–validated circRNAs represent high-confidence circRNAs (see text and the Materials and Methods section). CircAtlas-specific circRNAs are circAtlas circRNAs that are not observed in any of eight other publicly accessible circRNA databases (see the Materials and Methods section). P-values were determined using two-tailed FET and FDR-adjusted across the 19 mock-treated sample pairs for each examined feature using the Benjamini–Hochberg correction. *FDR < 0.05. **FDR < 0.01. ***FDR < 0.001.
Figure S5. Comparisons of the percentages of BSJs with donor and acceptor splice sites from the same colinear transcript isoform for BSJ events with (full-length circRNAs) or without (non-full-length circRNAs) evidence of full-length circular sequences reconstructed by CIRI-full or circFL-seq, among the 480,471 circAtlas BSJ events.
P-values were determined using two-tailed Fisher’s exact test.
Figure 8. The correlation between the number of supporting factors and the percentages of high-confidence circRNAs (RT-independent or RT-/non-RT–validated circRNAs) and circAtlas-specific circRNAs for the 480,471 circAtlas circRNA candidates.
The right panel shows an enlarged view of the correlation between the number of supporting factors and the percentage of circAtlas-specific circRNAs. Of the eight important factors shown in Fig 6, all except “supporting BSJ read count” were considered, because that factor depends on the examined samples and the corresponding RNA-seq data. For the factors “number of samples” and “number of supporting functional features,” a cutoff value of three was used. Detailed information can be found in Table S3.
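The tally behind this figure can be sketched as follows: count how many of the seven remaining factors support each candidate, then compute the percentage of high-confidence candidates for each count. The factor keys below are hypothetical placeholders for the factors in Fig 6 and Table S3, and treating the cutoff of three as inclusive (≥ 3) is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

# Placeholder keys (hypothetical) for the binary factors; the study's exact
# definitions are given in Fig 6 and Table S3.
BOOLEAN_FACTORS = ("full_length", "same_isoform", "annotated_boundaries",
                   "multiple_tools", "both_sites_alternatively_spliced")
CUTOFF = 3  # cutoff for the two count-based factors (assumed inclusive)


def n_supporting_factors(c: Dict[str, Any]) -> int:
    """Number of supporting factors: each binary factor counts once, and each
    count-based factor counts once if it reaches the cutoff."""
    n = sum(bool(c[f]) for f in BOOLEAN_FACTORS)
    n += int(c["n_samples"] >= CUTOFF)
    n += int(c["n_functional_features"] >= CUTOFF)
    return n


def percent_high_confidence(candidates: Iterable[Dict[str, Any]]) -> Dict[int, float]:
    """Percentage of high-confidence candidates for each number of supporting factors."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for c in candidates:
        n = n_supporting_factors(c)
        totals[n] += 1
        hits[n] += int(bool(c["high_confidence"]))
    return {n: 100.0 * hits[n] / totals[n] for n in sorted(totals)}
```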
