Nat Commun. 2021 Jan 12;12(1):266.
doi: 10.1038/s41467-020-20459-8.

isoCirc catalogs full-length circular RNA isoforms in human transcriptomes

Ruijiao Xin et al. Nat Commun. 2021.

Abstract

Circular RNAs (circRNAs) have emerged as an important class of functional RNA molecules. Short-read RNA sequencing (RNA-seq) is a widely used strategy to identify circRNAs. However, an inherent limitation of short-read RNA-seq is that it does not experimentally determine the full-length sequences and exact exonic compositions of circRNAs. Here, we report isoCirc, a strategy for sequencing full-length circRNA isoforms, using rolling circle amplification followed by nanopore long-read sequencing. We describe an integrated computational pipeline to reliably characterize full-length circRNA isoforms using isoCirc data. Using isoCirc, we generate a comprehensive catalog of 107,147 full-length circRNA isoforms across 12 human tissues and one human cell line (HEK293), including 40,628 isoforms ≥500 nt in length. We identify widespread alternative splicing events within the internal part of circRNAs, including 720 retained intron events corresponding to a class of exon-intron circRNAs (EIciRNAs). Collectively, isoCirc and the companion dataset provide a useful strategy and resource for studying circRNAs in human transcriptomes.

Conflict of interest statement

Y.X. is a scientific cofounder of Panorama Medicine.

Figures

Fig. 1. Overall experimental and computational workflow of isoCirc.
a Preparation of circular RNA (circRNA) libraries for isoCirc. Extracted total RNAs are subjected to linear RNA removal via ribosomal RNA (rRNA) depletion and RNase R treatment to enrich for circRNAs. The circRNA template is reverse transcribed and digested with nuclease to remove any 5′ overhang of the reverse transcription (RT) product. Ligation of the RT product generates the circular cDNA template. Rolling circle amplification of the circular cDNA template generates the sample for long-read sequencing. ncRNA: noncoding RNA. b Processing of long-read isoCirc sequencing data. In the consensus calling step, tandem repeats are detected from long reads and used to generate consensus sequences. For each read, a concatemer of two copies of the consensus sequence is mapped to the genome to identify the back-splice junction (BSJ) and forward-splice junctions (FSJs) within the circRNA. Alignment records are filtered using multiple stringent criteria (e.g., mapping quality, BSJ/FSJ fidelity). In this way, isoCirc enables identification of high-confidence BSJs and full-length circRNA isoforms.
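The two-copy concatemer step in panel b can be illustrated with a toy example. This is a minimal sketch, not the isoCirc implementation: the sequence, the `rotate` helper, and the function names are hypothetical, and it assumes the reason for doubling is that a consensus unit may start anywhere on the circle, so a single copy can split the transcript at an arbitrary point.

```python
def two_copy_concatemer(consensus: str) -> str:
    """Join two copies of a consensus sequence end to end."""
    return consensus + consensus


def rotate(seq: str, offset: int) -> str:
    """Return seq restarted at `offset`, mimicking an arbitrary start on a circle."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


# Hypothetical toy circRNA, written linearly from one side of the
# back-splice junction to the other.
circ = "AAACCCGGGTTT"

# Assumption: the consensus unit recovered from a read starts at an
# arbitrary position on the circle (here, offset 5).
consensus = rotate(circ, 5)
doubled = two_copy_concatemer(consensus)

# A single consensus copy breaks the linear transcript in two pieces...
print(circ in consensus)  # False
# ...but the two-copy concatemer contains it as one contiguous stretch.
print(circ in doubled)    # True
```

In the doubled sequence, every rotation of the circle appears contiguously, so an aligner can place one complete copy on the genome with the junction between copies marking the BSJ.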
Fig. 2. isoCirc data of HEK293 cells.
a Heatmap showing pairwise comparison of similarity between high-confidence BSJs identified from six HEK293 libraries. Only BSJs with read count ≥2 were included. For each library pair, degree of similarity was calculated as number of shared high-confidence BSJs found in both libraries divided by total number of high-confidence BSJs in either library. Color reflects degree of similarity between two libraries, as indicated in legend. b Heatmap showing pairwise comparison of similarity between full-length circRNA isoforms (with read count ≥2) identified from six HEK293 libraries. Degree of similarity was determined as in (a). c Stacked barplot showing fraction of known or novel high-confidence BSJs identified in only one (‘1’) or both (‘2’) biological replicates of HEK293 cells, based on BSJ annotations in circBase (http://www.circbase.org) and MiOncoCirc (https://mioncocirc.github.io) databases. Only BSJs with read count ≥2 in each biological replicate (summing over three technical replicates) were included. Bars show known BSJs annotated in circBase only (red), MiOncoCirc only (blue), or both databases (‘Both’, green), and novel BSJs not annotated in either database (‘Novel’, purple). d Heatmap showing numbers of full-length circRNA isoforms identified in HEK293 cells, based on BSJ (x-axis) and FSJ (y-axis) categories as classified relative to existing transcript annotations. Only full-length circRNA isoforms with read count ≥2 (summing over all six libraries) were included. All identified circRNA BSJs and FSJs were classified using three categories: Full Splice Match (FSM), Novel In Catalog (NIC), and Novel Not in Catalog (NNC). e Long-read alignments (top) and gene structure diagrams (bottom) for the four most abundant full-length circRNA isoforms of KDM1A in HEK293 cells, as measured by isoCirc read count. All four isoforms were categorized as FSM for both BSJ and FSJs. Long-read alignments indicate multiple copies of circRNA templates in isoCirc reads. 
Inner black circle: circRNA gene structure with BSJ (red line) and FSJs (white lines). Blue circles: matched bases of isoCirc sequences aligned to reference genome sequence. Colored lines: mismatched bases (purple: A, red: C, green: G, yellow: T), insertions (black), and deletions (white), compared to reference genome sequence.
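The degree of similarity described in panels a and b can be sketched as follows. This is an illustrative reading of the caption, assuming that "total number of high-confidence BSJs in either library" means the union of the two sets (which makes the measure a Jaccard index); the data layout, function name, and coordinates are hypothetical.

```python
def bsj_similarity(lib_a: dict, lib_b: dict, min_reads: int = 2) -> float:
    """Shared BSJs divided by BSJs found in either library.

    Each library maps a BSJ key, e.g. (chrom, start, end), to its read count.
    Only BSJs with read count >= min_reads are considered, as in the caption.
    """
    a = {bsj for bsj, n in lib_a.items() if n >= min_reads}
    b = {bsj for bsj, n in lib_b.items() if n >= min_reads}
    union = a | b
    if not union:
        return 0.0  # assumption: define similarity as 0 when both sets are empty
    return len(a & b) / len(union)


# Hypothetical BSJ read counts for two libraries.
lib1 = {("chr1", 100, 500): 3, ("chr1", 900, 1500): 2, ("chr2", 50, 300): 1}
lib2 = {("chr1", 100, 500): 5, ("chr2", 50, 300): 4, ("chr3", 10, 80): 2}

print(bsj_similarity(lib1, lib2))  # 0.25
```

Here one BSJ passes the read-count filter in both libraries and four pass it in at least one, giving 1/4.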
Fig. 3. isoCirc characterization of circRNAs in 12 human tissues.
a Barplot showing number of full-length circRNA isoforms (y-axis) identified in each of 12 human tissues (x-axis), for read count = 1 (red), read count = 2 (blue), or read count ≥3 (green). b Correlation of number of circRNA BSJs identified from eight human tissues in published short-read datasets and isoCirc long-read datasets. For each tissue, number of unique circRNA BSJs identified in either short-read (x-axis) or long-read (y-axis) data was normalized by the total number of short reads or long reads in that tissue. Data were plotted as log10 transformed for ease of comparison and visualization. c Cumulative distribution plot of exon number per isoform for full-length circRNA isoforms with read count ≥2 in at least one of 12 human tissues. Isoforms were classified by their BSJ-FSJ categories, as follows: both BSJ and FSJs were FSM or NIC (red: FSM/NIC-FSM/NIC); BSJ was FSM or NIC, FSJs were NNC (blue: FSM/NIC-NNC); BSJ was NNC, FSJs were FSM or NIC (green: NNC-FSM/NIC); both BSJ and FSJs were NNC (purple: NNC–NNC); and all isoforms were combined (black: All–All). d Cumulative distribution plot of transcript length (nt) for full-length circRNA isoforms with read count ≥2 in at least one of 12 human tissues. Isoforms were classified by their BSJ-FSJ categories, as described in (c). e Heatmap showing numbers of genes with differential proportions of circRNA isoforms between each pair of 12 human tissues. f Heatmap showing numbers of circRNA isoforms with differential isoform proportions between each pair of 12 human tissues. g Stacked barplot showing isoform proportions of KDM1A circRNA isoforms across 12 human tissues. CircRNA isoforms were included in plot if read count was ≥2 in at least one tissue, and both BSJ and FSJs were FSM or NIC (FSM/NIC-FSM/NIC). Total read count for all circRNA isoforms in a tissue is given in parentheses on x-axis. h Stacked barplot showing isoform proportions of METTL3 circRNA isoforms across 12 human tissues. Details are as in (g).
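Two of the quantities in this caption are simple ratios and can be sketched directly. The functions below are an illustration under assumptions, not the authors' code: panel b is read as log10 of (unique BSJs / total reads), and the isoform proportion in panels g and h is read as an isoform's read count divided by the total read count of all included isoforms of that gene in a tissue. All names and numbers are hypothetical.

```python
import math


def normalized_log10_bsj(n_unique_bsj: int, total_reads: int) -> float:
    """log10 of the number of unique BSJs per sequenced read (panel b)."""
    if n_unique_bsj <= 0 or total_reads <= 0:
        raise ValueError("counts must be positive to take a logarithm")
    return math.log10(n_unique_bsj / total_reads)


def isoform_proportions(read_counts: dict) -> dict:
    """Fraction of a gene's circRNA reads assigned to each isoform in one tissue."""
    total = sum(read_counts.values())
    if total == 0:
        return {iso: 0.0 for iso in read_counts}
    return {iso: n / total for iso, n in read_counts.items()}


# Hypothetical values: 1,000 unique BSJs from 1,000,000 reads.
print(round(normalized_log10_bsj(1_000, 1_000_000), 6))  # -3.0

# Hypothetical read counts for three isoforms of one gene in one tissue.
print(isoform_proportions({"iso1": 6, "iso2": 3, "iso3": 1}))
```

Normalizing by total reads makes tissues sequenced to different depths comparable, which is why the caption applies it before comparing short-read and long-read BSJ counts.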
Fig. 4. isoCirc discovery of alternative splicing events within circRNAs.
a Pie chart showing percentages of isoform pairs in which the predominant isoform (with highest median read count across 12 human tissues and HEK293 cell line for a given gene) had alternative splicing differences in BSJ only, FSJs only, or both compared to each of the other isoforms in the gene. Number of isoform pairs for each category is given in parentheses next to category name in the legend. b Summary table showing number of internal alternative splicing events within circRNAs corresponding to four major types of alternative splicing patterns, when requiring that the minor isoform had at least 2, 5, or 10 isoCirc reads. Number of internal alternative splicing events in which the splicing events of both isoforms had FSJs annotated as FSM only or FSM/NIC are represented in two rightmost columns. c isoCirc read coverage tracks for 12 human tissues and aggregated HEK293 replicates displaying the two most abundant circRNA isoforms of PRPSAP1 – PRPSAP1.circRNA.1 and PRPSAP1.circRNA.2, which had an alternative splicing event corresponding to a retained or spliced intron, respectively. A separate track displaying base-level conservation scores across vertebrates (phyloP 46-way) is supplied. Transcript structures and BSJs of PRPSAP1.circRNA.1 and PRPSAP1.circRNA.2 are shown using red boxes and black arrows. Total number of reads across all 12 human tissues and HEK293 replicates for each isoform is indicated next to the isoform identifier.
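The pairing scheme in panel a can be sketched as below. This is a simplified illustration, not the published procedure: it assumes each isoform is described by one BSJ and a tuple of FSJs, picks the predominant isoform by highest median read count across samples, and labels each pair by plain equality of those fields. The isoform records and names are hypothetical.

```python
from statistics import median


def predominant_isoform(counts: dict) -> str:
    """Return the isoform ID with the highest median read count across samples."""
    return max(counts, key=lambda iso: median(counts[iso]))


def classify_pair(iso_a: dict, iso_b: dict) -> str:
    """Label how two isoforms of a gene differ: BSJ only, FSJs only, or both."""
    bsj_differs = iso_a["bsj"] != iso_b["bsj"]
    fsj_differs = iso_a["fsjs"] != iso_b["fsjs"]
    if bsj_differs and fsj_differs:
        return "both"
    if bsj_differs:
        return "BSJ only"
    if fsj_differs:
        return "FSJs only"
    return "identical"


# Hypothetical per-sample read counts for three isoforms of one gene.
counts = {"iso1": [5, 0, 9], "iso2": [4, 4, 4], "iso3": [1, 2, 3]}

# Hypothetical junction coordinates; FSJs are (donor, acceptor) pairs.
isoforms = {
    "iso1": {"bsj": (100, 900), "fsjs": ((200, 300), (400, 500))},
    "iso2": {"bsj": (100, 900), "fsjs": ((200, 500),)},
    "iso3": {"bsj": (100, 700), "fsjs": ((200, 300), (400, 500))},
}

top = predominant_isoform(counts)
for iso_id, iso in isoforms.items():
    if iso_id != top:
        print(top, "vs", iso_id, "->", classify_pair(isoforms[top], iso))
```

With these toy values, `iso1` has the highest median (5) and is compared against each other isoform in turn. A real comparison would also need to handle FSJs that differ only because the BSJ changes the span of the circle, which this sketch ignores.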
