Genome Biol. 2015 Jun 16;16(1):126.
doi: 10.1186/s13059-015-0690-5.

Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development



Linda Szabo et al. Genome Biol.

Erratum in

Abstract

Background: The pervasive expression of circular RNA is a recently discovered feature of gene expression in highly diverged eukaryotes, but the functions of most circular RNAs are still unknown. Computational methods to discover and quantify circular RNA are essential. Moreover, discovering biological contexts where circular RNAs are regulated will shed light on potential functional roles they may play.

Results: We present a new algorithm that increases the sensitivity and specificity of circular RNA detection by discovering and quantifying circular and linear RNA splicing events at both annotated and un-annotated exon boundaries, including intergenic regions of the genome, with high statistical confidence. Unlike approaches that rely on read count and exon homology to determine confidence in prediction of circular RNA expression, our algorithm uses a statistical approach. Using our algorithm, we unveiled striking induction of general and tissue-specific circular RNAs, including in the heart and lung, during human fetal development. We discover regions of the human fetal brain, such as the frontal cortex, with marked enrichment for genes where circular RNA isoforms are dominant.

Conclusions: The vast majority of circular RNA production occurs at major spliceosome splice sites; however, we find the first examples of developmentally induced circular RNAs processed by the minor spliceosome, and an enriched propensity of minor spliceosome donors to splice into circular RNA at un-annotated, rather than annotated, exons. Together, these results suggest a potentially significant role for circular RNA in human development.


Figures

Fig. 1
Computational pipeline to identify circular RNA candidates. a We start with an annotated genome to create a database of junction sequences, which is used to build two custom Bowtie2 junction indices: 1) a scrambled junction index containing all possible junction sequences formed either by circularization of a single exon or by each pair of exons in non-canonical order within a 1 Mb sliding window; 2) a linear junction index containing these exon pairs in canonical order. For single-end (SE) data, mismatch rates in all reads aligned to a given junction are used to determine whether it is a true or false positive (see “Materials and methods”). For paired-end RNA-Seq data, read 1 (R1) and read 2 (R2) are aligned independently as single-end reads to these junction indices and to a Bowtie2 genome index. Each R1 that aligned to a scrambled junction and did not align to the genome or a linear junction is categorized based on its mate alignment: circular if the mate aligns within the genomic region of the presumed circle defined by the junctional exons, or decoy if the mate aligns outside this region. Each R1 that aligned to a linear junction is categorized as linear if the mate aligns concordantly to support a linear transcript. Reads in the linear and decoy categories are used to fit a generalized linear model (GLM). The GLM predicts the probability that each circular read belongs to class 1 (true positive) versus class 2 (false positive). The de novo analysis pipeline is shown in green. All reads that did not align to any of the indices are used to create a Bowtie2 index of de novo junction sequences, and all of the unaligned reads are re-aligned to this index. Each R1 from the de novo alignment is categorized based on its mate alignment, just as reads aligned to annotated exon junctions are. See “Materials and methods” for details. b Sequencing errors can lead to incorrect alignment to a circular junction. This results in either a false positive circular read or a decoy read, depending on whether R2 aligns inside or outside the circularized region defined by R1.
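The mate-based categorization in panel a can be expressed as a small decision function. This is a minimal Python sketch, not the authors' code: the `PairedHit` record, its field names and the choice to discard pairs whose mate did not align are hypothetical, and real input would come from parsing the Bowtie2 alignments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairedHit:
    """Hypothetical summary of one read pair's alignments (not a Bowtie2 format)."""
    r1_index: str                   # index R1 aligned to: "scrambled", "linear" or "genome"
    r1_also_genome_or_linear: bool  # R1 also aligned to the genome or a linear junction
    chrom: str                      # chromosome of R1's junction
    circle_start: int               # genomic span of the presumed circle,
    circle_end: int                 # defined by the junctional exons
    mate_chrom: Optional[str]       # None if R2 did not align (assumption)
    mate_pos: Optional[int]
    mate_concordant: bool           # for linear hits: R2 supports the linear transcript


def categorize(hit: PairedHit) -> Optional[str]:
    """Return "circular", "decoy", "linear", or None if the pair is not used."""
    if hit.r1_index == "scrambled":
        # R1 must align to a scrambled junction and NOT to the genome or a linear junction.
        if hit.r1_also_genome_or_linear or hit.mate_pos is None:
            return None
        inside = (
            hit.mate_chrom == hit.chrom
            and hit.circle_start <= hit.mate_pos <= hit.circle_end
        )
        # Mate inside the presumed circle supports circularity; outside it is a decoy.
        return "circular" if inside else "decoy"
    if hit.r1_index == "linear":
        # Linear reads require a mate that aligns concordantly with a linear transcript.
        return "linear" if hit.mate_concordant else None
    return None
```

The decoy category is what makes the approach work: a decoy read has the same kind of junctional R1 as a circular read, but its mate is geometrically incompatible with a circle, so decoys supply examples of alignment artifacts for the model to learn from.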
Fig. 2
Statistical algorithm improves the precision of circular RNA detection. a Circular junctions with at least one aligned read are divided into two groups based on posterior probability: greater than 0.5 (orange) and 0.5 or less (purple). Density distributions of the read counts are shown for each group in a representative ENCODE tissue sample (adult heart). b Cumulative distribution of posterior probability for circular RNAs detected (read count ≥ 1) in poly(A)+ and poly(A)− data from the ENCODE H1 human embryonic stem cell line. Higher posterior probability indicates increased confidence that a circle is a true positive rather than an artifact. c Putative circles with the highest read counts (labeled in red) in the H1 poly(A)+ sample are identified as false positives. d Cumulative distribution of p values calculated with the naïve model described in “Materials and methods”, using the same paired-end data as in panel (b). Higher p value indicates increased confidence that a circle is a true positive rather than an artifact.
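The posteriors in panels a to c come from the GLM described for Fig. 1. As a hedged sketch of the general idea only, the code below fits a logistic regression (a GLM with a logit link) by batch gradient descent, labelling linear reads 1 and decoy reads 0, then combines per-read probabilities into a junction-level posterior by assuming reads are independent, and finally splits junctions at 0.5 as in panel a. The features, the gradient-descent fit, the class labelling and the independence-based combination are all assumptions for illustration; the authors' model is described in “Materials and methods”.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that a read is class 1 (true positive); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on log-loss.

    y = 1 for linear reads (assumed genuine), 0 for decoy reads (assumed artifacts).
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # d(log-loss)/d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def junction_posterior(read_probs: Sequence[float]) -> float:
    """Combine per-read probabilities, assuming reads are independent.

    Equals prod(p) / (prod(p) + prod(1 - p)), computed in log-odds space.
    """
    eps = 1e-12
    log_odds = 0.0
    for p in read_probs:
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        log_odds += math.log(p) - math.log(1.0 - p)
    return sigmoid(log_odds)


def split_by_posterior(posteriors: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split junctions into posterior > 0.5 and posterior <= 0.5 groups."""
    high = [j for j, p in posteriors.items() if p > 0.5]
    low = [j for j, p in posteriors.items() if p <= 0.5]
    return high, low
```

Because the decoys are trained against linear reads rather than against raw read counts, a junction supported by many reads that all look like decoys can still receive a low posterior, which is consistent with the high-count false positives highlighted in panel c.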
Fig. 3
Statistical algorithm improves the sensitivity of circular RNA detection. a, b Circular RNAs detected by both algorithms are divided into false positives (FP; flagged as false positives due to low posterior probability) and true positives (TP; our posterior probability ≥ 0.9). a Number of circular RNAs detected by our GLM or CIRI in ENCODE BJ poly(A)+/− data and HeLa RNase-R+/− data generated by Gao et al. [23]. CIRI results are based on all default parameters except the -E flag, which was set to exclude false positives resulting from identical colinear exons. b Number of circular RNAs detected by our GLM or find_circ in ENCODE BJ poly(A)+/− data and HeLa RNase-R− data generated by Gao et al. [23]. c Circular RNAs detected in HeLa RNase-R+ and Ribo- data generated by Gao et al. [23] and in poly(A)+ and poly(A)− data generated by ENCODE. Number of circular RNAs detected by our GLM method (one or more reads, posterior probability ≥ 0.9) compared with CIRI (default parameters except -E). For GLM results, the first number is the total number of circles, and the number of those detected by the de novo portion of the algorithm is listed in parentheses. d Venn diagram comparing the number of putative circular RNAs identified by our annotation-dependent algorithm in RNase-R-treated H9 cells with the results published by Zhang et al. [22]. Green and red circles show circular RNAs identified by our algorithm with high and low confidence, respectively; the blue circle shows those identified by Zhang et al. e Total junctional reads for circles composed of a single exon (posterior probability ≥ 0.9, read count > 1), shown by exon length for the same data as in panel (d). Median exon length is shown in red. The x-axis is truncated at 2000, excluding 31 long exons, all but one of which have total read counts < 50.
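Panel d compares two sets of junction calls. A minimal sketch of that bookkeeping follows; the junction ID strings, the function name and the use of 0.9 as the "high confidence" cut-off are assumptions (0.9 is the threshold used in panels a to c), and both call sets must use the same coordinate convention for the set operations to be meaningful.

```python
from typing import Dict, Set


def compare_calls(ours: Dict[str, float], published: Set[str],
                  threshold: float = 0.9) -> Dict[str, int]:
    """Count Venn-diagram regions between our circle calls and a published set.

    `ours` maps a junction ID to its posterior probability; calls with
    posterior >= threshold are treated as high confidence, the rest as low.
    """
    high = {j for j, p in ours.items() if p >= threshold}
    low = set(ours) - high
    return {
        "high_and_published": len(high & published),
        "low_and_published": len(low & published),
        "high_only": len(high - published),
        "low_only": len(low - published),
        "published_only": len(published - set(ours)),
    }
```

Keeping the low-confidence calls as their own region, rather than dropping them, shows how many published circles the statistical filter would reject.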
Fig. 4
Circular RNAs are induced during development. a Genome-wide distributions of z scores for linear and circular junctions in our heart and lung data show significant skewing of circular-junction z scores towards positive values, corresponding to circular RNA induction. b Quantitative RT-PCR confirms greater induction of circular RNA in several organs; heart and lung are shown here (intestine and stomach in Additional file 14). Plotted values are ΔΔCt = ΔCt(Age 20 weeks) – ΔCt(Age 10 weeks), where ΔCt = Ct(ACTB) – Ct(target); error bars are standard error of the mean of technical replicates. Positive ΔΔCt indicates increased expression later in development and is on a log2 scale. c A similar trend is seen in the ENCODE data: 14 out of 20 tissues, including heart and lung, have more genes with increasing than with decreasing circular:linear expression (genes called with p < 0.05). The net number of genes with increasing circle fraction is defined as the number of genes whose circle fraction increases from the early to the late time point minus the number whose circle fraction decreases. Tissues contained in the 0–500 bar but not labeled in the figure are spinal cord, thyroid, metanephros, liver, umbilical cord, occipital lobe, cerebellum, diencephalon and uterus (all with data available only from ENCODE).
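The ΔΔCt values in panel b and the net gene count in panel c reduce to simple arithmetic. The sketch below follows the caption's definitions; converting ΔΔCt to a fold change as 2^ΔΔCt assumes ideal (100%) amplification efficiency, and representing each gene as a (change in circle fraction, p value) pair is a hypothetical input format.

```python
from typing import Iterable, Tuple


def delta_ct(ct_actb: float, ct_target: float) -> float:
    """ΔCt = Ct(ACTB) - Ct(target); higher means more target relative to ACTB."""
    return ct_actb - ct_target


def delta_delta_ct(late: Tuple[float, float], early: Tuple[float, float]) -> float:
    """ΔΔCt = ΔCt(late) - ΔCt(early); each argument is (Ct(ACTB), Ct(target))."""
    return delta_ct(*late) - delta_ct(*early)


def fold_change(ddct: float) -> float:
    """Convert log2-scale ΔΔCt to a fold change (assumes perfect doubling per cycle)."""
    return 2.0 ** ddct


def net_genes_increasing(genes: Iterable[Tuple[float, float]], alpha: float = 0.05) -> int:
    """Net number of genes with increasing circle fraction.

    Each gene is (change in circle fraction from early to late, p value);
    only genes with p < alpha are counted.
    """
    up = down = 0
    for change, p in genes:
        if p < alpha:
            if change > 0:
                up += 1
            elif change < 0:
                down += 1
    return up - down
```

For instance, with hypothetical Ct values where ACTB stays at 15 while the target falls from 22 at 10 weeks to 20 at 20 weeks, `delta_delta_ct((15, 20), (15, 22))` gives 2, roughly a fourfold increase relative to ACTB.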
Fig. 5
Circular RNAs have high abundance in many tissues and tissue-specific expression programs. a In many fetal tissues, especially regions of the brain, hundreds of genes have dominant circular isoforms in early and late time point samples. The late time point is depicted for simplicity: for each organ, the total number of genes with more circular than linear RNA is plotted (p < 0.05). Asterisks indicate regions of the brain. b Many genes with tissue-specific increases in expression are also more highly expressed as circular than as linear isoforms. Normalized expression levels from two samples, early (circles) and late (squares) time points, in four genes illustrate this phenomenon (see “Materials and methods”). Statistically significant outliers (p < 0.001) include several subregions of the brain (DOPEY2 and the RNA binding protein R3HDM1), the frontal cortex (GLIS3) and skeletal muscle (RYR1); a regression line (red) is plotted if there is a significant relationship with circular expression, and the x = y line is plotted in black. JRPKM (junctional reads per kilobase per read mapped) has an interpretation comparable to RPKM (see “Materials and methods”).
Fig. 6
NCX1 is a highly expressed and conserved circular RNA. a qPCR agrees with sequencing estimates and shows that circular isoforms of NCX1 are induced in the fetal heart and during in vitro cardiomyocyte differentiation. Plotted values are ΔCt = Ct(ACTB) – Ct(target); error bars are standard error of the mean of technical replicates for fetal heart, and of biological triplicates for human ESC (hESC) to cardiomyocyte differentiation. b Our de novo sequencing algorithm predicted a minor circular isoform that differs from the dominant circular isoform by a deletion of three nucleotides; it arises from use of a splice acceptor just downstream of the annotated splice acceptor. The minor circular isoform was confirmed by PCR and clone sequencing. In the diagram, exonic sequences from genome annotations are given in bold uppercase and splice-signal dinucleotides are highlighted in red; the mouse and rat NCX1 sequences are shown in blue. In the rat, the NCX1 circular isoform was detected only with the aid of our de novo algorithm, as the circle junction does not coincide with the annotated start of the first exon. Notably, in the mouse the start of this exon is annotated exactly where we see circular splicing in the rat and mouse.
Fig. 7
U12 circular RNA has tissue-specific increases in development. In the gene diagrams of (a, d), annotated exons are shown as gold boxes and un-annotated “cryptic” exons as gray boxes. Definitive U12-type introns are indicated by “U12” in green; other introns are U2-type (or possibly cryptic U12-type). Splice-signal dinucleotides are shown in red. a Our de novo algorithm identified two circular isoforms in RANBP17 that use the U12-type splice signal following exon 20; these were validated by PCR and clone sequencing (which also identified the third isoform shown). b By RT-qPCR, the de novo RANBP17 circular isoforms show induction during fetal development in all tissues examined; expression varies between tissues and is, for example, significantly higher in the heart than in the intestine. Values plotted are ΔΔCt = ΔCt(20 weeks) – ΔCt(10 weeks), where ΔCt = Ct(ACTB) – Ct(RANBP17); error bars are standard error of the mean of technical replicates. c The fraction of RANBP17 transcripts that are circular isoforms increases over developmental time. Shown are the percentages of each RANBP17 isoform, based on RNA-Seq junctional reads, at two different time points in fetal heart development. “circle1” = chr5:170632616:170610174, “circle2” = chr5:170632616:170610198 (hg19 genome junctional coordinates; the third circle was not included in this analysis). Total junctional read counts were 240 and 267 for the 19- and 28-week samples, respectively. d The de novo algorithm identified a circular junction in ATXN10 from the U12-type splice signal following exon 10 to a specific site within intron 9. PCR and clone sequencing with outward-facing primers in exon 10 verified the junction and showed that additional un-annotated exonic sequences also form part of these circular isoforms, which show alternative splicing. Pathological expansion of a short repeat within intron 9 is a genetic hallmark of spinocerebellar ataxia type 10; the repeat region, marked with a red triangle, lies close to exonic sequences that we have identified as contributing to the circle.
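Panel c reports isoform percentages computed from junctional reads. The sketch below parses the hg19 junction labels listed for circle1 and circle2 and converts per-isoform junctional read counts into percentages; the function names and any counts passed in are hypothetical. Parsing the two labels shows that circle1 and circle2 share their first coordinate (170632616) and differ by 24 in the second.

```python
from typing import Dict, Tuple


def parse_junction(label: str) -> Tuple[str, int, int]:
    """Split a label such as "chr5:170632616:170610174" into (chrom, pos1, pos2).

    The two coordinates are kept in the order written; this sketch does not
    assert which one is the donor and which the acceptor.
    """
    chrom, first, second = label.replace(" ", "").split(":")
    return chrom, int(first), int(second)


def isoform_percentages(junction_reads: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all junctional reads assigned to each isoform."""
    total = sum(junction_reads.values())
    if total == 0:
        raise ValueError("no junctional reads")
    return {iso: 100.0 * n / total for iso, n in junction_reads.items()}
```

Stripping spaces before splitting tolerates stray whitespace in copied coordinates, such as the space that appears inside the circle2 label in some renderings of this caption.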

References

    1. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One. 2012;7:e30733. doi: 10.1371/journal.pone.0030733.
    2. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013;495:333–8. doi: 10.1038/nature11928.
    3. Salzman J, Chen RE, Wang PL, Olsen M, Brown PO. Cell-type specific regulation of circular RNA expression. PLoS Genet. 2013;9:e1003777. doi: 10.1371/journal.pgen.1003777.
    4. Wang PL, Bao Y, Yee MC, Barrett SP, Hogan GJ, Olsen MN, et al. Circular RNA is expressed across the eukaryotic tree of life. PLoS One. 2014;9:e90859. doi: 10.1371/journal.pone.0090859.
    5. Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013;19:141–57. doi: 10.1261/rna.035667.112.
