Review
Nat Rev Genet. 2016 Oct 14;17(11):679-692.
doi: 10.1038/nrg.2016.114.

Detecting circular RNAs: bioinformatic and experimental challenges

Linda Szabo et al. Nat Rev Genet.

Abstract

The pervasive expression of circular RNAs (circRNAs) is a recently discovered feature of gene expression in highly diverged eukaryotes. Numerous algorithms that are used to detect genome-wide circRNA expression from RNA sequencing (RNA-seq) data have been developed in the past few years, but there is little overlap in their predictions and no clear gold-standard method to assess the accuracy of these algorithms. We review sources of experimental and bioinformatic biases that complicate the accurate discovery of circRNAs and discuss statistical approaches to address these biases. We conclude with a discussion of the current experimental progress on the topic.

Competing interests statement

The authors declare no competing interests.

Figures

Figure 1. Circular RNA
Circular RNA (circRNA) is produced from both protein-coding genes and non-coding regions of the genome. Linear RNAs are formed by a covalent linkage between an upstream 3′ splice site and a downstream 5′ splice site of pre-messenger RNA (pre-mRNA), whereas circRNA is characterized by a covalent and canonical linkage between a downstream 3′ splice site and an upstream 5′ splice site in a process known as backsplicing. circRNAs lack poly(A) tails and can contain a single exon or multiple exons, as well as introns. Exons are numbered. Adapted from REF. .
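The ordering difference described in this caption can be sketched in a few lines. This is only an illustration, not a detection method from the review: the function name `classify_junction` and its arguments are hypothetical, and it assumes a junction is summarised by two genomic coordinates on the same chromosome, read in transcript order (the last base before the junction and the first base after it).

```python
def classify_junction(before_pos, after_pos, strand="+"):
    """Label a splice junction as 'linear' or 'backsplice' from coordinate order.

    before_pos: genomic coordinate of the last exonic base before the junction
    after_pos:  genomic coordinate of the first exonic base after the junction
    strand:     '+' or '-', the strand the gene is transcribed from

    Hypothetical helper for illustration; both positions are assumed to lie
    on the same chromosome.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if before_pos == after_pos:
        raise ValueError("junction positions must differ")

    if strand == "+":
        # On the plus strand, downstream means a larger coordinate.
        continues_downstream = after_pos > before_pos
    else:
        # On the minus strand, downstream means a smaller coordinate.
        continues_downstream = after_pos < before_pos

    # A linear junction continues downstream; a backsplice jumps back upstream.
    return "linear" if continues_downstream else "backsplice"


if __name__ == "__main__":
    print(classify_junction(1000, 2000))        # exon end joined to a later exon
    print(classify_junction(3000, 1500))        # exon end joined to an earlier exon
    print(classify_junction(1500, 3000, "-"))   # same check on the minus strand
```

Coordinate order alone does not establish that a read came from a circle; the artefacts summarised in Figure 2 can produce the same out-of-order signature.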
Figure 2. Challenges for circRNA detection in RNA-seq
Aa–Ac | Variations in preparation protocols alter the amount of circular RNA (circRNA) in a library. Poly(A) RNA is shown in pink, non-poly(A) RNA is shown in green and circular RNA is shown in blue. Aa | Common RNA purification methods, in order of increasing relative amounts of circRNA. circRNAs are depleted by poly(A) selection and retained in ribosomal RNA (rRNA)-depleted libraries. They constitute a large proportion of reads in an rRNA-depleted library that has also been depleted of poly(A) RNA, and are the primary RNA in RNase R-treated libraries. Ab | Size selection excludes very small circular and linear RNA. Ac | Oligo(dT) priming biases against circRNA. Ba–Bc | Known sources of artefacts from common RNA-seq protocols. Ba | Reverse transcriptase (RT) can join two distinct RNA molecules in a non-canonical order, particularly when the two RNAs contain a common sequence. Bb | Two distinct cDNAs may be ligated together in non-canonical order during adaptor ligation. Bc | RT can displace cDNA from the template, generating a single cDNA that contains multiple copies of a circRNA. C | A combination of sequence homology and sequencing errors can lead to false alignments to a backsplice junction. In this case, two fragments generated from a linear exon 2–exon 3 splice junction are sequenced with an error and incorrectly aligned to an exon 3–exon 2 backsplice. If the mate aligns outside the genomic region defined by the backsplice junction, it is correctly discarded as a false positive, but if the mate aligns within the presumed circle, it is incorrectly considered evidence of circRNA. For clarity, the mRNA sequence shown is the DNA equivalent.
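The mate-position check described in panel C can be written as a simple interval test. The sketch below is illustrative only: `mate_supports_circle` is a hypothetical helper, and it assumes inclusive coordinates with the mate and the junction on the same chromosome.

```python
def mate_supports_circle(mate_start, mate_end, circle_start, circle_end):
    """Return True if the mate alignment lies entirely inside the presumed circle.

    circle_start and circle_end are the genomic bounds implied by the
    backsplice junction. All coordinates are inclusive and assumed to be on
    the same chromosome (an assumption of this sketch).
    """
    if mate_start > mate_end or circle_start > circle_end:
        raise ValueError("start must not exceed end")
    return circle_start <= mate_start and mate_end <= circle_end


if __name__ == "__main__":
    # Mate inside the presumed circle: the read pair is kept.
    print(mate_supports_circle(1200, 1300, 1000, 2000))
    # Mate outside the presumed circle: the read pair is discarded.
    print(mate_supports_circle(2500, 2600, 1000, 2000))
```

As the caption notes, passing this check is necessary but not sufficient: a misaligned junction read whose mate happens to fall within the presumed circle still passes.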
Figure 3. Multiple circRNAs can be generated from a single locus
The RANBP17 locus is shown at the top, with the circularized region expanded below. The boxes represent annotated exons, with the location of the U12-type splice signal labelled. Three circular isoforms of RANBP17, formed by splicing of the 5′ end of exon 20 into three distinct locations within exon 17, were validated by PCR and clone sequencing; only circle 1 and circle 2 were algorithmically predicted.
Figure 4. Statistical considerations when using RNase R enrichment to assess genome-wide accuracy
Read counts were simulated in R (code available at https://github.com/lindaszabo/NRG) and confidence intervals were computed using rateratio.test (https://cran.r-project.org/web/packages/rateratio.test). a | Upper and lower bounds of the 95% confidence interval for the estimated RNase R fold enrichment when the same number of reads is observed for a given circular RNA (circRNA) in RNase R-treated and mock-treated control libraries sequenced at the same depth. b | Density distributions for the ratio of observed read counts for a given circRNA in RNase R+/control (that is, fold enrichment by RNase R) when the underlying true ratio is 5/1. When the two libraries have an equal number of reads (red line), the expected value is 5. If the control library is sequenced more deeply, the expected observed fold enrichment decreases even though the underlying rate parameter has not changed.
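The interval in panel a can be approximated without R. The sketch below is not the authors' code and is not claimed to reproduce rateratio.test exactly. It assumes read counts in each library are Poisson, conditions on the total count so that the treated count is binomial, computes a Clopper-Pearson interval for that proportion, and converts it to a rate ratio. The function names and the `treated_depth`/`control_depth` parameters are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1.0 - p))
    return exp(log_pmf)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return min(1.0, sum(binom_pmf(i, n, p) for i in range(k + 1)))


def _solve_increasing(f, target):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided interval for a binomial proportion (x successes of n)."""
    # Lower bound: P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2.0)
    # Upper bound: P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2.0)
    return lower, upper


def rate_ratio_ci(treated_count, control_count,
                  treated_depth=1.0, control_depth=1.0, alpha=0.05):
    """Interval for the treated/control rate ratio, adjusted for library depth."""
    n = treated_count + control_count
    if n == 0:
        raise ValueError("at least one read is required")
    p_lo, p_hi = clopper_pearson(treated_count, n, alpha)
    scale = control_depth / treated_depth  # corrects for unequal depth

    def to_ratio(p):
        return float("inf") if p >= 1.0 else scale * p / (1.0 - p)

    return to_ratio(p_lo), to_ratio(p_hi)


if __name__ == "__main__":
    for reads in (2, 5, 50):
        print(reads, rate_ratio_ci(reads, reads))
```

With five reads in each library at equal depth, the interval spans roughly 0.23 to 4.3, so equal counts at low coverage are compatible with both depletion and several-fold enrichment. The `control_depth / treated_depth` factor addresses the point in panel b: a raw ratio of counts shrinks when the control library is sequenced more deeply, whereas the depth-adjusted ratio does not.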

References

    1. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One. 2012;7:e30733. This article provided the first demonstration that circRNA was a ubiquitous and overlooked feature of eukaryotic gene expression.
    2. Lasda E, Parker R. Circular RNAs: diversity of form and function. RNA. 2014;20:1829–1842.
    3. Jeck WR, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013;19:141–157.
    4. Zhang XO, et al. Complementary sequence-mediated exon circularization. Cell. 2014;159:134–147.
    5. Szabo L, et al. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biol. 2015;16:126. The first published circRNA algorithm to develop a statistical score independent of read count for identifying true and false positives.