Nat Methods. 2023 Aug;20(8):1159-1169.
doi: 10.1038/s41592-023-01944-6. Epub 2023 Jul 13.

Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision

Marieke Vromman et al. Nat Methods. 2023 Aug.

Abstract

The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Fig. 1 | CircRNA scientific relevance, structure and detection.
a, Over the last decade, circRNA research has increased rapidly, as illustrated by the proportional growth of publications mentioning circRNA in Europe PubMed Central. b, CircRNAs are formed through back-splicing, which results in a circular molecule with a back-spliced junction (BSJ). Black boxes highlight the BSJ in the circRNA isoforms. c, CircRNAs can be detected with RT–qPCR using a BSJ-specific primer pair. The primer pair can bind only in a divergent manner (facing away from each other) to linear RNA, where no amplification will be possible, yet binds the circRNA in a convergent manner (facing towards each other), amplifying the BSJ sequence. d, Large-scale circRNA detection is typically performed using total RNA sequencing datasets and specialized computational tools. These tools identify BSJ-spanning reads, which map divergently (in reverse order) on the linear reference genome.
Fig. 2 | CircRNA detection tools predict a wide variety of circRNAs.
a, This study consists of a circRNA detection phase and a circRNA validation phase. For the former, 16 circRNA detection tools were used to predict circRNAs in three deeply sequenced cancer cell lines. For the latter, a set of circRNAs was selected per tool and validated using three orthogonal methods, generating tool-specific precision values for each method. This was also used to compute compound precision and both types of sensitivity values for each circRNA detection tool. b, The number of reported circRNAs differs greatly between tools (shown for HLF cells; similar results for the other cell lines are shown in Supplementary Fig. 1). The tools are ordered according to the total number of predicted circRNAs. The vast majority of circRNAs are predicted with a BSJ count below 5 (in blue). Two tools, circRNA_finder and segemehl, filtered their output based on BSJ count and report only circRNAs with a BSJ count of at least 5 (in orange). c, The majority of circRNAs (49.9%) are detected by only one tool. Circseq_cup reports the largest set of unique circRNAs (shown for HLF cells; similar results for the other cell lines are shown in Supplementary Fig. 4). A small set of 55 circRNAs is detected by all tools (column n_db in Supplementary Table 2). d, CircRNA splice sites differ between circRNA detection tools. Most commonly, the canonical AGNGT pattern is observed, with AG being the splice acceptor, N the circRNA, and GT the splice donor. Circseq_cup, CirComPara2, Sailfish-cir and segemehl do not report strand information. To be able to retrieve a splicing sequence for the circRNAs from these tools, it was assumed that the circRNA originated from the positive strand. This led to the ACNCT pattern (reverse complement of AGNGT), most probably from circRNAs that were assigned to the positive strand incorrectly. Last, some tools also report a substantial number of circRNA BSJ sequences with a GGNGG splicing pattern.
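The strand effect described in panel d can be checked with a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names reverse_complement and splice_pattern are hypothetical, and the pattern is modelled as the acceptor dinucleotide, an N standing in for the circRNA body, and the donor dinucleotide, as in the caption.

```python
# Complement table limited to the symbols used in the caption (A, C, G, T, N).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def splice_pattern(acceptor: str, donor: str) -> str:
    """Build the acceptor-N-donor pattern used in the caption (hypothetical helper)."""
    return f"{acceptor}N{donor}"


# Canonical pattern for a circRNA read on its true strand.
canonical = splice_pattern("AG", "GT")

# The same junction read from the opposite strand, which is what happens when a
# minus-strand circRNA is assumed to lie on the positive strand.
wrong_strand = reverse_complement(canonical)

print(canonical, wrong_strand)
```

Under these assumptions, a tool without strand information would show the canonical pattern for plus-strand circRNAs and its reverse complement for minus-strand ones, which is consistent with the explanation given in the caption.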
Fig. 3 | The precision of circRNA detection tools is generally high and similar, whereas tools largely differ with respect to the number of predicted circRNAs.
a–c, The plots are separated based on circRNA BSJ count below 5 (low-abundance, in blue, 20 circRNAs selected per tool) or a BSJ count of at least 5 (high-abundance, in orange, 80 circRNAs selected per tool). Sailfish-cir reports TPM (transcripts per million) instead of BSJ count, and is therefore shown separately. Given that circRNA_finder and segemehl do not report any circRNAs with a BSJ count < 5, these tools are not included in the blue bar plots. a, CircRNAs were validated using three different techniques: RT–qPCR detection, resistance to degradation by RNase R, and amplicon sequencing (seq). Low-abundance circRNAs are in general more difficult to validate. Of note, the precision for low-abundance circRNAs is based on a limited set of circRNAs. High-abundance circRNAs have good precision for most tools and most validation methods. The error bars represent the 95% confidence intervals (CI). A set of circRNAs was excluded because their abundance was too low to enable assessment of their resistance to RNase R, resulting in a variable number of circRNAs per tool instead of 20 or 80 for low-abundance and high-abundance circRNAs, respectively (range, 10–18 or 71–80 circRNAs per tool, respectively, details in Supplementary Table 6). A random subset of circRNAs was included in the amplicon sequencing experiment, resulting in a variable number of circRNAs per tool for amplicon sequencing validation as well (range, 11–20 or 54–74 circRNAs per tool, respectively, details in Supplementary Table 6). b, The vast majority of circRNAs produce the same results based on the three different validation methods. However, some circRNAs have conflicting results. For example, there are 13 circRNAs that are detectable by RT–qPCR but are also degraded upon RNase R (RR) treatment and for which the primers seem to amplify the wrong product.
c, The compound precision is used to calculate the theoretical number of true-positive circRNAs by multiplying it with the original number of circRNAs detected by that tool (that is, the extrapolated sensitivity) (shown for HLF; similar results for the other cell lines are shown in Supplementary Fig. 26).
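The calculation in panel c is a single multiplication, sketched below in Python. The function name and the example numbers are hypothetical and chosen only for illustration; they are not values from the study.

```python
def extrapolated_true_positives(n_predicted: int, compound_precision: float) -> float:
    """Theoretical number of true-positive circRNAs for one tool.

    n_predicted        -- number of circRNAs the tool reported
    compound_precision -- fraction between 0 and 1 (for example 0.95 for 95%)
    """
    if n_predicted < 0:
        raise ValueError("n_predicted must be non-negative")
    if not 0.0 <= compound_precision <= 1.0:
        raise ValueError("compound_precision must be a fraction between 0 and 1")
    return n_predicted * compound_precision


# Hypothetical example: a tool reporting 10,000 circRNAs at 95% compound precision.
example = extrapolated_true_positives(10_000, 0.95)
print(round(example))
```

Because the precision is measured on a selected subset and then applied to the full output of the tool, the result is an extrapolation rather than a count of validated circRNAs.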
Fig. 4 | The intersection or union of two circRNA detection tools decreases the number of false positives, or increases the overall number of detected circRNAs, respectively.
a, CircRNAs detected by multiple tools generally have higher precision. However, the often-used practice of using the intersection of two tools is not necessarily a guarantee of avoiding false-positive results. b, By considering the union of two circRNA detection tools, the number of circRNAs can be significantly increased while keeping the number of false-positive predictions low (shown for the HLF cell line; similar results for the other two cell lines are shown in Supplementary Fig. 37). For the y-axis, the percentage of detected circRNAs is calculated by dividing the number of circRNAs detected by that tool combination by the total number of predicted circRNAs for that sample, taking the union of all tools (13,087 circRNAs for the HLF sample). For this analysis, the compound precision of high-abundance circRNAs was used. Some circRNA detection tools are integrative and combine the results of multiple other tools. It is therefore assumed that an integrative tool would have large similarities with its underlying tools. However, a difference in tool version and filtering can still produce a different set of circRNAs. For example, CirComPara2 is an integrative tool that combines CIRCexplorer2, CIRI2, DCC and find_circ, but nevertheless, the combination of CirComPara2 and CIRCexplorer3 still produces a significant increase in detected circRNAs (corresponding to 10% of all circRNA predictions for that cell line).
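The set operations behind this figure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: circRNAs are identified here by hypothetical (chromosome, start, end) tuples for the BSJ, the function name combine_tools is invented for the example, and the toy sets are not data from the study.

```python
from typing import Dict, Set, Tuple

# Assumed identifier for a circRNA: chromosome plus BSJ start and end positions.
CircRNA = Tuple[str, int, int]


def combine_tools(
    tool_a: Set[CircRNA],
    tool_b: Set[CircRNA],
    all_tools_union: Set[CircRNA],
) -> Dict[str, float]:
    """Summarise the union and intersection of two tools' predictions."""
    union = tool_a | tool_b          # detected by at least one of the two tools
    intersection = tool_a & tool_b   # detected by both tools
    return {
        "union_size": len(union),
        "intersection_size": len(intersection),
        # y-axis of panel b: share of all predictions for the sample
        "pct_of_all_predictions": 100.0 * len(union) / len(all_tools_union),
    }


# Toy example with five circRNAs predicted across all tools.
c1, c2, c3, c4, c5 = (("chr1", i * 100, i * 100 + 50) for i in range(1, 6))
summary = combine_tools({c1, c2, c3}, {c3, c4}, {c1, c2, c3, c4, c5})
print(summary)
```

The intersection trades sensitivity for fewer candidates, whereas the union recovers more of the predicted circRNAs; per the caption, neither choice by itself guarantees the absence of false positives.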
