Genome Biol. 2021 Sep 23;22(1):277.
doi: 10.1186/s13059-021-02497-7.

SRCP: a comprehensive pipeline for accurate annotation and quantification of circRNAs



Avigayel Rabin et al. Genome Biol. 2021.

Abstract

Here we describe the Short Read circRNA Pipeline (SRCP), a new integrative approach for accurate annotation and quantification of circRNAs. Our strategy involves two steps: annotation of validated circRNAs followed by a quantification step. We show that SRCP is more sensitive than other individual pipelines and allows for more comprehensive quantification of a larger number of differentially expressed circRNAs. To facilitate the use of SRCP, we generate a comprehensive collection of validated circRNAs in five different organisms, including humans. We then apply our approach to identify a subset of circRNAs bound to the miRNA-effector protein AGO2 in human brain samples.

Keywords: AGO2; CircRNAs; Circular RNA; Pipeline; RNA metabolism; Splicing.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
A comprehensive approach for annotation and quantification of circRNAs. A As a first step towards using SRCP, we generated a comprehensive list of all possible circRNAs in a given tissue/species by merging the circRNA coordinates provided by different pipelines. B We then reannotate the initial list to obtain one specific set of coordinates for each candidate circRNA. To do so, we rely on the fact that most circRNAs are flanked by already annotated splice sites. Each candidate transcript is scored as follows: (i) if both the start and end coordinates of the circRNA fall exactly on the 5′ and 3′ exon boundaries of the transcript, the score is 2; (ii) if only one coordinate falls exactly on an exon boundary, the score is 1; (iii) if neither coordinate falls on any exon boundary, the score is 0. We keep the transcript with the highest score. C We determine the cutoff (false-positive and false-negative rates) based on RNaseR sensitivity and expression level, and then obtain a circRNA index. Importantly, circRNAs from other lists and/or databases can be added to enrich the circRNA index. D Once the circRNA index is set up, SRCP allows accurate quantification of circRNA reads.
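The transcript-scoring rule in panel B can be sketched in a few lines. This is a minimal illustration, not SRCP's code: the `Transcript` container, the function names, and the choice to break ties in favor of the first transcript seen are all hypothetical, and exons are assumed to be given as (start, end) genomic coordinates on the same chromosome as the candidate circRNA.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Transcript:
    """Hypothetical container for an annotated transcript."""
    name: str
    # Exons as (start, end) genomic coordinates on the circRNA's chromosome (assumed).
    exons: tuple[tuple[int, int], ...]


def boundary_score(circ_start: int, circ_end: int, tx: Transcript) -> int:
    """2 if both circRNA ends sit exactly on exon boundaries, 1 if only one does, 0 if neither."""
    exon_starts = {start for start, _ in tx.exons}
    exon_ends = {end for _, end in tx.exons}
    return int(circ_start in exon_starts) + int(circ_end in exon_ends)


def reannotate(
    circ_start: int, circ_end: int, transcripts: Sequence[Transcript]
) -> tuple[Optional[Transcript], int]:
    """Keep the highest-scoring transcript; ties go to the first one seen (an assumption)."""
    best: Optional[Transcript] = None
    best_score = -1
    for tx in transcripts:
        score = boundary_score(circ_start, circ_end, tx)
        if score > best_score:
            best, best_score = tx, score
    return best, best_score
```

For example, a candidate spanning 300 to 400 scores 2 against a transcript with exons (100, 200), (300, 400), (500, 600), but only 1 against one whose second exon starts at 350, so the first transcript is kept.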
Fig. 2
SRCP accurately annotates circRNAs. A Venn diagram of the circRNAs found by the circRNA-identification pipelines in the analysis of the total RNA library from the GSE55872 dataset. For this and further analyses, we used only circRNAs that were found in the mock samples. B RNaseR/mock ratio distribution in Drosophila melanogaster. Orange represents the circular junctions and violet the linear junctions. For the linear junctions, we used the SRCP output for the mRNAs produced from the genes hosting the potential circRNAs. C The number of circRNAs identified as “true” positives as a function of the cutoff, for circRNAs identified by 1, 2, 3, 4, or 5 of the pipelines used. The dotted lines indicate three potential cutoffs (0.85, 0.9, and 0.95). The cutoff is defined as the fraction of linear mRNAs that would have some resistance to RNaseR. D Number of true and false circRNAs identified by 1, 2, 3, 4, or 5 pipelines for the different cutoffs in (C). E Boxplots showing the distribution of expression (top) and of the RNaseR/mock ratio (bottom) of the true and false circRNAs identified by 3 (right), 4 (middle), or 5 (left) pipelines. F Percentage of true and false positives identified by SRCP and by each individual circRNA-identification pipeline.
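One way to turn the linear-junction RNaseR/mock distribution into a cutoff is sketched below. It assumes an interpretation not spelled out in the caption: a cutoff of 0.9 is read as "the ratio below which 90% of linear junctions fall" (a nearest-rank quantile), and a candidate is labeled true when its ratio exceeds that value. The pseudocount, the use of library-size-normalized counts, and all function names are illustrative assumptions.

```python
import math
from typing import Dict, Iterable


def rnaser_mock_ratio(rnaser_count: float, mock_count: float, pseudocount: float = 1.0) -> float:
    """RNaseR/mock ratio of (assumed library-size-normalized) junction counts.

    The pseudocount is an illustrative choice that avoids division by zero.
    """
    return (rnaser_count + pseudocount) / (mock_count + pseudocount)


def linear_threshold(linear_ratios: Iterable[float], cutoff: float) -> float:
    """Nearest-rank quantile: smallest ratio with at least `cutoff` of linear junctions at or below it."""
    ordered = sorted(linear_ratios)
    if not ordered or not 0 < cutoff <= 1:
        raise ValueError("need at least one linear ratio and 0 < cutoff <= 1")
    # The tiny epsilon keeps float noise in cutoff * n from bumping the rank up by one.
    rank = max(1, math.ceil(cutoff * len(ordered) - 1e-9))
    return ordered[rank - 1]


def classify_circrnas(circ_ratios: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Label a candidate 'true' when its RNaseR/mock ratio exceeds the linear-junction threshold."""
    return {circ_id: ratio > threshold for circ_id, ratio in circ_ratios.items()}
```

Raising the cutoff (for example from 0.85 to 0.95) raises the threshold, trading false positives for false negatives, which is the trade-off the dotted lines in panel C explore.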
Fig. 3
circRNAs can be accurately quantified using seed matching. A Total number of circRNA RNAseq reads for true circRNAs detected by the different pipelines in the 8 samples from female flies of the SRP001696 dataset. B Number of distinct true circRNAs identified by each pipeline in each sample, using the same dataset as in (A). C Number of true-common circRNAs identified by each pipeline in each of the indicated samples. As stated in the text, true-common circRNAs are circRNAs identified by all the pipelines in the mock samples and with an RNaseR/mock ratio above the cutoff value. D Total number of circRNA RNAseq reads for true circRNAs detected by the different pipelines in one of the samples (SRR1197359), using the intact PE reads (100 bases long) or after computationally truncating them to 50 or 70 bases. E Number of distinct true circRNAs identified by the different pipelines in the SRR1197359 sample using the whole or truncated (50 or 70 bases) reads. F As in (D) and (E), showing the number of reads originating from the true-common circRNAs. G Total number of true circRNA RNAseq reads for the different pipelines when analyzing the two reads (R1 and R2) of the SRR1197359 sample independently (as single-end reads). As in (D–F), the analysis was performed on the whole read (100 bases) or after truncation to 50 or 70 bases. H Heatmap of Pearson correlations between the quantifications by the different pipelines and SRCP in the male and female fly samples (SRR1197359 and SRR1197473, respectively).
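The read-truncation and correlation steps behind panels D–H can be sketched as follows. This is an illustration under stated assumptions, not the authors' scripts: truncation is assumed to keep the first bases of each read, each pipeline's counts are assumed to be listed in the same circRNA order, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def truncate_reads(reads: Sequence[str], length: int) -> List[str]:
    """Computationally shorten reads (e.g. 100 -> 50 or 70 bases) by keeping the 5' prefix."""
    return [read[:length] for read in reads]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def correlation_matrix(counts: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """All pairwise correlations; every vector must list circRNAs in the same order."""
    names = sorted(counts)
    return {(a, b): pearson(counts[a], counts[b]) for a in names for b in names}
```

The resulting matrix is what a heatmap such as panel H would display, with one row and column per pipeline.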
Fig. 4
SRCP enables accurate identification of more differentially expressed circRNAs than other pipelines. A Relative amount of total circRNA reads (for true circRNAs) in young (1 day old) and aged (20 days old) flies. The total number of circRNA reads was calculated using each individual pipeline, and reads assigned to true circRNAs were summed for each condition. The average of young flies was normalized to 1. * indicates significance (t-test, p value < 0.05), and NS indicates no statistically significant difference. Error bars represent the standard error of the mean (SEM). B Number of true (violet bars) and false-positive (orange bars) differentially expressed circRNAs found by SRCP and the other circRNA-identification pipelines. C Validation of DE circRNAs by qPCR. Expression of target circRNAs and beta-Tubulin mRNA was normalized to the level of TBP mRNA. We plotted the average of 3 independent biological replicates, and error bars represent the SEM. The average of young flies (1 day old) was normalized to 1. We performed a t-test to compare 1-day-old vs 20-day-old flies. * indicates p value < 0.05, ** p value < 0.005, NS non-significant.
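The normalization and statistics described for panels A and C can be sketched as below. It assumes expression values are already on a linear scale (for example, converted from qPCR Ct values) and uses a pooled-variance Student's t statistic; the caption does not state which t-test variant was used, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def normalize_to_reference(target: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Per-sample ratio of target expression to reference (e.g. TBP) expression, linear scale assumed."""
    return [t / r for t, r in zip(target, reference)]


def relative_to_control(values: Sequence[float], control: Sequence[float]) -> List[float]:
    """Rescale so that the mean of the control group (young flies) equals 1."""
    baseline = mean(control)
    return [v / baseline for v in values]


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean using the sample standard deviation."""
    return stdev(values) / math.sqrt(len(values))


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
```

For a p value, `scipy.stats.ttest_ind(a, b)` (whose default `equal_var=True` gives the same pooled statistic) also returns a two-sided p value.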
Fig. 5
Validation of bona fide circRNAs in four mammalian species. A Strategy used to identify bona fide (true) circRNAs from several tissues of four different mammalian species. B Percentage of circRNAs identified as “true” positives as a function of the cutoff, for circRNAs identified by 1, 2, 3, or 4 of the pipelines used, in the indicated species and tissue. C Table summarizing the percentage of common circRNAs selected as true at the chosen threshold. D Table summarizing the number of pipelines that identified the sets of true and false circRNAs in the indicated tissues and species. To build this table, we used the thresholds marked as dotted lines in (B) and listed in the table in (C). E Boxplots showing the distribution of expression of the true and false circRNAs identified in the indicated species.
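The grouping behind panels B–D (how many pipelines support each candidate, and what fraction of each group passes as true) can be sketched as follows. The circRNA ID strings, function names, and the use of exact ID matching after reannotation are illustrative assumptions; "common" here means reported by every pipeline, following the caption of Fig. 3.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def support_counts(calls_by_pipeline: Mapping[str, Iterable[str]]) -> Counter:
    """How many pipelines reported each circRNA ID (IDs such as 'chr2L:1000-2000:+' are hypothetical)."""
    support: Counter = Counter()
    for calls in calls_by_pipeline.values():
        support.update(set(calls))  # count each pipeline at most once per circRNA
    return support


def percent_true_by_support(support: Mapping[str, int], is_true: Mapping[str, bool]) -> Dict[int, float]:
    """Percentage of candidates labeled true, grouped by the number of supporting pipelines."""
    totals: Counter = Counter()
    trues: Counter = Counter()
    for circ_id, n_pipelines in support.items():
        totals[n_pipelines] += 1
        trues[n_pipelines] += int(is_true.get(circ_id, False))
    return {k: 100.0 * trues[k] / totals[k] for k in sorted(totals)}
```

Common circRNAs are then the IDs whose support equals the number of pipelines, e.g. `{c for c, k in support.items() if k == len(calls_by_pipeline)}`.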
Fig. 6
circRNAs bind to AGO2 in the human brain. A Scheme of the approach used to analyze the AGO2 HITS-CLIP dataset. B Table summarizing the circRNAs for which SRCP identified backsplicing reads in the AGO2 CLIP data. The list contains circRNAs that were found in at least 2 different human brain samples. Linear “left” or “right” reads refer to junctions joining the more proximal or more distal exon within the circRNA with the exon before or after it in the linear mRNA, respectively. C IGV snapshot of the AGO2-CLIP raw data in the region containing the gene hosting the AGO2-bound circRNA. The backsplicing junction is marked with a dashed line. Colored shading represents the AGO2 cluster enrichment analysis, and the miRNAs for which an overlapping miRNA seed was identified are indicated.
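The distinction between backsplicing and linear "left"/"right" junction reads in panel B can be made concrete with a small lookup. This sketch assumes a plus-strand host transcript with exons listed as (start, end) in genomic order, and represents a junction read by the coordinates it joins (end of the upstream exon in the read, start of the downstream exon in the read); these conventions and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (end of the exon upstream in the read, start of the exon downstream in the read)
Junction = Tuple[int, int]


def junction_labels(exons: Sequence[Tuple[int, int]], first: int, last: int) -> Dict[Junction, str]:
    """Label the backsplice and flanking linear junctions of a circRNA spanning exons[first..last].

    Assumes a plus-strand transcript with exons given as (start, end) in genomic order.
    """
    # Backsplice: the last circRNA exon is joined back to the first one.
    labels: Dict[Junction, str] = {(exons[last][1], exons[first][0]): "backsplice"}
    if first > 0:  # exon before the circRNA joined to its first (proximal) exon
        labels[(exons[first - 1][1], exons[first][0])] = "linear_left"
    if last + 1 < len(exons):  # last (distal) circRNA exon joined to the exon after it
        labels[(exons[last][1], exons[last + 1][0])] = "linear_right"
    return labels


def classify_junction(junction: Junction, labels: Dict[Junction, str]) -> str:
    """Return the junction's label, or 'other' for junctions not flanking the circRNA."""
    return labels.get(junction, "other")
```

For a circRNA made of exons (300, 400) and (500, 600) in a transcript with exons (100, 200), (300, 400), (500, 600), (700, 800), a read joining 600 to 300 is a backsplice read, 200 to 300 is linear left, and 600 to 700 is linear right.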
