Review
Brief Bioinform. 2021 Mar 22;22(2):1706-1728.
doi: 10.1093/bib/bbaa001.

The bioinformatics toolbox for circRNA discovery and analysis

Liang Chen et al. Brief Bioinform.

Abstract

Circular RNAs (circRNAs) are a unique class of RNA molecules, first identified more than 40 years ago, that are produced by covalent linkage through back-splicing of linear RNA. Recent advances in sequencing technologies and bioinformatics tools have directly expanded the known types and biological functions of circRNAs. In parallel with these technological developments, practical applications of circRNAs have arisen, including their use as biomarkers of human disease. Currently, circRNA-associated bioinformatics tools support tasks including circRNA annotation, circRNA identification and network analysis of competing endogenous RNA (ceRNA). In this review, we collected about 100 circRNA-associated bioinformatics tools and summarized their current attributes and capabilities. We also performed network analysis and text mining on circRNA tool publications to reveal trends in their ongoing development.

Keywords: bioinformatics tools; circRNA; disease biomarker; next generation sequencing; non-coding RNA; text mining.

Figures

Figure 1
Historical timeline of circRNA research. The development of knowledge and of experimental and computational tools for circRNA is illustrated. Blue, green and red marks are the circRNA discoveries related to biology, experimental approaches and representative bioinformatics tools, respectively. In 1976, circRNAs were identified as circular RNA genomes in plant viroids by electron microscopy [4]. With similar methods, circRNA was found in the cytoplasm of eukaryotic cells in 1979 [5] and in the hepatitis delta virus (HDV) in 1986 [170]. In 1991, the first human circRNA was identified [171]. In 1995, circRNA was shown to be capable of protein synthesis in vitro [2]. In 2006, RNase R treatment was found to enrich circRNA [33]. By 2012, genome-wide profiling of circRNAs by RNA-Seq [8] had been demonstrated. In 2013, circRNA was shown to act as a miRNA sponge, with examples such as CDR1as [9] and Sry [172]. CircRNAs were also found to associate stably with RNA binding proteins (RBPs), such as Argonaute protein (AGO) and RNA polymerase II [9, 31]. In 2014, Arraystar launched the first commercial circRNA microarray, and circRNA expression was profiled by array [173]. In 2015, circRNAs were proposed as cancer biomarkers and were shown to be detectable in exosomes [174]. CircRNAs were also shown to be long-lived [32]. In 2017, endogenous circRNA was shown to be translated into functional polypeptides [16, 166, 175]. Also in 2017, a new method was developed for isolating highly pure circRNA using RNase R treatment followed by polyadenylation and polyA+ RNA depletion (RPAD) [176].
Figure 2
Biogenesis of circRNA in animals. CircRNA formation models: back-splicing circularization requires the help of motifs at both ends (flanking AG-GU), complementary sequences (ALU elements) or RBPs. CircRNA types: single exon, intronic, exon-intron and multi-exon. CircRNA functions: miRNA sponge, protein sponge, translation and biomarker. Abbreviations: cerebrospinal fluid, CSF; RNA-induced silencing complex, RISC; RNA-binding protein, RBP.
Figure 3
Classification of circRNA bioinformatics tools. (A) Schematic of the tool categories, with example tools listed for each category. polyA(+): polyA selected; polyA(−): polyA depleted; rRNA(−): rRNA depleted; RNase R(+): treated with RNase R; RPAD: RNase R treatment followed by polyadenylation and polyA(+) RNA depletion. (B) Categories of circRNA identification tools. The top panel is a model of back-splice junction (BSJ) read mapping to the genome. R1 and R2 represent read1 and read2, respectively, in a paired-end RNA-Seq dataset. The three categories are illustrated below the model: BSJ-based, integrated-based and machine learning-based. BSJ-based tools can be further divided into two sub-classes: segmented-read based and pseudo-reference based.
Figure 4
Performance indices of circRNA identification methods as evaluated by different studies. Each colored dot represents a performance index; the index corresponding to each color is given in the figure legend. Rows are individual studies, grouped by publication type, and the last row is the union of each column. Columns are circRNA identification tools, ordered by how often each tool is mentioned in those studies. Details regarding the datasets used in these studies are available in Supplementary Table 2.
Figure 5
CircRNA density and species support in each database. (A) Density of human circRNA genes in each database. The outermost layer is the linear gene density of human (UCSC hg19). From the outside in, the colored layers show circRNA gene density based on circAtlas, circBase, circNet, CIRCpedia, circRNAdb and CircFunBase, respectively (coordinates based on hg19). The size of each dot represents the count of circRNAs per 1 million bases. (B) Number of circRNA records per species in each database. Counts are log transformed. The literature source for each database is shown in Supplementary Figure 1.
Figure 6
CircRNA-disease associations. The top figure is an enlargement of a corner of the bottom figure. Associations between circRNAs and diseases are those recorded in Circ2Disease, CircR2Disease and circRNADisease. All associations were manually curated from the literature. Cancers are marked in purple and other diseases in green. The circRNA host gene symbols and chromosome labels are shown in the outer layer of the genome ideogram. The scatter plot represents the number of publications for each disease, and the stacked bar plot represents the number of circRNAs related to each disease, colored by regulation direction. The links connect the diseases and circRNAs. Regulation direction is marked by color: red indicates a circRNA up-regulated in a disease, blue indicates down-regulated, green indicates an unclear regulation direction, and black indicates a circRNA reported as both up- and down-regulated in different publications. In the gene density layer, red represents the disease-related circRNAs and green represents the functional circRNA records in CircFunBase. A high-resolution figure (in which the gene symbols are visible) and the association details are available in Supplementary Figure 3 and Supplementary Table 3, respectively.
Figure 7
Interactions among circRNA publications and publication count by year. (A) circRNA article citation network by year. Each node is a publication, each edge represents a citation relationship and node size is scaled by PageRank score. The network was drawn with a force-directed layout. Different types of publications or reviews are labeled with different colors. (B) Chord diagram representing the interaction strength between the different types of circRNA publication. (C) Number of publications related to circRNAs from 2000 to 2019. The publication list was retrieved from PubMed on 17 July 2019 with the keywords (‘circRNA’ AND ‘circular RNA’). Red represents publications of circRNA tools, and blue represents regular circRNA research articles. Green represents the predicted increase in publications, estimated from the 2016, 2017 and 2018 counts with a simple regression model.
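
The node scaling described for Figure 7A can be illustrated with a minimal PageRank sketch. This is not the authors' pipeline: the page does not say which implementation or settings they used, so the damping factor of 0.85, the power-iteration approach and the toy citation edges (with hypothetical publication labels) are all assumptions made only for illustration.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on a directed citation graph.

    edges: iterable of (citing, cited) pairs.
    damping: assumed value (0.85 is a common default, not from the source).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by papers that cite nothing is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for target in out_links[v]:
                    new_rank[target] += share
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy network: each pair is (citing publication, cited publication).
toy_edges = [
    ("tool_B", "tool_A"),
    ("review_C", "tool_A"),
    ("review_C", "tool_B"),
    ("article_D", "tool_A"),
]
scores = pagerank(toy_edges)
for paper, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{paper}\t{score:.3f}")
```

In this toy network the most-cited publication receives the largest score, which is the property that makes PageRank a convenient basis for node size in a citation graph.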

References

    1. Li X, Yang L, Chen LL. The biogenesis, functions, and challenges of circular RNAs. Mol Cell 2018;71:428–42.
    2. Chen CY, Sarnow P. Initiation of protein synthesis by the eukaryotic translational apparatus on circular RNAs. Science 1995;268:415–7.
    3. Jeck WR, Sharpless NE. Detecting and characterizing circular RNAs. Nat Biotechnol 2014;32:453–61.
    4. Sanger HL, Klotz G, Riesner D, et al. Viroids are single-stranded covalently closed circular RNA molecules existing as highly base-paired rod-like structures. Proc Natl Acad Sci U S A 1976;73:3852–6.
    5. Hsu MT, Coca-Prados M. Electron microscopic evidence for the circular form of RNA in the cytoplasm of eukaryotic cells. Nature 1979;280:339–40.