Bioinformatics. 2017 Nov 15;33(22):3538-3548.
doi: 10.1093/bioinformatics/btx473.

SPRINT: an SNP-free toolkit for identifying RNA editing sites

Feng Zhang et al. Bioinformatics. 2017.

Abstract

Motivation: RNA editing generates post-transcriptional sequence alterations. Detecting RNA editing sites (RESs) typically requires filtering the SNVs called from RNA-seq data against an SNP database, a resource that is unavailable for most organisms and therefore an obstacle that is difficult to overcome.

Results: Here, we present a novel method named SPRINT that identifies RESs without the need to filter out SNPs. SPRINT also integrates the detection of hyper RESs from remapped reads, and it is fully automated for any RNA-seq data for which a reference genome sequence is available. We have rigorously validated SPRINT's effectiveness in detecting RESs using RNA-seq data from samples in which genes encoding RNA editing enzymes are knocked down or over-expressed, and have also demonstrated its superiority over current methods. We have applied SPRINT to investigate RNA editing across tissues and species, and during the development of the mouse embryonic central nervous system. A web resource (http://sprint.tianlab.cn) of RESs identified by SPRINT has been constructed.

Availability and implementation: The software and related data are available at http://sprint.tianlab.cn.

Contact: weidong.tian@fudan.edu.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The workflow and the methodology of SPRINT. (a) The workflow of SPRINT. (b) The number of SNV duplets (two consecutive SNVs with the same type of variation) at different distance intervals. SNP and RES duplets refer to SNV duplets in which both SNVs are SNPs or both are RESs, respectively. The sub-figure plots the fraction of RES duplets among all SNV duplets (RES duplet rate) at a given distance interval. On the horizontal axis of the sub-figure, ‘200’ means ‘0–200’, ‘400’ means ‘200–400’, etc. (c) The A-to-G rate, the precision and the recall of the RESs identified by SPRINT when the cluster size cutoff is set at 2 and the distance cutoff varies. (d) Similar to (c) except that the distance cutoff is fixed at 200 nt while the cluster size cutoff varies. The SNVs used in (b–d) are those called by SPRINT in Alu regions of GM12878 (cytosolic) that are supported by two or more reads.
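
The clustering idea behind panels (c) and (d) can be sketched in a few lines of Python. This is a minimal illustration, not SPRINT's implementation: it assumes that an SNV is chained to the previous SNV of the same chromosome and variation type when the gap between them is within the distance cutoff, and that a cluster is kept only when it contains at least the cluster-size cutoff number of SNVs. The `SNV` record and the `cluster_snvs` function are hypothetical names; the defaults (200 nt, cluster size 2, at least two supporting reads) mirror the values quoted in the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNV:
    chrom: str
    pos: int        # genomic coordinate
    var_type: str   # e.g. "AG" for an A-to-G mismatch
    reads: int      # number of supporting reads


def cluster_snvs(snvs, distance_cutoff=200, cluster_size_cutoff=2, min_reads=2):
    """Return SNVs that fall in same-type clusters of sufficient size.

    Hypothetical sketch: consecutive SNVs sharing chromosome and variation
    type are chained when their gap is <= distance_cutoff; clusters with
    fewer than cluster_size_cutoff members are discarded.
    """
    # Group by (chromosome, variation type) so only same-type SNVs are chained.
    groups = {}
    for s in snvs:
        if s.reads >= min_reads:
            groups.setdefault((s.chrom, s.var_type), []).append(s)

    kept = []
    for members in groups.values():
        members.sort(key=lambda s: s.pos)
        cluster = [members[0]]
        for prev, cur in zip(members, members[1:]):
            if cur.pos - prev.pos <= distance_cutoff:
                cluster.append(cur)          # extend the current cluster
            else:
                if len(cluster) >= cluster_size_cutoff:
                    kept.extend(cluster)     # close a large-enough cluster
                cluster = [cur]              # start a new cluster
        if len(cluster) >= cluster_size_cutoff:
            kept.extend(cluster)
    return sorted(kept, key=lambda s: (s.chrom, s.pos))
```

Sweeping `distance_cutoff` with `cluster_size_cutoff=2`, or holding the distance at 200 nt while varying `cluster_size_cutoff`, reproduces the kind of parameter scan that panels (c) and (d) summarize.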
Fig. 2.
The validation of SPRINT’s effectiveness in detecting RESs. (a) The number of regular-RESs and (b) the number of hyper-RESs identified by SPRINT in wild-type and ADAR1 knockdown U87MG cell lines (Bahn et al., 2012). (c) The number of regular-RESs and (d) the number of hyper-RESs identified by SPRINT in wild-type and ADARs (adr-1 and adr-2) knockdown C. elegans embryos (strand-specific) (Zhao et al., 2015). (e) The number of C-to-U RESs identified by SPRINT in wild-type and Apobec-1 knockdown mouse intestine (Blanc et al., 2014). (f) Similar to (e) except that the samples are from mouse liver. In (a–f), ‘>’ means ‘to’, A-to-I is detected as A-to-G, and C-to-U is detected as C-to-T. ‘Others’ refers to all types of variation except A-to-I and C-to-U. Because the U87MG and mouse (liver and intestine) RNA-seq datasets are not strand-specific (Bahn et al., 2012, 2014), A-to-G mismatches might be detected as T-to-C mismatches when reads are mapped to the opposite strand, and C-to-T mismatches might be detected as G-to-A. Therefore, A-to-G and T-to-C editing sites are combined to represent A-to-I editing sites, while C-to-T and G-to-A RESs are combined to represent C-to-U editing sites in those two datasets.
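
The strand-merging rule for the non-strand-specific datasets amounts to a lookup table. In the hypothetical sketch below, mismatches are written as "REF>ALT" relative to the reference forward strand, following the caption's ‘>’ notation; for strand-specific libraries it is assumed that mismatches are already reported relative to the transcribed strand, so no merging is applied. The names `classify` and `tally` and both tables are illustrative only.

```python
# Hypothetical lookup tables; mismatches are "REF>ALT" on the reference forward strand.
STRAND_SPECIFIC = {"A>G": "A-to-I", "C>T": "C-to-U"}
NON_STRAND_SPECIFIC = {
    "A>G": "A-to-I", "T>C": "A-to-I",  # A-to-I observed from either strand
    "C>T": "C-to-U", "G>A": "C-to-U",  # C-to-U observed from either strand
}


def classify(mismatch: str, strand_specific: bool) -> str:
    """Map one mismatch type to an editing category, or 'Others'."""
    table = STRAND_SPECIFIC if strand_specific else NON_STRAND_SPECIFIC
    return table.get(mismatch, "Others")


def tally(mismatches, strand_specific: bool) -> dict:
    """Count sites per editing category, as summarized in the Fig. 2 panels."""
    counts = {"A-to-I": 0, "C-to-U": 0, "Others": 0}
    for m in mismatches:
        counts[classify(m, strand_specific)] += 1
    return counts
```

For example, `tally(["A>G", "T>C", "C>T", "G>A", "A>C"], strand_specific=False)` returns `{'A-to-I': 2, 'C-to-U': 2, 'Others': 1}`, whereas the same list treated as strand-specific puts T>C and G>A into ‘Others’.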
Fig. 3.
RNA editing in different tissues of human, chimpanzee, rhesus and mouse. (a) The proportions of different categories of regular A-to-I RESs (left), hyper A-to-I RESs (middle) and C-to-U RESs (right) called by SPRINT in different tissues of human, chimpanzee, rhesus and mouse. ‘wks’ refers to ‘weeks’. (b) The number of CDS C-to-U RESs detected in different tissues of the four species. (c) The fractions of potential coding consequences (e.g. missense, synonymous) of CDS A-to-I RESs called by SPRINT in human. We used the Variant Effect Predictor (VEP, http://www.ensembl.org/info/docs/tools/vep/) to annotate the potential coding consequences of human CDS regular and hyper A-to-I RESs. (d) The preference for the nucleotide before (−1, upper) and after (+1, lower) a regular A-to-I RES, for RESs with different editing ratios. Here, the read depth of a RES is required to be at least five. The nucleotide preference is plotted using WebLogo (Crooks et al., 2004). (e) Similar to (d) except that RESs with different read counts are investigated. (f) The averaged phyloP conservation scores (Pollard et al., 2010) at different positions relative to a regular A-to-I RES (left) or a hyper A-to-I RES (right) in 3′ UTRs, with the position ranging from −1000 to +1000 (upper) and from −20 to +20 (lower). (The sequence patterns of A-to-I RESs in other categories can be found in Supplementary Fig. S7.) In (d–f), the A-to-I RESs used are from human testis (25 weeks), because human testis has the largest number of detected A-to-I RESs among all tissues investigated in this study. (g) Similar to (f), except that C-to-U RESs are investigated. The left and right sub-figures are plotted using the C-to-U RESs identified from mouse liver (7–8 weeks) and mouse ad-Apobec1 liver, respectively.
Fig. 4.
The normalized number of A-to-I RESs across different tissues in the four species. (a) The normalized number of regular A-to-I RESs versus the normalized number of hyper A-to-I RESs for all samples investigated in this study. The normalized number refers to the number of RESs per one million reads. (b) The number of regular A-to-I RESs versus the number of hyper A-to-I RESs for all genes in human testis (25 weeks). PCC refers to Pearson Correlation Coefficient. R (version 3.2.2) was used to calculate the PCC and its p-value with the command ‘cor.test(x, y, alternative = "greater")’. (c) The normalized number of A-to-I RESs (the union of regular and hyper A-to-I RESs) in different tissues of the four species. ‘wks’ refers to weeks.
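
The normalization in panel (a) and the correlation in panel (b) are simple arithmetic. A minimal Python sketch follows; `per_million` and `pearson_r` are hypothetical helper names, and the sketch computes only the correlation coefficient, not the one-sided p-value that R's `cor.test` reports.

```python
import math


def per_million(n_res: int, total_reads: int) -> float:
    """Number of RESs per one million reads."""
    return n_res * 1_000_000 / total_reads


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For instance, 150 RESs in a library of 30 million reads gives `per_million(150, 30_000_000) == 5.0`. For a one-sided test comparable to the R command in the caption, SciPy (1.9 or later) offers `scipy.stats.pearsonr(x, y, alternative='greater')`.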
Fig. 5.
RNA editing in embryonic and adult mouse tissues. (a) The normalized number and (b) the proportions of different categories of A-to-I RESs, and (c) the proportions of different categories of C-to-U RESs in mouse embryonic and adult tissues. (d) The mean expression level change of newly edited genes during the development of embryonic CNS (upper) and liver (lower). The line segment represents the mean expression level change [log2(fold change)] of newly edited genes (1822 and 1139 genes in CNS and liver, respectively), while the null distributions are obtained by computing the mean expression level change of randomly selected genes (the same number as the newly edited genes) in CNS and liver (10 000 randomizations).
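
The randomization in panel (d) is a permutation-style null: draw random gene sets of the same size as the newly edited set and record their mean log2 fold change. The sketch below rests on stated assumptions: fold changes are given as a dictionary keyed by gene, each random set is drawn without replacement from all genes, and an add-one empirical p-value is reported for each tail because the caption does not state a test direction. `permutation_null` is a hypothetical name.

```python
import random
from statistics import mean


def permutation_null(log2fc, edited, n_perm=10_000, seed=0):
    """Null distribution of mean log2 fold change for random gene sets.

    log2fc: dict gene -> log2(fold change) between two developmental stages.
    edited: collection of newly edited gene IDs (all must be keys of log2fc).
    Returns the observed mean, the list of null means, and add-one empirical
    p-values for the observed mean being unusually high and unusually low.
    """
    genes = list(log2fc)
    k = len(edited)
    observed = mean(log2fc[g] for g in edited)
    rng = random.Random(seed)
    # Each draw samples k distinct genes, matching the size of the edited set.
    null = [mean(log2fc[g] for g in rng.sample(genes, k)) for _ in range(n_perm)]
    p_high = (1 + sum(m >= observed for m in null)) / (n_perm + 1)
    p_low = (1 + sum(m <= observed for m in null)) / (n_perm + 1)
    return observed, null, p_high, p_low
```

With `n_perm=10_000` this matches the number of randomizations quoted in the caption; the fixed `seed` only makes the sketch reproducible.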

References

Bahn J.H. et al. (2012) Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res., 22, 142–150.
Benne R. (1996) RNA editing. The long and the short of it. Nature, 380, 391–392.
Blanc V., Davidson N.O. (2003) C-to-U RNA editing: mechanisms leading to genetic diversity. J. Biol. Chem., 278, 1395–1398.
Blanc V. et al. (2014) Genome-wide identification and functional analysis of Apobec-1-mediated C-to-U RNA editing in mouse small intestine and liver. Genome Biol., 15, R79.
Borchert G.M. et al. (2009) Adenosine deamination in human transcripts generates novel microRNA binding sites. Hum. Mol. Genet., 18, 4801–4807.