Bioinformatics. 2017 Nov 15;33(22):3538-3548.

doi: 10.1093/bioinformatics/btx473.

SPRINT: an SNP-free toolkit for identifying RNA editing sites

Feng Zhang^{1,2}, Yulan Lu³, Sijia Yan^{4,5}, Qinghe Xing^{4,5}, Weidong Tian^{1,2,4}

Affiliations

¹ State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development.
² Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, China.
³ The Molecular Genetic Diagnosis Center, Shanghai Key Lab of Birth Defect, Translational Medicine Research Center of Children Development and Diseases, Pediatrics Research Institute.
⁴ Children's Hospital of Fudan University, Shanghai 201102, China.
⁵ Institute of Biomedical Sciences, Fudan University, Shanghai 200032, China.

PMID: 29036410
PMCID: PMC5870768
DOI: 10.1093/bioinformatics/btx473

Feng Zhang et al. Bioinformatics. 2017.

Abstract

Motivation: RNA editing generates post-transcriptional sequence alterations. Detection of RNA editing sites (RESs) typically requires the filtering of SNVs called from RNA-seq data using an SNP database, an obstacle that is difficult to overcome for most organisms.

Results: Here, we present a novel method named SPRINT that identifies RESs without the need to filter out SNPs. SPRINT also integrates the detection of hyper RESs from remapped reads, and is fully automated for any RNA-seq data for which a reference genome sequence is available. We have rigorously validated SPRINT's effectiveness in detecting RESs using RNA-seq data from samples in which genes encoding RNA editing enzymes are knocked down or over-expressed, and have also demonstrated its superiority over current methods. We have applied SPRINT to investigate RNA editing across tissues and species, and also during the development of the mouse embryonic central nervous system. A web resource (http://sprint.tianlab.cn) of RESs identified by SPRINT has been constructed.

Availability and implementation: The software and related data are available at http://sprint.tianlab.cn.

Contact: weidong.tian@fudan.edu.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
The workflow and the methodology of SPRINT. (a) The workflow of SPRINT. (b) The number of SNV duplets (two consecutive SNVs with the same type of variation) at different distance intervals. SNP and RES duplets refer to SNV duplets in which both SNVs are SNPs and RESs, respectively. The sub-figure plots the fraction of RES duplets among all SNV duplets (RES duplet rate) at a given distance interval. On the horizontal axis of the sub-figure, ‘200’ means ‘0–200’, ‘400’ means ‘200–400’, etc. (c) The A-to-G rate, the precision and the recall of the RESs identified by SPRINT when the cluster size cutoff is set at 2 while the distance cutoff varies. (d) Similar to (c) except that the distance cutoff is fixed at 200 nt while the cluster size cutoff varies. The SNVs used in (b–d) are those called by SPRINT in Alu regions of GM12878 (cytosolic) with two or more read counts.
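
The two cutoffs in panels (c) and (d) can be illustrated with a minimal Python sketch. It is not SPRINT's actual code: the function name `cluster_snvs`, the `(chrom, pos, var_type)` tuple layout and the rule that a cluster is a run of consecutive same-type SNVs whose neighbouring gaps are each within the distance cutoff are all assumptions made for illustration, based only on the duplet definition in the caption.

```python
from itertools import groupby


def cluster_snvs(snvs, max_distance=200, min_cluster_size=2):
    """Group consecutive same-type SNVs into clusters (hypothetical helper).

    snvs: iterable of (chrom, pos, var_type) tuples, e.g. ("chr1", 100, "AG").
    A cluster is a run of position-sorted SNVs on one chromosome that share a
    variation type and whose neighbouring gaps are all <= max_distance.
    Only clusters with at least min_cluster_size SNVs are returned.
    """
    clusters = []
    ordered = sorted(snvs)  # sorts by chromosome, then position
    for chrom, group in groupby(ordered, key=lambda s: s[0]):
        run = []  # current run of (pos, var_type)
        for _, pos, var_type in group:
            same_type = bool(run) and var_type == run[-1][1]
            if same_type and pos - run[-1][0] <= max_distance:
                run.append((pos, var_type))
            else:
                # Close the previous run and start a new one at this SNV.
                if len(run) >= min_cluster_size:
                    clusters.append((chrom, run[0][1], [p for p, _ in run]))
                run = [(pos, var_type)]
        if len(run) >= min_cluster_size:
            clusters.append((chrom, run[0][1], [p for p, _ in run]))
    return clusters


example = [
    ("chr1", 100, "AG"), ("chr1", 250, "AG"), ("chr1", 400, "AG"),
    ("chr1", 5000, "AG"), ("chr1", 5100, "CT"), ("chr2", 10, "TC"),
]
print(cluster_snvs(example))
```

With the defaults (distance cutoff 200 nt, cluster size cutoff 2), only the three closely spaced A-to-G SNVs on `chr1` survive; raising `min_cluster_size` or lowering `max_distance` makes the filter stricter, which is the trade-off the two panels explore.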

**Fig. 2.**
The validation of SPRINT’s effectiveness in detecting RESs. (a) The number of regular-RESs and (b) the number of hyper-RESs identified by SPRINT in wild-type and ADAR1 knockdown U87MG cell lines (Bahn *et al.*, 2012). (c) The number of regular-RESs and (d) the number of hyper-RESs identified by SPRINT in wild-type and ADARs (adr-1 and adr-2) knockdown *C. elegans* embryos (strand-specific) (Zhao *et al.*, 2015). (e) The number of C-to-U RESs identified by SPRINT in wild-type and Apobec-1 knockdown mouse intestine (Blanc *et al.*, 2014). (f) Similar to (e) except that the samples are from mouse liver. In (a–f), ‘>’ means ‘to’, A-to-I is detected as A-to-G, and C-to-U is detected as C-to-T. Others refer to all types of variations except A-to-I and C-to-U. Because the U87MG and mouse (liver and intestine) RNA-seq datasets are not strand-specific (Bahn *et al.*, 2012; Blanc *et al.*, 2014), A-to-G mismatches might be detected as T-to-C mismatches when reads are mapped to the opposite strand, and C-to-T mismatches might be detected as G-to-A. Therefore, A-to-G and T-to-C editing sites are combined to represent A-to-I editing sites, while C-to-T and G-to-A RESs are combined to represent C-to-U editing sites in those two datasets.
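
The strand-merging rule described for the non-strand-specific datasets can be sketched in Python as follows. The helper names `canonical_type` and `count_by_type` and the `(ref, alt)` input format are hypothetical and are not taken from SPRINT itself; the sketch only shows how a mismatch and its reverse-strand counterpart collapse into one category.

```python
from collections import Counter

# Watson-Crick complements, used to express a mismatch on the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Mismatch types that stand for editing: A>G (A-to-I) and C>T (C-to-U).
EDITING_TYPES = {("A", "G"), ("C", "T")}


def canonical_type(ref, alt):
    """Map a mismatch to 'A>G', 'C>T' or 'Others', ignoring strand."""
    if (ref, alt) in EDITING_TYPES:
        return f"{ref}>{alt}"
    comp = (COMPLEMENT[ref], COMPLEMENT[alt])
    if comp in EDITING_TYPES:  # e.g. T>C is A>G seen from the other strand
        return f"{comp[0]}>{comp[1]}"
    return "Others"


def count_by_type(mismatches):
    """Count mismatches per strand-merged category."""
    return Counter(canonical_type(ref, alt) for ref, alt in mismatches)


print(count_by_type([("A", "G"), ("T", "C"), ("G", "A"), ("A", "C")]))
```

For strand-specific data such as the *C. elegans* set, this merging step would be skipped, since the strand of each read is already known.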

**Fig. 3.**
RNA editing in different tissues of human, chimpanzee, rhesus and mouse. (a) The proportions of different categories of regular A-to-I RESs (left), hyper A-to-I RESs (middle) and C-to-U RESs (right) called by SPRINT in different tissues of human, chimpanzee, rhesus and mouse. ‘wks’ refers to ‘weeks’. (b) The number of CDS C-to-U RESs detected in different tissues of the four species. (c) The fractions of potential coding consequences (e.g. missense, synonymous, etc.) of CDS A-to-I RESs called by SPRINT in human. We used Variant Effect Predictor (VEP, http://www.ensembl.org/info/docs/tools/vep/) to annotate the potential coding consequences of human CDS regular and hyper A-to-I RESs. (d) The preference for the nucleotide before (−1, upper) and after (+1, lower) a regular A-to-I RES for RESs with different editing ratios. Here, the read depth of a RES is required to be greater than or equal to five. The nucleotide preference is plotted using WebLogo (Crooks *et al.*, 2004). (e) Similar to (d) except that RESs with different read counts are investigated. (f) The averaged phyloP conservation scores (Pollard *et al.*, 2010) at different positions relative to a regular A-to-I RES (left) or a hyper A-to-I RES (right) in 3’ UTR, with the position ranging from −1000 to +1000 (upper) and from −20 to +20 (lower). (The sequence patterns of A-to-I RESs in other categories can be found in Supplementary Fig. S7). In (d–f), the A-to-I RESs used are from human testis (25 weeks), because human testis has the largest number of detected A-to-I RESs among all tissues investigated in this study. (g) Similar to (f), except that C-to-U RESs are investigated. The left and right sub-figures are plotted using the C-to-U RESs identified from mouse liver (7–8 weeks) and mouse ad-Apobec1 liver, respectively.

**Fig. 4.**
The normalized number of A-to-I RESs across different tissues in the four species. (a) The normalized number of regular A-to-I RESs versus the normalized number of hyper A-to-I RESs for all samples investigated in this study. The normalized number refers to the number of RESs per one million reads. (b) The number of regular A-to-I RESs versus the number of hyper A-to-I RESs for all genes in human testis (25 weeks). PCC refers to Pearson Correlation Coefficient. R (version 3.2.2) is used to calculate the PCC and p-value with the command ‘cor.test(x, y, alternative='greater')’. (c) The normalized number of A-to-I RESs (the union of regular and hyper A-to-I RESs) in different tissues of the four species. ‘wks’ refers to weeks.
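
As a small worked illustration of the two quantities in this caption, the Python sketch below computes a per-million-reads normalization and a Pearson correlation coefficient by hand. The function names and example numbers are invented for illustration; the study itself used R's `cor.test`, and the one-sided p-value it reports is not reproduced here.

```python
from math import sqrt


def res_per_million_reads(n_res, total_reads):
    """Normalized RES count: number of RESs per one million reads."""
    return n_res * 1_000_000 / total_reads


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up example values, not data from the study.
print(res_per_million_reads(500, 25_000_000))
print(pearson_cc([1, 2, 3], [2, 4, 6]))
```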

**Fig. 5.**
RNA editing in embryonic and adult mouse tissues. (a) The normalized number and (b) the proportions of different categories of A-to-I RESs, and (c) the proportions of different categories of C-to-U RESs in mouse embryonic and adult tissues. (d) The mean expression level change of newly edited genes during the development of embryonic CNS (upper) and liver (lower). The line segment represents the mean expression level change [log2 (fold change)] of newly edited genes (1822 and 1139 genes in CNS and liver, respectively), while the null distributions are plotted by computing the mean expression level change of randomly selected genes (the same number as the newly edited genes) in CNS and liver (10 000 randomizations).
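
The randomization in panel (d) can be sketched in Python as below. This is an illustrative reconstruction, not the authors' code: the function names, the dictionary input, sampling without replacement from all genes, and the one-sided empirical p-value are assumptions, since the caption only states that equally sized random gene sets were drawn 10 000 times.

```python
import random
from statistics import mean


def null_mean_changes(log2fc_by_gene, n_selected, n_iter=10_000, seed=0):
    """Null distribution of the mean log2 fold change.

    Each iteration draws n_selected genes at random (without replacement)
    and records the mean of their log2 fold changes.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = list(log2fc_by_gene.values())
    return [mean(rng.sample(values, n_selected)) for _ in range(n_iter)]


def empirical_p_value(observed, null_means):
    """One-sided empirical p-value (assumed): P(null mean >= observed)."""
    hits = sum(m >= observed for m in null_means)
    return (hits + 1) / (len(null_means) + 1)  # +1 avoids reporting p = 0


# Toy input with no expression change at all, for illustration only.
toy = {f"gene{i}": 0.0 for i in range(50)}
null = null_mean_changes(toy, n_selected=10, n_iter=100)
print(empirical_p_value(1.0, null))
```

In the figure, the observed mean for the newly edited genes corresponds to the line segment, and the list returned by `null_mean_changes` corresponds to the plotted null distribution.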

References

1. Bahn J.H. et al. (2012) Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res., 22, 142–150.
2. Benne R. (1996) RNA editing. The long and the short of it. Nature, 380, 391–392.
3. Blanc V., Davidson N.O. (2003) C-to-U RNA editing: mechanisms leading to genetic diversity. J. Biol. Chem., 278, 1395–1398.
4. Blanc V. et al. (2014) Genome-wide identification and functional analysis of Apobec-1-mediated C-to-U RNA editing in mouse small intestine and liver. Genome Biol., 15, R79.
5. Borchert G.M. et al. (2009) Adenosine deamination in human transcripts generates novel microRNA binding sites. Hum. Mol. Genet., 18, 4801–4807.
