Genome Res. 2012 Sep;22(9):1626-33.
doi: 10.1101/gr.134957.111.

RNA editing in the human ENCODE RNA-seq data

Eddie Park et al. Genome Res. 2012 Sep.

Abstract

RNA-seq data can be mined for sequence differences relative to the reference genome to identify both genomic SNPs and RNA editing events. We analyzed the long, polyA-selected, unstranded, deeply sequenced RNA-seq data from the ENCODE Project across 14 human cell lines for candidate RNA editing events. On average, 43% of the RNA sequencing variants that are not in dbSNP and are within gene boundaries are A-to-G(I) RNA editing candidates. The vast majority of A-to-G(I) edits are located in introns and 3' UTRs, with only 123 located in protein-coding sequence. In contrast, the majority of non-A-to-G variants (60%-80%) map near exon boundaries and have the characteristics of splice-mapping artifacts. After filtering out all candidates with evidence of private genomic variation using genome resequencing or ChIP-seq data, we find that up to 85% of the high-confidence RNA variants are A-to-G(I) editing candidates. Genes with A-to-G(I) edits are enriched in Gene Ontology terms involving cell division, viral defense, and translation. The distribution and character of the remaining non-A-to-G variants closely resemble known SNPs. We find no reproducible A-to-G(I) edits that result in nonsynonymous substitutions in all three lymphoblastoid cell lines in our study, unlike RNA editing in the brain. The fact that only a fraction of sites are reproducibly edited in multiple cell lines, together with our finding of a stronger association between editing and specific genes, suggests that the editing of the transcript is more important than the editing of any individual site.

Figures

Figure 1.
RNA SNV calling strategy. (A) Flowchart of analysis: 75-bp paired-end RNA-seq reads were mapped onto an extended genome (genome + known splice junctions + spikes) using Bowtie. Reads mapping onto splice sites and spikes were set aside, and reads mapping onto hg19 were used to call single nucleotide variants (SNVs). A parallel set of analyses was done using a collapsed set of reads with unique coordinates, and the intersections of SNVs from the uncollapsed and collapsed treatments were obtained. Known SNPs annotated in dbSNP132, sites outside gene boundaries, and intronic sites within 5 bp of splice junctions were removed. For the GM trio, any candidate with evidence of a private genomic variation was also removed. (B) Example of candidate editing site. Purple arrows pointing to the left represent reads on the (−) strand, while blue arrows pointing to the right represent reads on the (+) strand. The blocks represent variants between the reference DNA and the RNA-seq. A SNV is kept when at least three nonidentical reads support the SNV, with a minimum SNV frequency of 10%, and at least one edit per strand. (C) Intersection strategy for two replicates. For cell types with two replicates, the SNVs remaining after collapsing were intersected between the replicates. (D) The number of SNVs remaining after collapsing for the prefiltered sites. Number of SNVs that are only in the uncollapsed set are in blue; the intersection, purple; and collapsed set, red. (E) Collapsing increases the relative amount of A-to-G SNVs and also increases the relative number of transitions. Number of SNVs that are only in the uncollapsed set are in blue; the intersection, purple; and collapsed set, red. (F) The fraction of dbSNP is highest in the intersection of the full and collapsed sets. The relative amount of calls found in dbSNP132, novel genic SNVs, and other SNVs in the uncollapsed set are at the left; the collapsed set, right; and the intersection of the two, middle.
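The support criteria in panel B (at least three nonidentical reads, a minimum SNV frequency of 10%, and at least one supporting read per strand) can be illustrated with a minimal sketch. The `Read` record, the `keep_snv` function, and the choice to compute the frequency over collapsed reads are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical record for one read covering a candidate position."""
    start: int   # leftmost mapped coordinate of the read
    end: int     # rightmost mapped coordinate of the read
    strand: str  # "+" or "-"
    base: str    # base the read shows at the candidate position


def keep_snv(reads: Iterable[Read], variant_base: str,
             min_reads: int = 3, min_freq: float = 0.10) -> bool:
    """Return True if the candidate passes the three support criteria."""
    # Collapse reads that are identical in coordinates, strand, and base,
    # so that duplicates count only once ("nonidentical reads").
    unique = set(reads)
    if not unique:
        return False

    support = [r for r in unique if r.base == variant_base]

    # Criterion 1: at least `min_reads` nonidentical supporting reads.
    if len(support) < min_reads:
        return False

    # Criterion 2: variant frequency of at least `min_freq`.
    # Assumption: the frequency is computed over the collapsed reads.
    if len(support) / len(unique) < min_freq:
        return False

    # Criterion 3: at least one supporting read on each strand.
    return {r.strand for r in support} == {"+", "-"}
```

For example, three distinct G-supporting reads split across both strands among ten distinct reads would pass, while the same three reads all on the (+) strand would not.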
Figure 2.
RNA editing calls in GM12878. (A) Most non–A-to-G SNVs are near splicing boundaries. The distribution relative to gene boundaries of A-to-G SNVs (left) versus non–A-to-G SNVs (right). (B) Example of reads mapped incorrectly across a known splice junction. Overhanging RNA-seq reads are mapped incorrectly into the intron when the correct position is in the adjacent exon, even though the splice junction was provided to the read mapper. (C) Distribution of SNVs at different steps in the pipeline. Prefiltered SNVs (defined by having at least three nonidentical reads supporting the SNV, a minimum SNV frequency of 10%, at least one edit per strand, and no more than one type of SNV for the same position) are in blue. SNVs annotated in dbSNP132 are red, SNVs that are not in dbSNP132 and within gene boundaries are green, SNVs that are not in dbSNP132 and within gene boundaries without splicing sites are purple, SNVs that had no matching 1000 Genomes sequencing reads are in light blue, and SNVs passing ChIP filtering are in orange. (D) Frequency distribution of SNVs primarily reflects expression of homozygous and heterozygous SNPs. The SNVs that were found in dbSNP132 are in blue; the novel genic SNVs, red. (E) Most nonsplice adjoining SNPs are A-to-G. The nonsplicing novel genic A-to-G calls in filtered calls are in blue; nonsplicing novel genic A-to-G calls, red; nonsplicing novel genic non–A-to-G, brown; nonsplicing novel genic non–A-to-G in filtered calls, purple; and splicing-only novel genic, light blue. (F) Distribution of gene expression versus coverage for genic SNVs; exonic sites are in red and intronic sites are in blue. SNVs in more lowly expressed genes are primarily on exons, due to our minimum depth of coverage requirements.
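Because most non–A-to-G SNVs sit near splicing boundaries, a proximity check against known junctions is a natural filter. The sketch below flags any position within a fixed window of a junction coordinate. The function name `near_splice_junction`, the sorted-list input, and the inclusive window are illustrative assumptions; the 5-bp default mirrors the intronic filter described in Figure 1, but this simplified version does not check whether the site is actually intronic.

```python
from bisect import bisect_left
from typing import Sequence


def near_splice_junction(pos: int, junctions: Sequence[int],
                         window: int = 5) -> bool:
    """Return True if `pos` lies within `window` bp of any junction.

    `junctions` is assumed to be a sorted sequence of junction coordinates
    on the same chromosome and in the same coordinate system as `pos`.
    """
    # Find the first junction at or beyond the left edge of the window.
    i = bisect_left(junctions, pos - window)
    # That junction is inside the window only if it does not exceed
    # the right edge.
    return i < len(junctions) and junctions[i] <= pos + window
```

A candidate list could then be filtered with something like `[p for p in positions if not near_splice_junction(p, junctions)]`, where `positions` and `junctions` are hypothetical inputs for a single chromosome.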
Figure 3.
Survey of SNV calls across ENCODE cell lines. (A) Distribution of nonsplicing novel genic SNVs for all data sets. (B) In every cell type, the percentage of A-to-G SNVs increases and the number of candidate sites decreases (red) after filtering for private SNVs using ChIP-seq. GM12878 calls filtered with 1000 Genomes or ChIP-seq reads are labeled with G or C, respectively. (C) Relatively few non–A-to-G synonymous SNVs (purple), non–A-to-G nonsynonymous SNVs (green), A-to-G synonymous SNVs (red), or A-to-G nonsynonymous SNVs (blue) are found in ORFs.
Figure 4.
Gene level analysis of RNA editing after private SNV filtering. (A) Hierarchical clustering of the editing frequency of the 33.5% (1905 out of 5695 possible) individual A-to-G candidate editing sites occurring in at least two distinct cell types. (B) Hierarchical clustering of the number of edits in the 47.4% (662 out of 1395 possible) of genes edited in at least two distinct cell types. (C) RNA edits in genes cluster in the UTR or in the introns, with few genes having edits in both UTR and introns. The percentage of genes with only UTR edits is in green; intronic edits, blue; and edits in both introns and UTR, red. (D) Reproducibility of calling RNA edits for human H1 ES cells. Scatter plot of RNA edit calls for rep 1,2 versus rep 3,4 is on a log2-log2 scale with a pseudocount of 1. Gaussian noise was added to the points to visualize density. (E) Venn diagrams of A-to-G candidate edits in lymphoblastoid cells from a HapMap trio. The Venn diagram of the individual sites (left) and edited genes (right); 35.8% of the union of edited sites are found in two or more cell types, while 54.2% of the union of edited genes are found in two or more cell types.
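The shared-site and shared-gene percentages quoted in this caption are fractions of a union: the number of items seen in at least two cell types divided by the number seen in any cell type. A minimal sketch of that arithmetic follows; the function name `shared_fraction` and the dictionary-of-sets input are assumptions for illustration, and the toy identifiers are not data from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def shared_fraction(items_by_cell_type: Dict[str, Iterable[Hashable]],
                    min_cell_types: int = 2) -> float:
    """Fraction of the union of items found in >= `min_cell_types` cell types."""
    # Count, for each item (an edited site or gene), how many cell types
    # contain it. set() guards against duplicates within one cell type.
    counts = Counter(item
                     for items in items_by_cell_type.values()
                     for item in set(items))
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= min_cell_types)
    return shared / len(counts)


# Toy example with made-up site identifiers (not data from the paper).
toy = {
    "cell_a": {"s1", "s2", "s3"},
    "cell_b": {"s2", "s3", "s4"},
    "cell_c": {"s3", "s5"},
}
print(shared_fraction(toy))  # 2 of the 5 sites in the union are shared
```

The same function works at the gene level by passing gene identifiers instead of site identifiers, which is one way to compare site-level and gene-level reproducibility.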

