PLoS One. 2014 Nov 14;9(11):e112040.
doi: 10.1371/journal.pone.0112040. eCollection 2014.

Detection theory in identification of RNA-DNA sequence differences using RNA-sequencing

Jonathan M Toung et al. PLoS One.

Abstract

Advances in sequencing technology have allowed for detailed analyses of the transcriptome at single-nucleotide resolution, facilitating the genome-wide study of RNA editing, or sequence differences between RNA and DNA. In humans, two types of post-transcriptional RNA editing processes are known to occur: A-to-I deamination by ADAR and C-to-U deamination by APOBEC1. Beyond these canonical edits, researchers have reported the existence of all 12 types of RNA-DNA sequence differences (RDDs); however, the validity of these claims is debated, as many studies claim that technical artifacts account for the majority of the non-canonical sequence differences. In this study, we used a detection theory approach to evaluate the performance of RNA-Sequencing (RNA-Seq) and associated aligners in accurately identifying RNA-DNA sequence differences. By generating simulated RNA-Seq datasets containing RDDs, we assessed the effect of alignment artifacts and sequencing error on the sensitivity and false discovery rate of RDD detection. Overall, we found that even in the presence of sequencing errors, false negative and false discovery rates of RDD detection can be contained below 10% with relatively lenient thresholds. We also assessed the ability of various filters to target false positive RDDs and found them to be effective in discriminating between true and false positives. Lastly, we used the optimal thresholds identified from our simulated analyses to identify RDDs in a human lymphoblastoid cell line. We found approximately 6,000 RDDs, the majority of which are A-to-G edits and likely to be mediated by ADAR. Moreover, we found the majority of non-A-to-G RDDs to be associated with poorer alignments, and we conclude from these results that the evidence for widespread non-canonical RDDs in humans is weak. Overall, we found RNA-Seq to be a powerful technique for surveying RDDs genome-wide when coupled with the appropriate thresholds and filters.
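
The detection-theory quantities named in the abstract (sensitivity, false negative rate, false discovery rate) can be illustrated with a minimal sketch. It assumes that the simulated (true) RDD sites and the sites called from an alignment are each available as a set of (chromosome, position) pairs; the function name, the site representation, and the example coordinates are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple

# Hypothetical site key: (chromosome, 1-based position).
Site = Tuple[str, int]


def detection_metrics(simulated: Set[Site], called: Set[Site]) -> Dict[str, float]:
    """Compare called RDD sites against the simulated ground truth."""
    tp = len(simulated & called)   # simulated sites that were called
    fn = len(simulated - called)   # simulated sites that were missed
    fp = len(called - simulated)   # called sites that were never simulated

    # Sensitivity (true positive rate): fraction of simulated sites recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # False discovery rate: fraction of calls that are false positives.
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")

    return {
        "sensitivity": sensitivity,
        "false_negative_rate": 1.0 - sensitivity,
        "fdr": fdr,
    }


# Illustrative data only: four simulated sites, four calls, three in common.
simulated_sites = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)}
called_sites = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)}
metrics = detection_metrics(simulated_sites, called_sites)
```

Because the ground truth is known only in a simulation, these rates can be computed exactly there and then used to choose thresholds for real data, which is the general strategy the abstract describes.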

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Sensitivity of RNA-DNA sequence difference detection versus coverage.
The sensitivity or true positive rate of RNA-DNA sequence difference identification is shown versus various thresholds on the minimum depth of coverage required at the site of the simulated difference. For all four aligners, the true positive rate increases sharply upon raising the minimum depth of coverage required for detection from 0x to approximately 50x, after which it plateaus.
Figure 2. Sensitivity of RDD detection versus the simulated RDD level.
Here we depict the true positive rate of RDD detection versus the simulated RDD level, or the percentage of reads at the site bearing the sequence difference allele. A minimum of 1 read bearing the RNA-DNA sequence difference is sufficient for a site to be deemed correctly identified. Sites with coverage less than 10x per the simulated RNA-Seq dataset are removed from consideration.
Figure 3. Simulated versus observed levels of RNA-DNA sequence differences.
Here we plot the simulated RDD level versus the observed level as determined by GSNAP, MapSplice, RUM, or Tophat for replicate 1. Sites with coverage less than 10x or an RDD level less than 10% per the simulated dataset are removed from consideration. Overall, we observed the correlation between simulated and observed levels to be approximately 98% in both datasets and across the various aligners and replicates.
Figure 4. False discovery rate of RNA-DNA sequence difference detection.
Here we depict the false discovery rate of RNA-DNA sequence difference detection under various thresholds on the coverage, level of sequence difference, and number of reads bearing the sequence difference base per the aligner. Calculations are averaged across the three replicates and error bars represent standard deviation values.
Figure 5. Distribution of RNA-DNA sequence differences in GM12878.
Here we depict the distribution of RNA-DNA sequence differences in GM12878 after removing sites using various filters.
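
The three kinds of threshold mentioned in the captions above (coverage, RDD level, and number of reads bearing the sequence difference base) can be sketched as a simple per-site filter. This is an illustration under stated assumptions, not the authors' pipeline: the `SiteCounts` record, the function names, and the default values are hypothetical. The defaults echo the 10x, 10%, and 1-read figures that appear in the captions, but they are not the optimal thresholds reported by the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SiteCounts:
    """Hypothetical per-site read counts taken from an alignment."""
    chrom: str
    pos: int
    coverage: int   # total reads covering the site
    rdd_reads: int  # reads bearing the sequence difference base


def rdd_level(site: SiteCounts) -> float:
    """Fraction of reads at the site bearing the sequence difference allele."""
    return site.rdd_reads / site.coverage if site.coverage > 0 else 0.0


def passes_thresholds(
    site: SiteCounts,
    min_coverage: int = 10,    # placeholder value
    min_level: float = 0.10,   # placeholder value
    min_rdd_reads: int = 1,    # placeholder value
) -> bool:
    """Return True when the site meets all three thresholds."""
    return (
        site.coverage >= min_coverage
        and site.rdd_reads >= min_rdd_reads
        and rdd_level(site) >= min_level
    )


def filter_sites(sites: Iterable[SiteCounts], **thresholds) -> List[SiteCounts]:
    """Keep only the candidate sites that pass the thresholds."""
    return [s for s in sites if passes_thresholds(s, **thresholds)]


# Illustrative candidates only.
candidates = [
    SiteCounts("chr1", 100, coverage=50, rdd_reads=10),   # level 0.20: kept
    SiteCounts("chr1", 200, coverage=8, rdd_reads=4),     # coverage too low
    SiteCounts("chr2", 50, coverage=100, rdd_reads=5),    # level 0.05: too low
    SiteCounts("chr2", 75, coverage=20, rdd_reads=3),     # level 0.15: kept
]
kept = filter_sites(candidates)
```

Raising any of the three thresholds trades sensitivity for a lower false discovery rate, which is why sweeping them against simulated data is a natural way to pick operating points.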
