Detection theory in identification of RNA-DNA sequence differences using RNA-sequencing

Jonathan M Toung¹, Nicholas Lahens¹, John B Hogenesch², Gregory Grant³

Affiliations

¹ Genomics and Computational Biology Graduate Program, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America.
² Institute for Biomedical Informatics, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America; Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America; Department of Pharmacology, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America.
³ Institute for Biomedical Informatics, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America; Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America; Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA, United States of America.

PMID: 25396741
PMCID: PMC4232354
DOI: 10.1371/journal.pone.0112040

Jonathan M Toung et al. PLoS One. 2014 Nov 14;9(11):e112040. doi: 10.1371/journal.pone.0112040. eCollection 2014.

Abstract

Advances in sequencing technology have allowed for detailed analyses of the transcriptome at single-nucleotide resolution, facilitating the genome-wide study of RNA editing, or sequence differences between RNA and DNA. In humans, two types of post-transcriptional RNA editing processes are known to occur: A-to-I deamination by ADAR and C-to-U deamination by APOBEC1. In addition to these sequence differences, researchers have reported the existence of all 12 types of RNA-DNA sequence differences (RDDs); however, the validity of these claims is debated, as many studies claim that technical artifacts account for the majority of these non-canonical sequence differences. In this study, we used a detection theory approach to evaluate the performance of RNA-Sequencing (RNA-Seq) and associated aligners in accurately identifying RNA-DNA sequence differences. By generating simulated RNA-Seq datasets containing RDDs, we assessed the effect of alignment artifacts and sequencing error on the sensitivity and false discovery rate of RDD detection. Overall, we found that even in the presence of sequencing errors, false negative and false discovery rates of RDD detection can be contained below 10% with relatively lenient thresholds. We also assessed the ability of various filters to target false positive RDDs and found them to be effective in discriminating between true and false positives. Lastly, we used the optimal thresholds we identified from our simulated analyses to identify RDDs in a human lymphoblastoid cell line. We found approximately 6,000 RDDs, the majority of which are A-to-G edits and likely to be mediated by ADAR. Moreover, we found the majority of non-A-to-G RDDs to be associated with poorer alignments and conclude from these results that the evidence for widespread non-canonical RDDs in humans is weak. Overall, we found RNA-Seq to be a powerful technique for surveying RDDs genome-wide when coupled with the appropriate thresholds and filters.
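The two detection-theory quantities named in the abstract, sensitivity and false discovery rate, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Call` record, the `evaluate` function, the site names, the read counts, and the default thresholds are all hypothetical, chosen only to show how thresholds on coverage, RDD level, and number of variant-bearing reads change the counts of true and false positives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One candidate RDD site reported after alignment (hypothetical record)."""
    site: str            # e.g. "chr1:100"
    coverage: int        # reads covering the site
    variant_reads: int   # reads bearing the non-reference (RDD) base

    @property
    def level(self) -> float:
        # RDD level: fraction of reads at the site bearing the difference allele.
        return self.variant_reads / self.coverage if self.coverage else 0.0


def evaluate(calls, simulated_sites,
             min_coverage=10, min_level=0.10, min_variant_reads=1):
    """Return (sensitivity, false discovery rate) for the given thresholds.

    The default thresholds are illustrative assumptions, not the optimal
    values reported in the study.
    """
    passed = {
        c.site for c in calls
        if c.coverage >= min_coverage
        and c.level >= min_level
        and c.variant_reads >= min_variant_reads
    }
    tp = len(passed & simulated_sites)   # simulated RDDs that were detected
    fp = len(passed - simulated_sites)   # detected sites that were never simulated
    fn = len(simulated_sites - passed)   # simulated RDDs that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sensitivity, fdr


# Toy data (invented for illustration only).
simulated = {"chr1:100", "chr1:200", "chr2:300", "chr3:400"}
calls = [
    Call("chr1:100", coverage=50, variant_reads=25),   # true positive
    Call("chr1:200", coverage=8, variant_reads=4),     # fails the coverage threshold
    Call("chr2:300", coverage=40, variant_reads=8),    # true positive
    Call("chr5:900", coverage=30, variant_reads=6),    # false positive
    Call("chr6:10", coverage=100, variant_reads=2),    # fails the level threshold
]

sensitivity, fdr = evaluate(calls, simulated)
print(f"sensitivity={sensitivity:.2f}, FDR={fdr:.2f}")
```

Tightening a threshold in this toy setup generally lowers the false discovery rate at the cost of sensitivity, which is the trade-off the simulated datasets in the study are used to quantify.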

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Sensitivity of RNA-DNA sequence difference detection versus coverage.**
The sensitivity or true positive rate of RNA-DNA sequence difference identification is shown versus various thresholds on the minimum depth of coverage required at the site of the simulated difference. For all four aligners, the true positive rate increases sharply upon raising the minimum depth of coverage required for detection from 0x to approximately 50x, after which it plateaus.

**Figure 2. Sensitivity of RDD detection versus the simulated RDD level.**
Here we depict the true positive rate of RDD detection versus the simulated RDD level, or the percentage of reads at the site bearing the sequence difference allele. A minimum of 1 read bearing the RNA-DNA sequence difference is sufficient for a site to be deemed correctly identified. Sites with coverage less than 10x per the simulated RNA-Seq dataset are removed from consideration.

**Figure 3. Simulated versus observed levels of RNA-DNA sequence differences.**
Here we plot the simulated RDD level versus the observed level as determined by GSNAP, MapSplice, RUM, or TopHat for replicate 1. Sites with coverage less than 10x or an RDD level less than 10% per the simulated dataset are removed from consideration. Overall, we observed the correlation between simulated and observed levels to be approximately 98% in both datasets and across the various aligners and replicates.

**Figure 4. False discovery rate of RNA-DNA sequence difference detection.**
Here we depict the false discovery rate of RNA-DNA sequence difference detection under various thresholds on the coverage, level of sequence difference, and number of reads bearing the sequence difference base per the aligner. Calculations are averaged across the three replicates and error bars represent standard deviation values.

**Figure 5. Distribution of RNA-DNA sequence differences in GM12878.**
Here we depict the distribution of RNA-DNA sequence differences in GM12878 after removing sites using various filters.
