RNA Biol. 2024 Jan;21(1):1-14.
doi: 10.1080/15476286.2023.2290843. Epub 2023 Dec 13.

Unraveling C-to-U RNA editing events from direct RNA sequencing

Adriano Fonzino et al. RNA Biol. 2024 Jan.

Abstract

In mammals, RNA editing involves the conversion of adenosine (A) to inosine (I) by ADAR enzymes or the hydrolytic deamination of cytosine (C) to uracil (U) by the APOBEC family of enzymes, mostly APOBEC1. RNA editing has a plethora of biological functions, and its deregulation has been associated with various human disorders. While the large-scale detection of A-to-I events is quite straightforward using Illumina RNAseq technology, the identification of C-to-U events is a non-trivial task. This difficulty arises from the rarity of such events in eukaryotic genomes and the challenge of distinguishing them from background noise. Direct RNA sequencing by Oxford Nanopore Technology (ONT) permits the direct detection of Us on sequenced RNA reads. Surprisingly, using ONT reads from wild-type (WT) and APOBEC1-knock-out (KO) murine cell lines, as well as in vitro synthesized RNA without any modification, we identified a systematic error affecting the accuracy of C base calls, which leads to incorrect identification of C-to-U events. To overcome this issue in direct RNA reads, we introduce a novel machine learning strategy based on the Isolation Forest (iForest) algorithm, in which C-to-U editing events are treated as sequencing anomalies. Using in vitro synthesized and human ONT reads, our model optimizes the signal-to-noise ratio, improving the detection of C-to-U editing sites with high accuracy (over 90% in all samples tested). Our results suggest that iForest, known for its rapid implementation and minimal memory requirements, is a promising tool to denoise ONT reads and reliably identify RNA modifications.

Keywords: C-to-U editing; Direct RNA sequencing; RNA editing; RNA modifications; iForest.
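
The anomaly-detection idea in the abstract can be illustrated with a minimal, from-scratch sketch of an isolation forest. This is not the authors' implementation. It assumes, purely for illustration, that the forest is fit on reads from a background where no editing is expected, so that reads departing from the usual error profile are isolated in fewer random splits and receive a higher anomaly score. The five features follow the order given in the Figure 4 caption (T_qual, mean_qual, mism_count, ins_count, del_count); all numeric values, function names, and parameters below are hypothetical.

```python
import math
import random

# Feature order (from the Figure 4 caption):
# T_qual, mean_qual, mism_count, ins_count, del_count

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only features that still vary within this node can separate points.
    varying = [f for f in range(len(rows[0]))
               if min(r[f] for r in rows) != max(r[f] for r in rows)]
    if not varying:
        return {"size": len(rows)}
    feat = rng.choice(varying)
    values = [r[feat] for r in rows]
    split = rng.uniform(min(values), max(values))
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {"feat": feat, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, plus a correction for unsplit leaves."""
    if "feat" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feat"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)

def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(row, forest):
    """Score in (0, 1); values closer to 1 mean the row is isolated faster."""
    trees, sample_size = forest
    mean_depth = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

# Synthetic background reads (hypothetical values, not data from the paper).
rng = random.Random(42)
background = [
    [rng.gauss(8, 2), rng.gauss(10, 2),
     rng.randint(1, 3), rng.randint(0, 2), rng.randint(0, 2)]
    for _ in range(256)
]
forest = fit_forest(background)

noise_like_read = [8.0, 10.0, 2, 1, 1]       # resembles the background
candidate_edit_read = [30.0, 28.0, 0, 0, 0]  # departs from the background

score_noise = anomaly_score(noise_like_read, forest)
score_edit = anomaly_score(candidate_edit_read, forest)
print(f"noise-like read: {score_noise:.3f}, candidate edit: {score_edit:.3f}")
```

In this toy setup the read that departs from the background profile scores higher than the background-like read. Which population serves as "normal" and where the decision threshold sits are modelling choices that this sketch does not take from the source.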

Conflict of interest statement

No potential conflict of interest was reported by the author(s).

Figures

Figure 1.
Distribution of RNA variants in four GTEx tissues.
Figure 2.
Snapshot of a ±3-nucleotide region surrounding the known editing site chr2:121983221 of the B2m gene, shown as a graphical representation of the frequencies of aligned bases, deletions, and insertions. Data are from KO (top) and WT (bottom) samples, from Illumina (a, c) and ONT (b, d) runs.
Figure 3.
Average alignment profile of Illumina ‘ground-truth’ sites putatively related to the APOBEC1 enzyme signature (U bases are shown here as T).
Figure 4.
a) Pairplot of the principal component analysis summarizing basecalling features (central U base quality, mean quality, mismatches, insertion and deletion count) extracted from Illumina ‘ground-truth’ sites of WT (blue) and KO (orange) CU-context reads. The first three components explain more than 80% of the total variance of the data. b) Pairplot describing CU-context reads retrieved from Illumina ‘ground-truth’ sites of both WT (blue dots) and KO (orange dots) ONT runs. Five features are shown: T_qual is the quality of the central uridine base; mean_qual is the average quality of bases on an interval of ±3 nucleotides; mism_count is the number of mismatches with respect to the expected reference bases on the same interval; ins_count and del_count are the total numbers of insertions and deletions within the interval, respectively.
Figure 5.
Dimensionality reduction by t-SNE of basecalling features (central U base quality, mean quality, mismatches, insertion and deletion count) extracted from Illumina ‘ground-truth’ sites of WT (blue) and KO (orange) CU-context reads.
Figure 6.
Analysis of the ionic current features for the site chr2:121983221, located in the 3’UTR of the mouse B2m gene locus. (a) and (b) show the distributions of C and U currents for WT and KO samples, respectively. (c) shows the distribution of U currents only from WT and KO samples, and (d) shows the same distribution for C currents only. PCA of current features (intervals of ±2 nucleotides) for WT and KO samples is shown in (e) and (f), respectively. Each dot in the PCA plots represents an aligned C (blue) or U (red).
Figure 7.
PCA of ionic current features extracted from the synthetic constructs dataset. Each dot in the PCA plot represents an aligned C (blue) or U (orange).
Figure 8.
iForest model for the training, validation, testing, and prediction of C-to-U editing events at the “per-read” level and, after the aggregation step, at the “genome-space” level. Top: the workflow used to train the model, starting from encoded base-calling features. Bottom: schematic of the encoding strategy used to compress the base-calling feature information provided to the model.
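
The aggregation step mentioned in the Figure 8 caption, going from “per-read” predictions to “genome-space” sites, could look roughly like the sketch below. The source does not describe how the aggregation is done, so the function name, the input tuple layout, and the `min_depth` and `min_freq` thresholds are all hypothetical. The read counts are synthetic, and only the chr2:121983221 coordinate comes from the figure captions.

```python
from collections import defaultdict

def aggregate_sites(read_calls, min_depth=5, min_freq=0.1):
    """Collapse per-read predictions into per-site editing frequencies.

    read_calls: iterable of (chrom, pos, is_edited) tuples, one per read.
    min_depth and min_freq are hypothetical filters, not values from the paper.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [edited reads, total reads]
    for chrom, pos, is_edited in read_calls:
        site = counts[(chrom, pos)]
        site[0] += int(is_edited)
        site[1] += 1

    called = {}
    for site, (edited, total) in counts.items():
        freq = edited / total
        # Keep only sites with enough coverage and enough edited reads.
        if total >= min_depth and freq >= min_freq:
            called[site] = freq
    return called

# Synthetic per-read predictions (counts are made up for illustration).
read_calls = (
    [("chr2", 121983221, True)] * 4 + [("chr2", 121983221, False)] * 6
    + [("chr1", 1000, True)] * 1 + [("chr1", 1000, False)] * 19
    + [("chr3", 500, True)] * 2
)
sites = aggregate_sites(read_calls)
print(sites)
```

Here only the chr2 site passes both filters: the chr1 site falls below the frequency threshold and the chr3 site has too few reads.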
