RNA Biol. 2024 Jan;21(1):1-14.

doi: 10.1080/15476286.2023.2290843. Epub 2023 Dec 13.

Unraveling C-to-U RNA editing events from direct RNA sequencing

Adriano Fonzino¹, Caterina Manzari¹, Paola Spadavecchia¹, Uday Munagala², Serena Torrini², Silvestro Conticello²,³, Graziano Pesole¹,⁴,⁵, Ernesto Picardi¹,⁴,⁶

Affiliations

¹ Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy.
² Core Research Laboratory, ISPRO, Florence, Italy.
³ National Research Council, Institute of Clinical Physiology, Pisa, Italy.
⁴ National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy.
⁵ Consorzio Interuniversitario Biotecnologie, Trieste, Italy.
⁶ National Institute of Biostructures and Biosystems (INBB), Roma, Italy.

PMID: 38090878
PMCID: PMC10732634
DOI: 10.1080/15476286.2023.2290843

Adriano Fonzino et al. RNA Biol. 2024 Jan.

Abstract

In mammals, RNA editing events involve the conversion of adenosine (A) into inosine (I) by ADAR enzymes or the hydrolytic deamination of cytosine (C) into uracil (U) by the APOBEC family of enzymes, mostly APOBEC1. RNA editing has a plethora of biological functions, and its deregulation has been associated with various human disorders. While the large-scale detection of A-to-I events is relatively straightforward with Illumina RNA-seq, the identification of C-to-U events is a non-trivial task, because such events are rare in eukaryotic genomes and difficult to distinguish from background noise. Direct RNA sequencing by Oxford Nanopore Technologies (ONT) permits the direct detection of Us in sequenced RNA reads. Surprisingly, using ONT reads from wild-type (WT) and APOBEC1-knockout (KO) murine cell lines, as well as in vitro synthesized RNA without any modification, we identified a systematic error affecting the accuracy of C base calls, which leads to incorrect identification of C-to-U events. To overcome this issue in direct RNA reads, we introduce a novel machine learning strategy based on the isolation Forest (iForest) algorithm, in which C-to-U editing events are treated as sequencing anomalies. Using in vitro synthesized and human ONT reads, our model improves the signal-to-noise ratio and detects C-to-U editing sites with high accuracy, above 90% in all samples tested. Our results suggest that iForest, known for its fast implementation and low memory requirements, is a promising tool to denoise ONT reads and reliably identify RNA modifications.
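The abstract treats C-to-U sites as anomalies detected by an isolation forest. To make that idea concrete, here is a minimal pure-Python sketch of the original isolation forest algorithm (Liu, Ting and Zhou, 2008). It is not the authors' implementation, and in practice a library version such as scikit-learn's `sklearn.ensemble.IsolationForest` would normally be used. Each tree isolates points with random axis-parallel splits on a subsample of size ψ. Points that are isolated after few splits get an anomaly score s(x) = 2^(−E[h(x)]/c(ψ)) close to 1. The class and function names, the parameters and the two-dimensional toy points (stand-ins for per-read feature vectors) are all hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value 2*H(1) - 1; the log approximation is poor here
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                      # training points that reached this node
    feature: Optional[int] = None  # split feature index (None marks a leaf)
    split: float = 0.0             # split threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(X: List[Sequence[float]], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    """Recursively partition X with random axis-parallel splits."""
    if depth >= max_depth or len(X) <= 1:
        return Node(size=len(X))
    # Only features that still vary inside this node can separate points.
    ranges = [(j, min(r[j] for r in X), max(r[j] for r in X)) for j in range(len(X[0]))]
    ranges = [(j, lo, hi) for j, lo, hi in ranges if lo < hi]
    if not ranges:
        return Node(size=len(X))
    j, lo, hi = rng.choice(ranges)
    s = rng.uniform(lo, hi)
    left = [r for r in X if r[j] < s]
    right = [r for r in X if r[j] >= s]
    return Node(len(X), j, s,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Sequence[float], node: Node, depth: int = 0) -> float:
    """Depth at which x lands in a leaf, corrected for the leaf's unsplit points."""
    if node.feature is None:
        return depth + c_factor(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, X: List[Sequence[float]]) -> "IsolationForest":
        if len(X) < 2:
            raise ValueError("need at least two training points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(X))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the original paper
        self.trees = [build_tree(rng.sample(X, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Sequence[float]) -> float:
        """Anomaly score in (0, 1]; values near 1 mean x is easily isolated."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c_factor(self.psi))


# Toy usage: 300 synthetic "normal" points and one far-away point.
gen = random.Random(42)
reads = [(gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)) for _ in range(300)]
forest = IsolationForest(n_trees=100, sample_size=128, seed=1).fit(reads)
print(forest.score((0.0, 0.0)), forest.score((6.0, 6.0)))
```

Subsampling (ψ = 128 here) and the depth limit make training cheap and keep memory use small. The abstract names these properties as the reasons iForest suits ONT data.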

Keywords: C-to-U editing; Direct RNA sequencing; RNA editing; RNA modifications; iForest.


Conflict of interest statement

No potential conflict of interest was reported by the author(s).

Figures

**Figure 1.**
Distribution of RNA variants in four GTEx tissues.

**Figure 2.**
Snapshot of a ±3-nt region surrounding the known editing site chr2:121983221 of the B2m gene, showing the frequencies of aligned bases along with deletions and insertions. Data are from KO (top) and WT (bottom) samples, from Illumina (a, c) and ONT (b, d) runs.
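The per-position frequencies shown in Figure 2 amount to simple counting over the reads covering a site. The sketch below computes such a pileup profile and a naive C-to-U level, T/(C+T), at a reference C. The abstract reports a systematic C miscall error in ONT reads, so a naive level like this can be inflated even in KO samples. That is the motivation for the model-based approach. The `(symbol, has_insertion)` encoding per read is a hypothetical choice made for this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple


def site_profile(calls: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of reads showing each symbol at one reference position.

    Each read contributes (symbol, has_insertion): symbol is 'A', 'C', 'G', 'T'
    (U written as T) or '-' for a deletion; has_insertion flags inserted bases
    next to the position. This encoding is hypothetical.
    """
    if not calls:
        raise ValueError("no reads cover this position")
    n = len(calls)
    counts = Counter(symbol for symbol, _ in calls)
    profile = {symbol: counts[symbol] / n for symbol in "ACGT-"}
    profile["ins"] = sum(1 for _, has_ins in calls if has_ins) / n
    return profile


def naive_c_to_u_level(calls: List[Tuple[str, bool]]) -> float:
    """Share of T (i.e. U) among reads calling C or T at a reference C."""
    counts = Counter(symbol for symbol, _ in calls)
    c, t = counts["C"], counts["T"]
    return t / (c + t) if c + t else 0.0


# Toy pileup at a hypothetical reference C: 6 C, 3 T (one with an insertion), 1 deletion.
pileup = [("C", False)] * 6 + [("T", False)] * 2 + [("T", True)] + [("-", False)]
profile = site_profile(pileup)
level = naive_c_to_u_level(pileup)
print(profile, level)
```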

**Figure 3.**
Average alignment profile of Illumina ‘ground-truth’ sites putatively related to the APOBEC1 enzyme signature (U bases are shown here as T).
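An average profile like the one in Figure 3 can be obtained by stacking equal-length windows centred on each site and computing nucleotide frequencies column by column. This is a minimal sketch of that computation, assuming windows are already extracted as strings with U written as T. The function name and the three toy windows are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def positional_profile(windows: List[str]) -> List[Dict[str, float]]:
    """Per-position nucleotide frequencies across equal-length windows centred on sites."""
    if not windows:
        raise ValueError("no windows given")
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(width):
        column = Counter(w[i].upper() for w in windows)
        profile.append({base: column[base] / len(windows) for base in "ACGT"})
    return profile


# Three hypothetical 7-nt windows (U written as T), each centred on the candidate C.
windows = ["AATCAAT", "TACCATT", "AACCTAA"]
profile = positional_profile(windows)
for offset, freqs in zip(range(-3, 4), profile):
    print(offset, freqs)
```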

**Figure 4.**
(a) Pairplot of the principal component analysis summarizing basecalling features (central U base quality, mean quality, mismatches, insertion and deletion counts) extracted from Illumina ‘ground-truth’ sites of WT (blue) and KO (orange) CU-context reads. The first three components explain more than 80% of the total variance of the data. (b) Pairplot of CU-context reads retrieved from Illumina ‘ground-truth’ sites in both WT (blue dots) and KO (orange dots) ONT runs. Five features are shown: T_qual is the quality of the central uridine base; mean_qual is the average quality of bases over an interval of ±3 nucleotides; mism_count is the number of mismatches with respect to the expected reference bases in the same interval; ins_count and del_count are the total numbers of insertions and deletions within the interval, respectively.
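The five features in Figure 4b can be computed from a ±3-nt window of a single read alignment. The sketch below does this under several assumptions that the caption does not settle. `AlignedColumn` is a hypothetical representation. Only aligned (non-inserted) bases contribute to `mean_qual`. Insertions are counted as inserted bases rather than events. The central position is included in `mism_count`, so a U called over a reference C counts as a mismatch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AlignedColumn:
    """One reference position of a read alignment (hypothetical representation)."""
    ref: str             # expected reference base
    read: Optional[str]  # base called in the read, or None for a deletion
    qual: Optional[int]  # Phred quality of the called base (None for a deletion)
    ins_after: int = 0   # inserted read bases immediately after this position


def window_features(cols: List[AlignedColumn]) -> Dict[str, float]:
    """Five basecalling features for an odd-length window centred on the candidate site."""
    if len(cols) % 2 == 0:
        raise ValueError("window must have odd length so that it has a central base")
    center = cols[len(cols) // 2]
    if center.qual is None:
        raise ValueError("central position is deleted in this read")
    quals = [c.qual for c in cols if c.qual is not None]
    return {
        "T_qual": center.qual,
        "mean_qual": sum(quals) / len(quals),
        # Assumption: the central position is included, so a U over a reference C counts.
        "mism_count": sum(1 for c in cols if c.read is not None and c.read != c.ref),
        "ins_count": sum(c.ins_after for c in cols),
        "del_count": sum(1 for c in cols if c.read is None),
    }


# Toy ±3 window: reference GATCAAT, the read calls T (i.e. U) at the central C.
window = [
    AlignedColumn("G", "G", 20),
    AlignedColumn("A", "A", 18),
    AlignedColumn("T", "T", 22, ins_after=1),
    AlignedColumn("C", "T", 9),
    AlignedColumn("A", None, None),
    AlignedColumn("A", "G", 12),
    AlignedColumn("T", "T", 21),
]
features = window_features(window)
print(features)
```

A per-read vector like this is the kind of input an isolation forest can score. The encoding the authors actually feed to their model (Figure 8) may differ.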

**Figure 5.**
Dimensionality reduction by t-SNE of basecalling features (central U base quality, mean quality, mismatches, insertion and deletion counts) extracted from Illumina ‘ground-truth’ sites of WT (blue) and KO (orange) CU-context reads.

**Figure 6.**
Analysis of the ionic current features for the site chr2:121983221 in the 3’UTR of the mouse B2m gene locus. Panels (a) and (b) show the distributions of C and U currents for WT and KO samples, respectively. Panel (c) shows the distribution of U currents only from WT and KO samples, while (d) shows the same distribution for C currents only. PCA of current features (intervals of ±2 nucleotides) for WT and KO samples is shown in (e) and (f), respectively. Each dot in the PCA plots represents an aligned C (blue) or U (red).
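Figures 4a, 6 and 7 all rely on principal component analysis, and the Figure 4a caption reports the share of variance explained by the leading components. Below is a dependency-free sketch of PCA: it builds the sample covariance matrix, then finds eigenpairs by power iteration with deflation and returns both the projected scores and the explained-variance ratios. In practice NumPy or scikit-learn would be used instead. The exact ionic current features are not specified here, so the toy input vectors and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def covariance(X: List[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Column means and sample covariance matrix of X (rows are observations)."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for r in X:
        c = [r[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j]
    return means, [[v / (n - 1) for v in row] for row in C]


def top_eigenpair(C: List[List[float]], n_iter: int = 1000, seed: int = 0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    d = len(C)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(n_iter):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left along any direction
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


def pca(X: List[Sequence[float]], k: int = 2):
    """Scores on the first k principal components and their explained-variance ratios."""
    means, C = covariance(X)
    d = len(C)
    total = sum(C[i][i] for i in range(d))
    if total == 0.0:
        raise ValueError("data have zero variance")
    comps, ratios = [], []
    for _ in range(k):
        lam, v = top_eigenpair(C)
        comps.append(v)
        ratios.append(lam / total)
        # Deflation: remove the found component before searching for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum((r[j] - means[j]) * v[j] for j in range(d)) for v in comps] for r in X]
    return scores, ratios


# Toy data: 10 hypothetical 2-D feature vectors lying close to the line y = 2x.
X = [(float(i), 2.0 * i + (0.1 if i % 2 else -0.1)) for i in range(10)]
scores, ratios = pca(X, k=2)
print([round(r, 4) for r in ratios])
```

Each row of `scores` corresponds to one dot in a PCA plot. Colouring the rows by the aligned base (C or U) reproduces the style of Figures 6e,f and 7.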

**Figure 7.**
PCA of ionic current features extracted from the synthetic constructs dataset. Each dot in the PCA plots represents an aligned C (blue) or U (orange).

**Figure 8.**
iForest model for the training, validation, testing, and prediction of C-to-U editing events at the “per-read” level and then, after an aggregation step, at the “genome-space” level. Top: the workflow used to train the model, starting from encoded base-calling features. Bottom: schematic of the encoding strategy used to compress the base-calling feature information provided to the model.
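The caption does not spell out how per-read predictions are aggregated to the “genome-space” level. One plausible rule is sketched below: per site, take the fraction of covering reads flagged as edited, and call the site edited when both coverage and frequency pass a threshold. The function name, the `(site_id, predicted_edited)` input format and both threshold values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_sites(
    read_calls: Iterable[Tuple[str, bool]],
    min_coverage: int = 10,
    min_freq: float = 0.1,
) -> Dict[str, Tuple[int, float, bool]]:
    """Collapse per-read calls into (coverage, editing frequency, is_edited) per site.

    read_calls holds one (site_id, predicted_edited) pair per read covering a
    candidate C. Both thresholds are hypothetical defaults.
    """
    coverage: Dict[str, int] = defaultdict(int)
    edited: Dict[str, int] = defaultdict(int)
    for site, is_edited in read_calls:
        coverage[site] += 1
        edited[site] += int(is_edited)
    summary = {}
    for site, n in coverage.items():
        freq = edited[site] / n
        summary[site] = (n, freq, n >= min_coverage and freq >= min_freq)
    return summary


# Toy per-read predictions for two hypothetical sites.
calls = [("site_A", i < 4) for i in range(20)] + [("site_B", True)] * 3
summary = aggregate_sites(calls)
print(summary)
```

With these toy values, `site_B` is not called even though every read is flagged, because its coverage falls below the minimum. A coverage filter like this keeps a few noisy reads from producing a site call.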
