Review
Brief Bioinform. 2023 May 19;24(3):bbad163.
doi: 10.1093/bib/bbad163.

Concepts and methods for transcriptome-wide prediction of chemical messenger RNA modifications with machine learning

Pablo Acera Mateos et al. Brief Bioinform.

Abstract

The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes impacted. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as for classification, clustering or de novo identification, have been critical in these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions about RNA modification analysis, including the ambiguity in predicting RNA modifications in transcript isoforms or in single nucleotides, or the lack of complete ground truth sets to test RNA modifications. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing the current limitations through the effective use of machine learning.

Keywords: RNA modifications; deep learning; direct RNA sequencing; epitranscriptomics; machine learning; miCLIP.

Figures

Figure 1
Transcriptome-wide prediction of chemical messenger RNA modifications with ML. The identification of chemical modifications in messenger RNA (mRNA) involves (A) reading sequence information alone or in combination with experimental data, (B) training and testing of ML methods and (C) analysis of the predicted outputs in terms of properties such as the localization of the RNA modifications, association with specific mRNA isoforms, stoichiometry and functional characterization. XGBoost, eXtreme Gradient Boosting; LSTM, long short-term memory. The figure was created with BioRender.com.
Figure 2
Experiment-independent ML approaches to predict RNA modifications. The schematic illustrates a generic approach using the example of m6A. (A) RNA sequences are labeled according to whether they contain the RNA modification based on existing experimental data. (B) Experiment-independent ML methods can be built on sequence-only features, which are based on nucleotide strings from the RNA sequence, or on more general genome-derived features, which contain a mix of RNA sequence and other features, such as RNA secondary structure, relative location within the transcript or evolutionary conservation of the modified position. (C) Both DL and classical ML methods, such as RFs and SVMs, have been used in experiment-independent RNA modification detection. (D) Choosing the right testing data and accuracy metrics is essential for the correct estimation of algorithm performance. Left: The AUROC is a commonly used performance metric based on true positives (TP) and false positives (FP). Right: A common approach for separating training and testing sequences maintains at least 70% of identity between the two groups. (E) Interpretability methods can be either model-specific or model-agnostic. The left panel shows two examples of model-specific interpretability: the relative importance of several features, and a sequence logo (position weight matrix) obtained using the activation values from the first layer of a CNN model for m6A detection. The right panel introduces in silico saturation mutagenesis as an example of a model-agnostic method for interpretability. The figure was created with BioRender.com.
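The in silico saturation mutagenesis mentioned in panel (E) can be illustrated with a minimal sketch. It is not the review's implementation: `toy_m6a_score` is a hypothetical stand-in for a trained model (it scores how well the central 5-mer matches the DRACH consensus, an assumption made only for illustration), and `saturation_mutagenesis` is a hypothetical helper name. The procedure itself treats the model as a black box, which is what makes it model-agnostic.

```python
from typing import Callable, Dict, List

BASES = "ACGU"

# Allowed bases at each position of the DRACH consensus (D=A/G/U, R=A/G, H=A/C/U).
# Assumption: used here only to build a toy scoring function, not a trained model.
DRACH = ["AGU", "AG", "A", "C", "ACU"]


def toy_m6a_score(seq: str) -> float:
    """Hypothetical model: fraction of the central 5-mer that matches DRACH."""
    mid = len(seq) // 2
    kmer = seq[mid - 2: mid + 3]
    if len(kmer) != 5:
        raise ValueError("sequence must be at least 5 nt long")
    return sum(base in allowed for base, allowed in zip(kmer, DRACH)) / 5


def saturation_mutagenesis(
    seq: str, model: Callable[[str], float]
) -> List[Dict[str, float]]:
    """For every position, return the score change of each single-base substitution."""
    reference_score = model(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = model(mutant) - reference_score
        effects.append(row)
    return effects


if __name__ == "__main__":
    sequence = "UUGGACUAA"  # central 5-mer is GGACU, a DRACH match
    for pos, row in enumerate(saturation_mutagenesis(sequence, toy_m6a_score)):
        print(pos, sequence[pos], row)
```

Positions whose substitutions change the score the most are the ones the model relies on; with a real classifier, `model` would wrap its prediction function instead of the toy scorer.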
Figure 3
The workflow of ML methods for data from targeted experiments. (A) m6Aboost [61] is trained on miCLIP2 data. Differential methylation analysis upon Mettl3 KO is used to identify positive and negative examples. The experimental protocol MAZTER-seq [63] identifies the m6A sites using the methylation-sensitive RNase MazF which cleaves at ACA motifs only if these are not methylated. (B) Multiple RNA sequence and additional features are extracted from the m6A sites identified by miCLIP2 and MAZTER-seq. (C) Classical ML models are trained on the extracted features. (D) The model can be used to identify RNA modification sites in a transcriptome-wide manner from new experimental data. Interpretability techniques enable investigating the determinants of m6A deposition. The figure was created with BioRender.com.
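The MazF logic in panel (A), cleavage at ACA motifs only when they are unmethylated, can be sketched as follows. This is a simplified illustration rather than the MAZTER-seq pipeline: the function names are hypothetical, motifs are identified by their 0-based start index, and methylation is passed in as a known set of motif starts, whereas the real protocol infers it from cleavage efficiency.

```python
from typing import List, Set


def aca_sites(seq: str) -> List[int]:
    """Return 0-based start indices of every ACA motif, including overlapping ones."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ACA"]


def cleaved_sites(seq: str, methylated_starts: Set[int]) -> List[int]:
    """ACA motifs expected to be cleaved: those not marked as methylated."""
    return [i for i in aca_sites(seq) if i not in methylated_starts]


if __name__ == "__main__":
    rna = "GGACAUUACAGACA"
    print(aca_sites(rna))           # all ACA motif starts
    print(cleaved_sites(rna, {7}))  # motif at index 7 is protected
```

In this toy view, a motif that stays uncleaved in the data is a candidate m6A site.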
Figure 4
ML methods for RNA modification detection using DRS. (A) IVTs and experimentally determined modification sites from cell lines (symbolized by an antibody) can be used as training data for supervised learning methods to detect RNA modifications in nanopore DRS data. (B) Supervised learning methods use signal properties to detect RNA modifications. (C) SVMs can be used to detect RNA modifications by modeling basecalling errors. (D) Most unsupervised learning methods detecting RNA modifications need a background sample, usually a condition with a KO of the modifying enzyme. (E) Unsupervised clustering of signal properties such as mean signal value per 5-mer or dwell time can be used to detect RNA modifications. For unsupervised clustering, WT and KO (in the image) or background conditions are used to group signals into modified and unmodified clusters. (F) Statistical tests can also be used to test the asymmetric distributions of errors between normal and background conditions to detect RNA modifications. The figure was created with BioRender.com.
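Panel (F) refers to statistical tests on basecalling errors between a normal and a background condition without naming a specific test. As one possible illustration, the sketch below applies a one-sided Fisher's exact test to error counts at a single position. The choice of test, the function name `fisher_one_sided` and the example counts are all assumptions made for this sketch and are not taken from the review.

```python
from math import comb


def fisher_one_sided(wt_err: int, wt_ok: int, ko_err: int, ko_ok: int) -> float:
    """P-value for the WT sample having at least as many errors as observed.

    Under the null hypothesis, the number of WT errors follows a hypergeometric
    distribution given the table margins.
    """
    total = wt_err + wt_ok + ko_err + ko_ok
    wt_reads = wt_err + wt_ok      # row margin: reads from the WT sample
    all_errors = wt_err + ko_err   # column margin: erroneous reads overall
    denominator = comb(total, wt_reads)
    p_value = 0.0
    for k in range(wt_err, min(wt_reads, all_errors) + 1):
        p_value += comb(all_errors, k) * comb(total - all_errors, wt_reads - k) / denominator
    return p_value


if __name__ == "__main__":
    # Hypothetical counts at one position: 30/100 erroneous reads in WT, 5/100 in KO.
    print(fisher_one_sided(30, 70, 5, 95))
    # Similar error rates in both conditions give no evidence of a modification.
    print(fisher_one_sided(5, 95, 5, 95))
```

A small p-value at a position flags an excess of errors in the WT sample relative to the KO background; in a transcriptome-wide setting, the p-values would also need correction for multiple testing.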

References

    1. Davis FF, Allen FW. Ribonucleic acids from yeast which contain a fifth nucleotide. J Biol Chem 1957;227(2):907–15.
    2. Schaefer M, Kapoor U, Jantsch MF. Understanding RNA modifications: the promises and technological bottlenecks of the 'epitranscriptome'. Open Biol 2017;7(5):170077.
    3. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 2012;485(7397):201–6.
    4. Meyer KD, Saletore Y, Zumbo P, et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell 2012;149(7):1635–46.
    5. Squires JE, Patel HR, Nousch M, et al. Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic Acids Res 2012;40(11):5023–33.
