Brief Bioinform. 2024 Jan 22;25(2):bbae001.
doi: 10.1093/bib/bbae001.

Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing

Simone Maestri et al. Brief Bioinform. 2024.

Abstract

N6-methyladenosine (m6A) is the most abundant internal modification of eukaryotic mRNA and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) has emerged as a leading approach for its identification. Several software tools have been published for m6A detection, and there is a strong need for independent studies benchmarking their performance on data from different species and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remain complicated. We developed NanOlympicsMod, a Nextflow pipeline that uses container technology to compare 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A position known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools differed markedly across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed the precision-recall trade-off to be tuned towards user preferences. Finally, we determined that the precision and recall of the tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Because novel tools can be added to it, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.
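To make the precision-recall trade-off mentioned in the abstract concrete, the following is a minimal Python sketch, not the NanOlympicsMod implementation. It assumes each tool assigns a score to every candidate site and that hits are the sites scoring at or above a cut-off; the function names, site labels and scores are hypothetical.

```python
def precision_recall_f1(hits, reference):
    """Compare a set of called m6A hits against a reference set of sites."""
    hits, reference = set(hits), set(reference)
    true_positives = len(hits & reference)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hits_at_cutoff(scores, cutoff):
    """Keep the sites whose score is at or above the cut-off (assumed convention)."""
    return {site for site, score in scores.items() if score >= cutoff}


# Hypothetical per-site scores from one tool and a hypothetical reference set.
scores = {"bin1": 0.95, "bin2": 0.80, "bin3": 0.40, "bin4": 0.10}
reference = {"bin1", "bin3"}

for cutoff in (0.9, 0.3):
    p, r, f = precision_recall_f1(hits_at_cutoff(scores, cutoff), reference)
    print(f"cutoff={cutoff}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# cutoff=0.9: precision=1.00 recall=0.50 F1=0.67
# cutoff=0.3: precision=0.67 recall=1.00 F1=0.80
```

In this toy case the stricter cut-off favours precision and the looser one favours recall, which is the kind of tuning the abstract describes.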

Keywords: N6-methyladenosine; Nanopore; RNA modifications; benchmarking; dRNA-seq; machine learning.

Figures

Figure 1
The NanOlympicsMod workflow and adopted datasets. (A) Schema of NanOlympicsMod, including input data, pre-processing steps, tools execution, post-processing and comparative analyses. (B) Experimental design for the four different datasets analysed by NanOlympicsMod; the methods used to generate the reference set of m6A hits in yeast and mouse are illustrated.
Figure 2
Key results executing the tools with default settings. (A) Number of hits detected by NanOlympicsMod for each tool in the Oligos dataset. (B) As (A) for the yeast dataset. (C) As (A) for the mouse dataset. (D) As (A) for the human dataset. (E) Distribution of m6A hits for each tool along the synthetic oligos. (F) as (E) for the yeast metagene. (G) as (E) for the mouse metagene. (H) as (E) for the human metagene. (I) Heatmap reporting the overlap of m6A hits for each pair of tools executed with default settings on the oligos dataset. The value in a cell represents, for each pair of tools, the proportion of hits in common to the number of hits of the tool on the row (see the schema on the left of the panel). (J) As in (I) for the yeast dataset. (K) As in (I) for the mouse dataset. (L) As in (I) for the human dataset.
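The row-normalised overlap described for panels (I) to (L) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used to draw the heatmaps: hits are represented as sets of site identifiers, and the tool names and values are hypothetical.

```python
def overlap_matrix(hits_by_tool):
    """For each (row, column) pair of tools, return the number of shared hits
    divided by the number of hits of the row tool."""
    tools = sorted(hits_by_tool)
    matrix = {}
    for row in tools:
        row_hits = set(hits_by_tool[row])
        matrix[row] = {}
        for col in tools:
            shared = row_hits & set(hits_by_tool[col])
            matrix[row][col] = len(shared) / len(row_hits) if row_hits else 0.0
    return matrix


# Hypothetical hits: tool_b calls a subset of the sites called by tool_a.
hits = {"tool_a": {1, 2, 3, 4}, "tool_b": {3, 4}}
m = overlap_matrix(hits)
print(m["tool_a"]["tool_b"])  # 0.5
print(m["tool_b"]["tool_a"])  # 1.0
```

Because each value is divided by the row tool's hit count, the matrix is generally not symmetric, as the two printed values show.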
Figure 3
Agreement with reference sets of m6A hits. (A) Precision, recall and F1 score for each tool executed at default conditions on the oligos dataset. According to Supplementary Table 1, GM and TM identify tools working on the genome (G) or transcriptome (T) space and requiring multiple conditions, respectively. GS and TS identify tools working on the genome (G) or transcriptome (T) space and requiring a single condition, respectively. (B) Precision and recall curves at different cut-off values for the tools indicated in (A) on the oligos dataset; for each tool, the default cut-off is indicated by a square; the performance of a random classifier is included. (C) As in (A) for the yeast dataset. (D) as in (A) for the mouse dataset. (E) as in (A) for the human dataset. (F) as in (B) for the yeast dataset. (G) as in (B) for the mouse dataset. (H) as in (B) for the human dataset.
Figure 4
Agreement with reference sets of m6A hits on RRACH+, accessible, and high-coverage bins. (A) Precision, recall and F1 score for each tool executed at default conditions on the mouse dataset on RRACH+ bins. According to Supplementary Table 1, GM and TM identify tools working on the genome (G) or transcriptome (T) space and requiring multiple conditions, respectively. GS and TS identify tools working on the genome (G) or transcriptome (T) space and requiring a single condition, respectively. (B) Precision and recall curves at different cut-off values for the tools indicated in (A) on the mouse dataset; for each tool, the default cut-off is indicated by a square; the performance of a random classifier is included. (C) as in (A) for DRACH+ bins outside of splice-site exclusion zones. (D) as in (B) for DRACH+ bins outside of splice-site exclusion zones. (E) as in (A) for bins with high coverage. (F) as in (B) for bins with high coverage.
Figure 5
Sequence features associated with true positive, false positive and false negative hits. (A) m6A hits of each tool were stratified based on their association with specific RRACH motifs, and their number and accuracy on the mouse dataset were reported. (B) Distribution of accuracy stratified for common and uncommon RRACH motifs. (C) De novo motif enrichment analysis was performed on 50 nt regions centred at false positive hits for each tool on the mouse dataset, and the most significant motif was reported, together with statistical significance and consensus motif; tools marked with * are restricted to RRACH/DRACH motifs by implementation. (D) Distribution of the GC content for 50 nt regions centred at true positive (TP), false negative (FN) and false positive (FP) m6A hits. (E) as (D) for the free energy. (F) as (D) for the Shannon entropy.
Figure 6
m6A calling saturation analysis. (A) Saturation analysis for m6A calling by various tools on the human dataset; the number of hits (y-axis) identified on subsets of the whole dataset (x-axis) is reported as a proportion of the number of hits identified on the whole dataset. (B) As in (A) where the y-axis reports the corresponding F1 score. (C) as in (A) where the y-axis reports the AUPRC.
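The quantity plotted in panel (A) of the saturation analysis, hits on a subset as a proportion of hits on the whole dataset, can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: reads are reduced to the site they support, `call_hits` is a hypothetical stand-in caller that needs a minimum number of supporting reads, and random subsampling stands in for whatever subsetting the authors used.

```python
import random


def call_hits(reads, min_support=3):
    """Toy caller (hypothetical): a site is a hit if enough reads support it."""
    counts = {}
    for site in reads:
        counts[site] = counts.get(site, 0) + 1
    return {site for site, n in counts.items() if n >= min_support}


def saturation_curve(reads, fractions, caller=call_hits, seed=0):
    """Return (fraction, hits on subset / hits on whole dataset) pairs."""
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    total = len(caller(reads))
    curve = []
    for fraction in fractions:
        subset = rng.sample(reads, int(len(reads) * fraction))
        n_hits = len(caller(subset))
        curve.append((fraction, n_hits / total if total else 0.0))
    return curve


# Hypothetical reads, each labelled by the site it supports.
reads = ["s1"] * 10 + ["s2"] * 4 + ["s3"] * 2
for fraction, proportion in saturation_curve(reads, [0.25, 0.5, 1.0]):
    print(fraction, proportion)
```

If the proportion is still rising as the fraction approaches 1.0, the calls have not saturated, which is the pattern behind the abstract's statement that additional sequencing would likely reveal additional m6A sites.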
