Interpretable prediction models for widespread m6A RNA modification across cell lines and tissues

Bioinformatics. 2023 Dec 1;39(12):btad709. doi: 10.1093/bioinformatics/btad709.

Ying Zhang¹, Zhikang Wang², Yiwen Zhang³, Shanshan Li³, Yuming Guo³, Jiangning Song²˒⁴, Dong-Jun Yu¹

Affiliations

¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
² Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
³ School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia.
⁴ Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia.

PMID: 37995291
PMCID: PMC10697738
DOI: 10.1093/bioinformatics/btad709

Ying Zhang et al. Bioinformatics. 2023.


Abstract

Motivation: RNA N6-methyladenosine (m6A) in Homo sapiens plays vital roles in a variety of biological functions. Precise identification of m6A modifications is thus essential to elucidating their biological functions and underlying molecular-level mechanisms. Currently available high-throughput single-nucleotide-resolution m6A modification data have considerably accelerated the identification of RNA modification sites through the development of data-driven computational methods. Nevertheless, existing methods cover only a limited range of cell lines at single-nucleotide resolution and offer poor model interpretability, which limits their applicability.

Results: In this study, we present CLSM6A, a set of deep learning-based models designed for predicting single-nucleotide-resolution m6A RNA modification sites across eight different cell lines and three tissues. Extensive benchmarking experiments on well-curated datasets show that CLSM6A outperforms current state-of-the-art methods. Furthermore, CLSM6A can interpret its prediction decision-making process by extracting critical motifs activated by filters and pinpointing the positions the model focuses on most in both forward and backward propagations. CLSM6A exhibits better portability on similar cross-cell line/tissue datasets, reveals a strong association between highly activated motifs and high-impact motifs, and demonstrates complementary attributes of different interpretation strategies.

Availability and implementation: The webserver is available at http://csbio.njust.edu.cn/bioinf/clsm6a. The datasets and code are available at https://github.com/zhangying-njust/CLSM6A/.
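The filter-based interpretation described in the Results can be illustrated with a minimal sketch, assuming a one-hot encoding over A, C, G, U and a single hand-written filter. The function names (`one_hot`, `scan_filter`, `best_window`) and the toy weights are hypothetical and are not taken from the CLSM6A code; a trained model would learn its filter weights from data. The sketch slides one filter along a sequence and reports the subsequence that activates it most strongly, which is the kind of window from which a filter motif could be assembled.

```python
from typing import List, Tuple

ALPHABET = "ACGU"  # column order assumed for the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA sequence as rows of 4 values (A, C, G, U)."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0.0] * 4
        if base in ALPHABET:  # unknown bases (e.g. N) stay all-zero
            row[ALPHABET.index(base)] = 1.0
        rows.append(row)
    return rows


def scan_filter(encoded: List[List[float]],
                weights: List[List[float]]) -> List[float]:
    """Slide one filter (width x 4 weights) over every full-width window."""
    width = len(weights)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * weights[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


def best_window(seq: str,
                weights: List[List[float]]) -> Tuple[int, str, float]:
    """Return (start, subsequence, score) of the strongest activation."""
    scores = scan_filter(one_hot(seq), weights)
    pos = max(range(len(scores)), key=scores.__getitem__)
    return pos, seq[pos:pos + len(weights)], scores[pos]


# Toy filter (an assumption, not learned weights): rewards the 5-mer GGACU.
toy_filter = one_hot("GGACU")

print(best_window("AUGGACUCA", toy_filter))  # (2, 'GGACU', 5.0)
```

Collecting the top-scoring windows of one filter over many input sequences and counting bases per column would give a position frequency matrix for that filter, which is a common way to turn convolutional filters into sequence logos.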

Conflict of interest statement

None declared.

Figures

**Figure 1.**
Overview of the proposed CLSM6A framework. (A) Collection and processing of m6A RNA modification data. (B) Structure of the proposed CLSM6A in this study. (C) Model-based and propagation-based interpretation.

**Figure 2.**
CLSM6A enables the single-nucleotide-resolution prediction of m6A sites. (A) Prediction performance on the three tissue testing sets in terms of the area under the receiver-operating characteristic curve (AUC), the area under the precision–recall curve (AP), Matthews correlation coefficient (MCC), accuracy (Acc), specificity (Spe), and sensitivity (Sen). The comparison includes im6A-TS-CNN and TS-m6A-DL trained on sequence lengths of 41 (the length used by the original models) and 201 (the same length as CLSM6A). (B) ROC curves and PR curves of CLSM6A in cell lines and tissues. (C) A Sankey diagram of the number of samples and the predicted results in the testing dataset. (D) The feature space distribution visualization in four cell lines/tissues. (E) Performance of CLSM6A under different lengths of flanking sequence.

**Figure 3.**
Characteristic motifs identified by the conventional motif discovery tool DREME can be detected by the first convolutional layer of CLSM6A. For each aligned result, the upper panel is the motif (with the smallest E-value in each cell line) identified by DREME, with the E-value displayed below. The bottom panel is the motif detected by CLSM6A, with the kernel motif number, P-value, and number of overlaps provided. The P-value was calculated using TOMTOM by utilizing a null model containing CLSM6A’s motif columns from the top column in the set of DREME motifs.

**Figure 4.**
Illustration of the relationships between datasets and validation across cell lines and tissues. (A) Frequency-based clustermap of different cell lines and tissues, and the sequence motifs plotted by SeqLogo. (B) Venn diagram of modifications sharing the same exon. (C) Heatmap of cross-species validation with the used datasets on the x-axis and the models on the y-axis; a cell line/tissue-specific model (in columns) was trained on its own training data and validated on the independent data of the cell line/tissue in rows. (D, E) The feature space distribution of activated motifs in cell lines/tissues colored by cell type.

**Figure 5.**
Analysis of motifs learned by CLSM6A. (A) Visualization of the motif distribution in the liver cell line, with the impact of filter on the x-axis and the average activated amount sequences on the y-axis. Motif filters with high impact or high activated amounts are displayed. (B) Correlation among motifs. Pairwise PCCs between motifs within the same cluster exhibit high values. The highly activated and high-impact motifs are marked in red and green, respectively. (C) Pairwise PCCs between top highly activated motifs and important motifs in 11 cell lines.

**Figure 6.**
Model-based interpretation and propagation-based interpretation on liver, brain, and kidney tissues. (A) Global exhibition of positions the model focuses on via three strategies, with solid lines recording the averaged values and the light background ranging from the smallest value to the largest value in each position. (B) Local (single input example) attribution similarity by the two propagation-based interpretation methods. (C) Local exhibition (single input example) of activated subsequences by different filters. (D) Exhibition (single input example) of attribution vectors reported by propagation-based interpretations.
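The threshold-based metrics named in the Figure 2 caption (Acc, Sen, Spe, and MCC) follow standard confusion-matrix definitions. The sketch below computes them from true and predicted binary labels; the helper names `confusion_counts` and `binary_metrics` and the example counts are illustrative assumptions, not values or code from the paper. AUC and AP are omitted because they need ranked prediction scores instead of hard labels.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) for binary labels, 1 = m6A site, 0 = non-site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    spe = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sen": sen, "Spe": spe, "MCC": mcc}


# Made-up counts for illustration: 50 positives and 50 negatives.
example = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in example.items()})
```

MCC uses all four cells of the confusion matrix, so it stays informative when positives and negatives are imbalanced, whereas accuracy alone can look high for a model that mostly predicts the majority class.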

Cited by

Interpretable deep cross networks unveiled common signatures of dysregulated epitranscriptomes across 12 cancer types.
Xia R, Yin X, Huang J, Chen K, Ma J, Wei Z, Su J, Blake N, Rigden DJ, Meng J, Song B. Mol Ther Nucleic Acids. 2024 Oct 29;35(4):102376. doi: 10.1016/j.omtn.2024.102376. eCollection 2024 Dec 10. PMID: 39618823. Free PMC article.

Multimodal zero-shot learning of previously unseen epitranscriptomes from RNA-seq data.
Song Y, Song B, Huang D, Nguyen A, Hu L, Meng J, Wang Y. Brief Bioinform. 2025 Jul 2;26(4):bbaf332. doi: 10.1093/bib/bbaf332. PMID: 40632498. Free PMC article.

Interpretability-guided RNA N⁶-methyladenosine modification site prediction with invertible neural networks.
Li G, Su X, Yang Y, Li D, Cui Z, Deng X, Hu P, Hu L. Commun Biol. 2025 Jul 8;8(1):1022. doi: 10.1038/s42003-025-08265-8. PMID: 40629144. Free PMC article.

Statistical modeling of single-cell epitranscriptomics enabled trajectory and regulatory inference of RNA methylation.
Wang H, Wang Y, Zhou J, Song B, Tu G, Nguyen A, Su J, Coenen F, Wei Z, Rigden DJ, Meng J. Cell Genom. 2025 Jan 8;5(1):100702. doi: 10.1016/j.xgen.2024.100702. Epub 2024 Dec 5. PMID: 39642887. Free PMC article.

Methyl-GP: accurate generic DNA methylation prediction based on a language model and representation learning.
Xie H, Wang L, Qian Y, Ding Y, Guo F. Nucleic Acids Res. 2025 Mar 20;53(6):gkaf223. doi: 10.1093/nar/gkaf223. PMID: 40156859. Free PMC article.

References

1. Abbas Z, Tayara H, Zou Q. et al. TS-m6A-DL: tissue-specific identification of N6-methyladenosine sites using a universal deep learning model. Comput Struct Biotechnol J 2021;19:4619–25.
2. Bailey TL. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 2011;27:1653–9.
3. Bansal H, Yihua Q, Iyer SP. et al. WTAP is a novel oncogenic protein in acute myeloid leukemia. Leukemia 2014;28:1171–4.
4. Boccaletto P, Machnicka MA, Purta E. et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res 2017;46:D303–7.
5. Cai J, Yang F, Zhan H. et al. RNA m(6)A methyltransferase METTL3 promotes the growth of prostate cancer by regulating hedgehog pathway. Onco Targets Ther 2019;12:9143–52.

Grants and funding

R01 AI111965/AI/NIAID NIH HHS/United States
