Bioinformatics. 2023 Dec 1;39(12):btad709.
doi: 10.1093/bioinformatics/btad709.

Interpretable prediction models for widespread m6A RNA modification across cell lines and tissues

Ying Zhang et al. Bioinformatics.

Abstract

Motivation: RNA N6-methyladenosine (m6A) in Homo sapiens plays vital roles in a variety of biological functions. Precise identification of m6A modifications is thus essential to elucidating their biological functions and the underlying molecular-level mechanisms. Currently available high-throughput single-nucleotide-resolution m6A modification data have considerably accelerated the identification of RNA modification sites through the development of data-driven computational methods. Nevertheless, existing methods cover only a limited number of cell lines at single-nucleotide resolution and offer poor model interpretability, which limits their applicability.

Results: In this study, we present CLSM6A, a set of deep learning-based models designed for predicting single-nucleotide-resolution m6A RNA modification sites across eight different cell lines and three tissues. In extensive benchmarking experiments on well-curated datasets, CLSM6A outperforms current state-of-the-art methods. Furthermore, CLSM6A can interpret its prediction decision-making process by extracting critical motifs activated by filters and pinpointing the positions that receive the most attention in both forward and backward propagations. CLSM6A exhibits better portability on similar cross-cell line/tissue datasets, reveals a strong association between highly activated motifs and high-impact motifs, and demonstrates complementary attributes of different interpretation strategies.

Availability and implementation: The webserver is available at http://csbio.njust.edu.cn/bioinf/clsm6a. The datasets and code are available at https://github.com/zhangying-njust/CLSM6A/.
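
As an illustration of the filter-based interpretation described in the Results, the following is a minimal sketch of one common way to turn a convolutional filter into a sequence motif: scan each sequence with the filter, keep the most strongly activated window, and tabulate base frequencies over those windows. It is not taken from the CLSM6A code. The function names, the toy filter, and the example sequences are all hypothetical, and the RNA alphabet ACGU is an assumption about the input encoding.

```python
ALPHABET = "ACGU"  # assumed RNA alphabet; the real models may encode input differently


def one_hot(seq):
    """Encode a sequence as a list of 4-element rows, one row per base."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def scan(seq, weights):
    """Slide one filter (k rows x 4 columns) along seq; return one activation per window."""
    x = one_hot(seq)
    k = len(weights)
    return [
        sum(weights[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        for i in range(len(seq) - k + 1)
    ]


def top_window(seq, weights):
    """Return the subsequence with the highest activation and that activation."""
    acts = scan(seq, weights)
    best = max(range(len(acts)), key=acts.__getitem__)
    return seq[best:best + len(weights)], acts[best]


def frequency_matrix(windows):
    """Position frequency matrix (k rows x 4 columns) over equal-length windows."""
    k = len(windows[0])
    counts = [[0] * 4 for _ in range(k)]
    for w in windows:
        for j, base in enumerate(w):
            counts[j][ALPHABET.index(base)] += 1
    return [[c / len(windows) for c in row] for row in counts]


# Hypothetical filter that responds to GGACU, used only to exercise the functions.
toy_filter = one_hot("GGACU")
toy_sequences = ["AUGGACUCA", "CCGAACUGG"]
toy_windows = [top_window(s, toy_filter)[0] for s in toy_sequences]
toy_pfm = frequency_matrix(toy_windows)
```

In this toy setup each row of `toy_pfm` gives the base frequencies at one motif position, which is the kind of matrix that a tool such as TOMTOM compares against known motifs.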

Conflict of interest statement

None declared.

Figures

Figure 1.
Overview of the proposed CLSM6A framework. (A) Collection and processing of m6A RNA modification data. (B) Structure of the proposed CLSM6A in this study. (C) Model-based and propagation-based interpretation.
Figure 2.
CLSM6A enables the single-nucleotide-resolution prediction of m6A sites. (A) Prediction performance on the three tissue testing sets in terms of area under the receiver-operating characteristic curve (AUC), area under the precision–recall curve (AP), Matthews correlation coefficient (MCC), accuracy (Acc), specificity (Spe), and sensitivity (Sen). The baselines im6A-TS-CNN and TS-m6A-DL were trained on sequence lengths of 41 (the length used by the original models) and 201 (the length used by CLSM6A). (B) ROC curves and PR curves of CLSM6A in cell lines and tissues. (C) A Sankey diagram of the number of samples and the predicted results in the testing dataset. (D) The feature space distribution visualization in four cell lines/tissues. (E) Performance of CLSM6A under different flanking sequence lengths.
Figure 3.
Characteristic motifs identified by the conventional motif discovery tool DREME can be detected by the first convolutional layer of CLSM6A. For each aligned result, the upper panel is the motif (with the smallest E-value in each cell line) identified by DREME, with the E-value displayed below. The bottom panel is the motif detected by CLSM6A, with the kernel motif number, P-value, and number of overlaps provided. The P-value was calculated by TOMTOM using a null model containing CLSM6A’s motif columns from the top column in the set of DREME motifs.
Figure 4.
Illustration of the relationships between datasets and validation across cell lines and tissues. (A) Frequency-based clustermap of different cell lines and tissues, and the sequence motifs plotted by SeqLogo. (B) Venn diagram of modifications sharing the same exon. (C) Heatmap of cross-cell line/tissue validation with the used datasets on the x-axis and the models on the y-axis; a cell line/tissue-specific model (in columns) was trained on its own training data and validated on the independent data of the cell line/tissue in rows. (D, E) The feature space distribution of activated motifs in cell lines/tissues colored by cell type.
Figure 5.
Analysis of motifs learned by CLSM6A. (A) Visualization of the motif distribution in the liver cell line, with the impact of each filter on the x-axis and the average number of activated sequences on the y-axis. Motif filters with high impact or high activation counts are displayed. (B) Correlation among motifs. Pairwise PCCs between motifs within the same cluster exhibit high values. The highly activated and high-impact motifs are marked in red and green, respectively. (C) Pairwise PCCs between top highly activated motifs and important motifs in the 11 cell lines/tissues.
Figure 6.
Model-based and propagation-based interpretation on liver, brain, and kidney tissues. (A) Global exhibition of the positions the model focuses on via three strategies, with solid lines recording the averaged values and the light background ranging from the smallest value to the largest value at each position. (B) Local (single input example) attribution similarity between the two propagation-based interpretation methods. (C) Local exhibition (single input example) of subsequences activated by different filters. (D) Exhibition (single input example) of attribution vectors reported by propagation-based interpretations.
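
The threshold-based metrics named in the Figure 2 caption (MCC, Acc, Spe, Sen) and the AUC have standard definitions, sketched below in plain Python for reference. This is an illustrative sketch and not the evaluation code from the CLSM6A repository: the function names are hypothetical, the 0.5 decision threshold is an assumption, and AP is omitted for brevity.

```python
from math import sqrt


def threshold_metrics(y_true, scores, threshold=0.5):
    """Sen, Spe, Acc and MCC for binary labels (1 = m6A site) at an assumed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sen": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "Spe": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "Acc": (tp + tn) / len(pairs),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small check; for large test sets a rank-based implementation or a library routine would be the usual choice.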

References

    1. Abbas Z, Tayara H, Zou Q. et al. TS-m6A-DL: tissue-specific identification of N6-methyladenosine sites using a universal deep learning model. Comput Struct Biotechnol J 2021;19:4619–25.
    2. Bailey TL. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 2011;27:1653–9.
    3. Bansal H, Yihua Q, Iyer SP. et al. WTAP is a novel oncogenic protein in acute myeloid leukemia. Leukemia 2014;28:1171–4.
    4. Boccaletto P, Machnicka MA, Purta E. et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res 2017;46:D303–7.
    5. Cai J, Yang F, Zhan H. et al. RNA m6A methyltransferase METTL3 promotes the growth of prostate cancer by regulating hedgehog pathway. Onco Targets Ther 2019;12:9143–52.