LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor
- PMID: 32582286
- PMCID: PMC7297269
- DOI: 10.3389/fgene.2020.00545
Abstract
N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications and plays an important role in many biological processes, such as splicing, RNA localization, and degradation. Studies have shown that m6A on lncRNA has important functions, including regulating the expression and functions of lncRNA, regulating the synthesis of pre-mRNA, promoting the proliferation of cancer cells, and affecting cell differentiation, among many others. Although a number of methods have been proposed to predict m6A RNA methylation sites, most of them target general m6A site prediction and overlook the unique nature of the lncRNA methylation prediction problem. Because many lncRNAs lack a polyA tail, they are not captured in the polyA selection step of the most widely adopted RNA-seq library preparation protocol. lncRNA methylation sites therefore cannot be effectively captured and are likely to be significantly underrepresented in existing experimental data, which affects the accuracy of existing predictors. In this paper, we propose a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation sites prediction from sequence characteristics and genomic information with an ensemble predictor. We show that the methylation sites of lncRNA and mRNA exhibit different patterns in the extracted features and should be handled differently when making predictions. Because of the experimental protocols used, the number of known lncRNA m6A sites is limited and insufficient to train a reliable predictor; performance can therefore be improved by combining both lncRNA and mRNA data using an ensemble predictor. We show that the newly developed LITHOPHONE approach achieved reasonably good performance when tested on independent datasets (AUC: 0.966 and 0.835 under full transcript and mature mRNA modes, respectively), a substantial improvement over existing methods.
Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m6A sites, and the results are freely accessible at: http://180.208.58.19/lith/.
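As a rough illustration of two ideas in the abstract, the sketch below (i) combines a model trained on lncRNA sites with a model trained on mRNA sites through a weighted soft vote and (ii) evaluates the result with AUC. This is a generic sketch under stated assumptions, not LITHOPHONE's actual ensemble: the abstract does not specify the base learners, features, or combination rule, so the models `lnc_model` and `mrna_model`, the toy feature values, and the `w_lnc` weight are all hypothetical. AUC is computed with the rank (Mann-Whitney) formulation, i.e. the probability that a randomly chosen positive site outscores a randomly chosen negative one.

```python
from typing import Callable, List, Sequence

# A "model" is any callable mapping a feature vector to a probability in [0, 1].
Model = Callable[[Sequence[float]], float]


def ensemble_score(features: Sequence[float], lnc_model: Model,
                   mrna_model: Model, w_lnc: float = 0.5) -> float:
    """Weighted soft vote: blend lncRNA-trained and mRNA-trained probabilities."""
    if not 0.0 <= w_lnc <= 1.0:
        raise ValueError("w_lnc must lie in [0, 1]")
    return w_lnc * lnc_model(features) + (1.0 - w_lnc) * mrna_model(features)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical stand-ins for trained classifiers: each just reads one toy feature.
def lnc_model(x: Sequence[float]) -> float:
    return x[0]  # pretend: model trained on the scarce lncRNA sites


def mrna_model(x: Sequence[float]) -> float:
    return x[1]  # pretend: model trained on the larger mRNA site set


# Made-up (features, label) pairs; label 1 marks a methylated site.
data = [([0.9, 0.3], 1), ([0.2, 0.8], 1), ([0.4, 0.5], 0), ([0.1, 0.1], 0)]
labels: List[int] = [y for _, y in data]
blend = [ensemble_score(x, lnc_model, mrna_model) for x, _ in data]

print(roc_auc(labels, [lnc_model(x) for x, _ in data]))   # 0.75
print(roc_auc(labels, [mrna_model(x) for x, _ in data]))  # 0.75
print(roc_auc(labels, blend))                             # 1.0
```

Blending lets a model built on the scarce lncRNA sites contribute while borrowing strength from the larger mRNA set; in practice the weight would be tuned on held-out data rather than fixed at 0.5. The toy numbers only show that a blend can rank sites better than either input alone; they say nothing about the results reported for LITHOPHONE.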
Keywords: ensemble model; epitranscriptome; lncRNA; m6A; site prediction.
Copyright © 2020 Liu, Lei, Fang, Tang, Meng and Wei.
Similar articles
- WITMSG: Large-scale Prediction of Human Intronic m6A RNA Methylation Sites from Sequence and Genomic Features. Curr Genomics. 2020 Jan;21(1):67-76. doi: 10.2174/1389202921666200211104140. PMID: 32655300.
- EDLm6APred: ensemble deep learning approach for mRNA m6A site prediction. BMC Bioinformatics. 2021 May 29;22(1):288. doi: 10.1186/s12859-021-04206-4. PMID: 34051729.
- WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res. 2019 Apr 23;47(7):e41. doi: 10.1093/nar/gkz074. PMID: 30993345.
- Novel insight into the functions of N6‑methyladenosine modified lncRNAs in cancers (Review). Int J Oncol. 2022 Dec;61(6):152. doi: 10.3892/ijo.2022.5442. Epub 2022 Oct 20. PMID: 36263625. Review.
- LncRNAs and Chromatin Modifications Pattern m6A Methylation at the Untranslated Regions of mRNAs. Front Genet. 2022 Mar 17;13:866772. doi: 10.3389/fgene.2022.866772. eCollection 2022. PMID: 35368653. Review.
Cited by
- Recent advances in functional annotation and prediction of the epitranscriptome. Comput Struct Biotechnol J. 2021 May 21;19:3015-3026. doi: 10.1016/j.csbj.2021.05.030. eCollection 2021. PMID: 34136099. Review.
- PSI-MOUSE: Predicting Mouse Pseudouridine Sites From Sequence and Genome-Derived Features. Evol Bioinform Online. 2020 Jun 9;16:1176934320925752. doi: 10.1177/1176934320925752. eCollection 2020. PMID: 32565674.
- Construction of Prognostic Risk Model of 5-Methylcytosine-Related Long Non-Coding RNAs and Evaluation of the Characteristics of Tumor-Infiltrating Immune Cells in Breast Cancer. Front Genet. 2021 Oct 29;12:748279. doi: 10.3389/fgene.2021.748279. eCollection 2021. PMID: 34777473.
- Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods. Evol Bioinform Online. 2020 Jul 20;16:1176934320915707. doi: 10.1177/1176934320915707. eCollection 2020. PMID: 32733123.
- The integration of single-cell sequencing, TCGA, and GEO data analysis revealed that PRRT3-AS1 is a biomarker and therapeutic target of SKCM. Front Immunol. 2022 Sep 23;13:919145. doi: 10.3389/fimmu.2022.919145. eCollection 2022. PMID: 36211371.