Front Genet. 2020 Jun 9;11:545.
doi: 10.3389/fgene.2020.00545. eCollection 2020.

LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor

Lian Liu et al. Front Genet.

Abstract

N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications and plays an important role in many biological processes, such as splicing, RNA localization, and degradation. Studies have shown that m6A on lncRNA has important functions, including regulating the expression and function of lncRNAs, regulating pre-mRNA synthesis, promoting cancer cell proliferation, and affecting cell differentiation, among others. Although a number of methods have been proposed to predict m6A RNA methylation sites, most of them target general m6A site prediction and do not account for the specific characteristics of the lncRNA methylation prediction problem. Because many lncRNAs lack a polyA tail, they are not captured by the polyA selection step of the most widely adopted RNA-seq library preparation protocol; as a result, lncRNA methylation sites are likely to be substantially underrepresented in existing experimental data, which affects the accuracy of existing predictors. In this paper, we propose a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation sites prediction from sequence characteristics and genomic information with an ensemble predictor. We show that lncRNA and mRNA methylation sites exhibit different patterns in the extracted features and should therefore be handled differently when making predictions. Because of the experimental protocols used, the number of known lncRNA m6A sites is limited and insufficient to train a reliable predictor on its own; performance can therefore be improved by combining lncRNA and mRNA data in an ensemble predictor. We show that LITHOPHONE achieved good performance when tested on independent datasets (AUC: 0.966 and 0.835 under the full transcript and mature mRNA modes, respectively), a substantial improvement over existing methods.
Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m6A sites, and the results are freely accessible at: http://180.208.58.19/lith/.
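
The abstract states that combining lncRNA and mRNA data in an ensemble improves performance, and the Figure 1 caption describes a mixing parameter α (α = 0: lncRNA sites only; α = 1: mRNA sites only). One plausible way to realize such a mixture is a weighted average of the scores from two classifiers, one trained on lncRNA sites and one on mRNA sites. The sketch below assumes that linear blending rule; the function name `ensemble_score` and the example scores are hypothetical, and LITHOPHONE's published formulation may differ.

```python
from typing import List, Sequence


def ensemble_score(p_lnc: Sequence[float], p_mrna: Sequence[float], alpha: float) -> List[float]:
    """Blend per-site probabilities from a lncRNA-trained and an mRNA-trained model.

    Assumed (hypothetical) rule: score = (1 - alpha) * p_lnc + alpha * p_mrna,
    so alpha = 0 uses only the lncRNA model and alpha = 1 only the mRNA model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_lnc) != len(p_mrna):
        raise ValueError("score vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(p_lnc, p_mrna)]


# Hypothetical scores for three candidate sites, blended with alpha = 0.3.
print(ensemble_score([0.9, 0.2, 0.6], [0.7, 0.4, 0.5], alpha=0.3))
```

With α = 0.3 each blended score leans toward the lncRNA model (weight 0.7) while still borrowing information from the larger mRNA training set.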

Keywords: ensemble model; epitranscriptome; lncRNA; m6A; site prediction.


Figures

Figure 1
Search for the optimal parameter α of the ensemble predictor. The best result was achieved when α = 0.3. When α = 0, only lncRNA sites were used for training; when α = 1, only mRNA sites were used.
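The α search summarized in Figure 1 can be illustrated as a grid search: for each candidate α, blend two models' scores, compute the AUC on validation data, and keep the best α. This sketch assumes a linear blend (1 − α)·p_lnc + α·p_mRNA and computes AUC with the rank (Mann-Whitney) formulation; the helper names, the 0.1 grid step, and the toy data are hypothetical and do not reproduce the paper's experiment.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive scores higher; tied pairs count 0.5. Labels are 1/0."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def search_alpha(labels, p_lnc, p_mrna, n_steps: int = 10) -> Tuple[float, float]:
    """Grid-search alpha over {0, 1/n_steps, ..., 1}; return (best_alpha, best_auc).

    Uses an assumed (hypothetical) linear blend; the smallest alpha wins ties.
    """
    best_alpha, best_auc = 0.0, -1.0
    for i in range(n_steps + 1):
        alpha = i / n_steps  # integer-based grid avoids floating-point drift
        blended = [(1 - alpha) * a + alpha * b for a, b in zip(p_lnc, p_mrna)]
        auc = roc_auc(labels, blended)
        if auc > best_auc:
            best_alpha, best_auc = alpha, auc
    return best_alpha, best_auc


# Hypothetical validation data: neither model alone ranks both positives first.
labels = [1, 1, 0, 0]
p_lnc = [0.9, 0.2, 0.5, 0.1]
p_mrna = [0.3, 0.9, 0.5, 0.1]
print(search_alpha(labels, p_lnc, p_mrna))  # (0.5, 1.0)
```

On this toy data each model alone reaches AUC 0.75, while the blend first separates the classes perfectly at α = 0.5; on real data the optimum (0.3 in Figure 1) depends on the datasets.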
Figure 2
Feature selection results. (A) The ranking of the features for full transcript m6A site prediction. (B) The ranking of the features for mature lncRNA m6A site prediction. (C) Top 134 features were selected for full transcript m6A site prediction. (D) Top 41 features were selected for mature lncRNA m6A site prediction.
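Figure 2 describes ranking the features and keeping the top 134 (full transcript) or top 41 (mature lncRNA) of them. A minimal sketch of that rank-then-truncate step, assuming each feature already has an importance score and that the cutoff k is the feature count with the best validation score; the feature names, scores, and tie-breaking rules are hypothetical.

```python
from typing import Dict, List


def select_top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Names of the k most important features, highest importance first.
    Ties are broken alphabetically so the result is deterministic."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def best_k(score_by_k: Dict[int, float]) -> int:
    """Feature count with the highest validation score; the smaller k wins ties."""
    return min(score_by_k, key=lambda k: (-score_by_k[k], k))


# Hypothetical importance scores and per-k validation AUCs.
importances = {"motif_score": 0.31, "gc_content": 0.22,
               "dist_to_3prime_end": 0.12, "exon_length": 0.05}
k = best_k({1: 0.70, 2: 0.78, 3: 0.78, 4: 0.76})
print(k, select_top_k(importances, k))  # 2 ['motif_score', 'gc_content']
```

Preferring the smaller k on ties keeps the model simpler when extra features add no validation benefit.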
Figure 3
ROC curves for lncRNA methylation site prediction. The proposed approach substantially outperformed competing approaches. (A) The ROC curve for the full transcript mode. (B) The ROC curve for the mature RNA mode.
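The ROC curves in Figure 3 plot the true positive rate against the false positive rate as the decision threshold is lowered. The sketch below computes those points from binary labels (1 = methylation site, 0 = non-site, an assumed encoding) and integrates them with the trapezoidal rule to obtain the AUC; the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_curve_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from sweeping the threshold from the highest score down."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and scores for four candidate sites.
pts = roc_curve_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```

Grouping tied scores into a single step keeps the curve independent of the order in which tied examples happen to be sorted.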
