Comput Struct Biotechnol J. 2020 Nov 12:18:3528-3538.
doi: 10.1016/j.csbj.2020.10.032. eCollection 2020.

Accurate prediction of RNA 5-hydroxymethylcytosine modification by utilizing novel position-specific gapped k-mer descriptors

Sajid Ahmed et al. Comput Struct Biotechnol J.

Abstract

RNA modification is an essential step toward the generation of new RNA structures. Such modifications can alter RNA function or stability. Among the different modifications, 5-hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential in a range of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological function. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to capture as much sequential information as possible. Our feature analysis shows that the proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS substantially improves prediction performance. iRNA5hmC-PS achieves a prediction performance of 78.3%, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source code, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.

Keywords: Logistic regression; Position-specific gapped k-mer; Position-specific k-mer; RNA 5-hydroxymethylcytosine modification; Sequence-based feature.
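
To make the idea of a position-specific gapped k-mer concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest case, in which a feature is a pair of single nucleotides separated by a fixed gap and tied to a specific position in the sequence. The function names, the `max_gap` parameter, and the binary encoding are all illustrative assumptions; the exact feature definitions are in the paper and the linked repository.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style "T" is mapped to "U" below


def feature_names(seq_len, max_gap):
    """Enumerate every (position, gap, left base, right base) feature.

    Assumption: one feature exists for each start position, each gap
    size from 0 to max_gap, and each ordered pair of nucleotides.
    """
    return [
        (pos, gap, left, right)
        for gap in range(max_gap + 1)
        for pos in range(seq_len - gap - 1)
        for left, right in product(ALPHABET, repeat=2)
    ]


def present_features(seq, max_gap):
    """Return the set of gapped pairs that actually occur in seq."""
    seq = seq.upper().replace("T", "U")
    found = set()
    for gap in range(max_gap + 1):
        for pos in range(len(seq) - gap - 1):
            left, right = seq[pos], seq[pos + gap + 1]
            if left in ALPHABET and right in ALPHABET:
                found.add((pos, gap, left, right))
    return found


def encode(seq, max_gap=2):
    """Encode seq as a binary vector over all possible features."""
    found = present_features(seq, max_gap)
    return [1 if name in found else 0 for name in feature_names(len(seq), max_gap)]


if __name__ == "__main__":
    vector = encode("ACGU", max_gap=1)
    print(len(vector), sum(vector))
```

Because each feature is bound to a position, sequences must have equal length for their vectors to be comparable, which is the usual setup for fixed-width windows centred on a candidate site.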

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
System diagram of iRNA5hmC-PS. As shown in this figure, we first select our training and test sets. We then extract our proposed features, train our model, evaluate the model’s generalization capability, and finally deploy the trained model for identifying RNA 5hmC sites. Note that “Feature Selector 1”, “Feature Selector 2”, “Feature Selector 3”, and “Feature Selector 4” refer to four independent Random Forest classification models that select the most discriminative features from four different feature groups (Position-Specific k-mer, PsM(G)D, PsD(G)M, and PsM(G)M(G)M), respectively.
Fig. 2
Feature importance of individual feature types.
Fig. 3
Performance evaluation of LR, SVM, and GNB using 5-fold cross-validation.
Fig. 4
Performance evaluation of LR, SVM, and GNB using the independent test set.
Fig. 5
Performance comparison of iRNA5hmC-PS with iRNA5hmC using 5-fold cross-validation. Note that we multiply auROC, auPR, and MCC by 100 to represent them on the same scale as the other evaluation measures; the values of all measures are shown along the y-axis (0–100).
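
The rescaling mentioned in the Fig. 5 caption can be illustrated with a short sketch. It computes MCC and auROC from their standard definitions and multiplies them by 100 so that they sit on the same 0–100 axis as a percentage measure. This is only an illustration under assumptions: accuracy is used here as a stand-in for "the other evaluation measurements", which the caption does not name, and the labels and scores in the example are toy values, not results from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient in [-1, 1]; 0.0 if undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """Area under the ROC curve as the fraction of correctly ranked
    (positive, negative) pairs, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def report_percent(y_true, y_pred, scores):
    """Return all measures on a common 0-100 scale."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "Accuracy": 100.0 * (tp + tn) / len(y_true),
        "MCC": 100.0 * mcc(y_true, y_pred),
        "auROC": 100.0 * auroc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy labels and scores for illustration only.
    print(report_percent([1, 1, 0, 0], [1, 0, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Note that MCC ranges from -1 to 1, so a scaled value can in principle be negative even though the figure's axis runs from 0 to 100.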

