Nucleic Acids Res. 2023 Jul 5;51(W1):W281-W288.
doi: 10.1093/nar/gkad381.

IRSOM2: a web server for predicting bifunctional RNAs

Guillaume Postic et al. Nucleic Acids Res.

Abstract

Recent advances have shown that some biologically active non-coding RNAs (ncRNAs) are actually translated into polypeptides that have a physiological function as well. This paradigm shift requires adapted computational methods to predict this new class of 'bifunctional RNAs'. Previously, we developed IRSOM, an open-source algorithm to classify non-coding and coding RNAs. Here, we use the binary statistical model of IRSOM as a ternary classifier, called IRSOM2, to identify bifunctional RNAs as a rejection of the two other classes. We present its easy-to-use web interface, which allows users to perform predictions on large datasets of RNA sequences in a short time, to re-train the model with their own data, and to visualize and analyze the classification results thanks to the implementation of self-organizing maps (SOM). We also propose a new benchmark of experimentally validated RNAs that play both protein-coding and non-coding roles, in different organisms. Thus, IRSOM2 showed promising performance in detecting these bifunctional transcripts among ncRNAs of different types, such as circRNAs and lncRNAs (in particular those of shorter lengths). The web server is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr.
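The idea of using a binary model as a ternary classifier "by rejection" can be illustrated with a minimal sketch. The abstract does not state the actual decision rule, so everything below is an assumption made for illustration: the function name `classify_with_rejection`, the use of two separate probabilities, and the threshold value of 0.8 are hypothetical and are not taken from IRSOM2.

```python
from typing import Literal

Label = Literal["coding", "noncoding", "bifunctional"]


def classify_with_rejection(
    p_coding: float,
    p_noncoding: float,
    threshold: float = 0.8,  # hypothetical value, not from the paper
) -> Label:
    """Assign a class from two probabilities, rejecting uncertain cases.

    A sequence is labeled 'coding' or 'noncoding' only when the matching
    probability is both the larger of the two and at least `threshold`.
    Anything else is rejected from both classes and labeled 'bifunctional'.
    """
    for p in (p_coding, p_noncoding):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    if p_coding >= threshold and p_coding > p_noncoding:
        return "coding"
    if p_noncoding >= threshold and p_noncoding > p_coding:
        return "noncoding"
    return "bifunctional"


if __name__ == "__main__":
    print(classify_with_rejection(0.95, 0.05))
    print(classify_with_rejection(0.10, 0.90))
    print(classify_with_rejection(0.55, 0.45))
```

Under this assumed rule, a sequence whose two probabilities are close to each other falls into the rejection region, which is one plausible reading of "a rejection of the two other classes".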

Figures

Figure 1.
Graphical output of the IRSOM2 web server. (A) Representation of the resulting self-organizing map, as a grid of 10 × 10 neurons. (B) For each input sequence, the table gives the map coordinates (x,y) of the cluster (neuron) to which it belongs (the Best Matching Unit, ‘BMU’ column) along with its predicted class and the two associated probabilities. (C) Bar plots that display how the probabilities of being coding or noncoding are distributed among the three RNA categories. (D, E) Optional plots of the feature profiles: ORF and codon bias, respectively. The coordinates of the 10 × 10 box plots correspond to those of the neurons in the grid.
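The 'BMU' column in panel B refers to the neuron whose weight vector is closest to a sequence's feature vector. The following sketch shows the textbook Best Matching Unit lookup on a 10 × 10 grid using squared Euclidean distance. It assumes that distance measure, and the two-dimensional toy weights and the function name `best_matching_unit` are hypothetical; the actual IRSOM2 features and trained weights are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # grid[x][y] is the weight vector of neuron (x, y)


def best_matching_unit(grid: Grid, features: Sequence[float]) -> Tuple[int, int]:
    """Return the (x, y) coordinates of the neuron closest to `features`."""
    best_xy = (0, 0)
    best_dist = math.inf
    for x, column in enumerate(grid):
        for y, weights in enumerate(column):
            # Squared Euclidean distance; the square root is not needed
            # because it does not change which neuron is closest.
            dist = sum((w - f) ** 2 for w, f in zip(weights, features))
            if dist < best_dist:
                best_dist = dist
                best_xy = (x, y)
    return best_xy


# Toy 10 x 10 map: neuron (x, y) holds the weight vector [x / 9, y / 9].
toy_grid: Grid = [[[x / 9, y / 9] for y in range(10)] for x in range(10)]

if __name__ == "__main__":
    print(best_matching_unit(toy_grid, [0.34, 0.78]))  # (3, 7)
```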
Figure 2.
Performance of IRSOM2 on the experimentally validated SPENCER dataset of cancer-specific bifunctional RNAs (n = 733). (A) Pie chart representing the predictions of IRSOM2 on the bifunctional RNAs from SPENCER. (B) True positive rate (TPR) depending on the sequence length. The TPR is represented in blue. The green surface represents the cumulative distribution of sequence lengths. The red line represents the 33.3% TPR of a random ternary classifier. (C, D) ‘Zoomed views’ representing the subsets of sequences shorter than 5000 nt (n = 707) and 500 nt (n = 119), respectively. (E) Bar plot representing the performance of IRSOM2 on different subsets of the SPENCER dataset, which correspond to different ranges of RNA sequence lengths (values in nt). The term ‘macroRNAs’ designates lncRNAs longer than 10 kb. The size of each subset is given below the intervals.
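The length-dependent TPR in this figure can be computed along the following lines. Because every sequence in the benchmark is a known bifunctional RNA, the TPR within a length range is simply the fraction of sequences in that range predicted as bifunctional, which can then be compared with the 1/3 expected from a random ternary classifier. This is a minimal sketch: the function name `tpr_by_length`, the bin edges, and the example records are made up for illustration and are not data from the paper.

```python
from typing import Dict, List, Sequence, Tuple

RANDOM_TERNARY_TPR = 1 / 3  # baseline of a random three-class classifier

Bin = Tuple[int, int]  # half-open length range [low, high) in nt


def tpr_by_length(
    records: Sequence[Tuple[int, str]],
    bins: List[Bin],
) -> Dict[Bin, float]:
    """Compute the TPR per length bin.

    `records` holds (length_nt, predicted_label) pairs for sequences that
    are all truly bifunctional. Empty bins are omitted from the result.
    """
    result: Dict[Bin, float] = {}
    for low, high in bins:
        labels = [label for length, label in records if low <= length < high]
        if labels:
            hits = sum(label == "bifunctional" for label in labels)
            result[(low, high)] = hits / len(labels)
    return result


# Made-up example predictions (not from the SPENCER benchmark).
example_records = [
    (300, "bifunctional"),
    (450, "coding"),
    (1200, "bifunctional"),
    (4000, "noncoding"),
    (4500, "noncoding"),
    (4800, "bifunctional"),
]

if __name__ == "__main__":
    for length_bin, tpr in tpr_by_length(example_records, [(0, 500), (500, 5000)]).items():
        verdict = "above" if tpr > RANDOM_TERNARY_TPR else "not above"
        print(length_bin, round(tpr, 3), verdict, "random baseline")
```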
