Gigascience. 2024 Jan 2;13:giae022.

doi: 10.1093/gigascience/giae022.

CheRRI-Accurate classification of the biological relevance of putative RNA-RNA interaction sites

Teresa Müller¹, Stefan Mautner¹, Pavankumar Videm¹, Florian Eggenhofer¹, Martin Raden¹, Rolf Backofen¹,²

Affiliations

¹ Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany.
² Signalling Research Centre CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany.

PMID: 38837942
PMCID: PMC11152173
DOI: 10.1093/gigascience/giae022

Teresa Müller et al. Gigascience. 2024.


Abstract

Background: RNA-RNA interactions are key to a wide range of cellular functions. The detection of potential interactions helps to understand the underlying processes. However, potential interactions identified via in silico or experimental high-throughput methods can lack precision because of a high false-positive rate.

Results: We present CheRRI, the first tool to evaluate the biological relevance of putative RNA-RNA interaction sites. CheRRI filters candidates via a machine learning-based model trained on experimental RNA-RNA interactome data. Its unique setup combines interactome data and an established thermodynamic prediction tool to integrate experimental data with state-of-the-art computational models. Applying these data to an automated machine learning approach makes it possible not only to filter out potential false positives but also to tailor the underlying interaction site model to specific needs.

Conclusions: CheRRI is a stand-alone postprocessing tool to filter either predicted or experimentally identified potential RNA-RNA interactions on a genomic level to enhance the quality of interaction candidates. It is easy to install (via conda or pip packages), use (via Galaxy), and integrate into existing RNA-RNA interaction pipelines.

Keywords: RNA–RNA interactome; classification; direct duplex detection; false positives; functional RRI.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1:**
Graphical abstract. CheRRI takes RRI sites (yellow) as input and adds genomic context up- and downstream (gray). Inside these extended RRI sites, RRI predictions (black, red) are computed by IntaRNA. The RRI prediction can exceed the original RRI site (fine black), but the seed (red) needs to be within the RRI site. CheRRI then extracts various sequence, context, and graph features (orange). These features are used to train a predictive model (train mode) or to evaluate whether a given RRI site is biologically relevant (eval mode).
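The context extension and seed constraint described in Figure 1 can be illustrated with a minimal Python sketch. It assumes 0-based, half-open coordinates on a single sequence and a context length of 150 nt; the `Site` class, the helper names, and the context length are hypothetical and not part of CheRRI's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical half-open interval [start, end) on one sequence (0-based)."""
    start: int
    end: int


def extend_with_context(site: Site, context: int, seq_len: int) -> Site:
    """Add `context` nt up- and downstream, clamped to the sequence bounds."""
    return Site(max(0, site.start - context), min(seq_len, site.end + context))


def seed_within_site(seed: Site, site: Site) -> bool:
    """True if the seed lies completely inside the original RRI site.

    The full interaction may still extend beyond the site; only the seed is checked.
    """
    return site.start <= seed.start and seed.end <= site.end


# A hypothetical 20-nt site on a 1,000-nt transcript, extended by 150 nt
site = Site(100, 120)
print(extend_with_context(site, 150, 1000))      # Site(start=0, end=270)
print(seed_within_site(Site(105, 112), site))    # True
print(seed_within_site(Site(95, 102), site))     # False
```

Clamping keeps the extended window inside the transcript when a site lies close to either end.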

**Figure 2:**
The CheRRI workflow. Except for the initial data-processing step, the model selection step in train mode (left) and the classification step in eval mode (right) use the same core modules. In train mode, DDD data are taken as input, while in eval mode a tabular file containing the RRI sites to be evaluated is provided. Both modes require a reference genome, which can be downloaded automatically for human and mouse and otherwise needs to be provided. Optionally, RBP data can be provided as well. After extracting the sequences with context, CheRRI uses IntaRNA to predict interactions anchored within the sites. Various features are then extracted from the predicted RRIs as well as from sequence, context, and accessibility information. These features are used either to build an organism-specific classification model (train mode) or to evaluate the given RRIs with such a model (eval mode).

**Figure 3:**
Generating interaction details for reliable RRI sites. Starting with the subsequences defining an RRI site (top, blue and green) detected by a DDD method, the sequence is first extended with genomic context (gray). In these extended sequences, all regions known to be occupied (e.g., by interactions with other RNAs or proteins) are masked (pink boxes). To create interaction details for reliable interactions (positive data), IntaRNA predictions are required to have a seed (red base pairs) within the original RRI site (orange box) while avoiding masked regions. For negative interactions, the RRI site itself is additionally masked as occupied, and “flanking” interactions in the site’s genomic context are predicted. In both cases, the top 5 ranked IntaRNA predictions are subsequently used to compile the features of an RRI site. The number of suboptimal predictions considered can be changed by the user.
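A minimal sketch of the selection rules described in this caption follows. As a simplification, it filters a precomputed list of candidate interactions post hoc on one sequence, whereas the caption describes constraints applied during the IntaRNA prediction itself; `Prediction`, `select_predictions`, and the example coordinates are hypothetical. The default `k=5` mirrors the top 5 predictions mentioned above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record of one candidate interaction on a single sequence."""
    energy: float     # interaction energy in kcal/mol; lower = more stable
    seed: Interval    # seed region
    span: Interval    # full interaction region


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def inside(inner: Interval, outer: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def select_predictions(preds: Sequence[Prediction], site: Interval,
                       masked: Sequence[Interval], positive: bool,
                       k: int = 5) -> List[Prediction]:
    """Return the k lowest-energy predictions that satisfy the constraints.

    Positive data: avoid masked regions and require the seed inside the site.
    Negative data: mask the site as well, so only flanking interactions remain.
    """
    blocked = list(masked) if positive else list(masked) + [site]
    kept = [p for p in preds if not any(overlaps(p.span, m) for m in blocked)]
    if positive:
        kept = [p for p in kept if inside(p.seed, site)]
    return sorted(kept, key=lambda p: p.energy)[:k]


site, masked = (100, 120), [(200, 230)]
preds = [
    Prediction(-12.0, (105, 111), (98, 125)),   # seed inside the site
    Prediction(-10.5, (140, 146), (135, 160)),  # flanking interaction
    Prediction(-9.0, (210, 216), (205, 220)),   # hits a masked region
    Prediction(-8.0, (112, 118), (110, 122)),   # seed inside the site
]
print([p.energy for p in select_predictions(preds, site, masked, positive=True)])   # [-12.0, -8.0]
print([p.energy for p in select_predictions(preds, site, masked, positive=False)])  # [-10.5]
```

Note that the first prediction extends beyond the site but is still accepted as positive, because only its seed has to lie within the site.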

**Figure 4:**
Precision-recall plot. CheRRI’s models for human and mouse (yellow and green lines) are compared against interaction site classification based only on minimal IntaRNA energy scores (E-based classification; orange line for human, blue line for mouse). The dark gray line shows the baseline (i.e., the precision expected by chance) for the human data-based models, and the lighter gray line shows it for the mouse data-based models.
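To illustrate what the E-based curves and the baseline represent, the sketch below ranks sites by minimal interaction energy (lower energy is treated as stronger evidence for a relevant site) and computes one precision/recall point per cut-off; the baseline is the fraction of positive sites. The function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def baseline_precision(labels: Sequence[int]) -> float:
    """Precision of a random classifier: the fraction of positive sites."""
    return sum(labels) / len(labels)


def energy_pr_curve(energies: Sequence[float],
                    labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(precision, recall) points for E-based classification.

    Sites are sorted by minimal energy (most negative first); each cut-off calls
    every site up to that rank positive. Ties are not merged into one point.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points


energies = [-15.0, -12.0, -9.0, -6.0]   # toy minimal energies (kcal/mol)
labels = [1, 0, 1, 0]                   # 1 = reliable RRI site, 0 = negative
print(baseline_precision(labels))             # 0.5
print(energy_pr_curve(energies, labels)[0])   # (1.0, 0.5)
```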

**Figure 5:**
Evaluation of models. Model performance is measured by the F1 score on different validation datasets. The figure compares the “Human”, “Human + RBP”, and “Mouse” datasets: (A, left) without graph-kernel features and (B, right) including graph-kernel features. Models were validated using 5-fold cross-validation. In addition, all training data not used to train a particular model were used for validation (e.g., the human model was validated on mouse data).
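For reference, the F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes it from binary labels and generates 5-fold cross-validation splits; it is a generic illustration with hypothetical helper names, not CheRRI’s own evaluation code. Cross-dataset validation (e.g., scoring the human model on mouse data) amounts to applying `f1_score` to predictions on a dataset that was not used for training.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # ≈ 0.667 (precision = recall = 2/3)
for train, test in k_fold_indices(10):
    print(len(train), len(test))  # 8 2 for each of the 5 folds
```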




Grants and funding

Deutsche Forschungsgemeinschaft
