CircSI-SSL: circRNA-binding site identification based on self-supervised learning
- PMID: 38180876
- PMCID: PMC10789309
- DOI: 10.1093/bioinformatics/btae004
Abstract
Motivation: In recent years, circular RNAs (circRNAs), a form of RNA with a closed-loop structure, have attracted widespread attention because of their physiological significance (they can directly bind proteins), leading to the development of numerous binding site identification algorithms. Unfortunately, these methods are supervised and require a large number of labeled samples during training to achieve strong performance. Sample labels are difficult to obtain because acquiring them requires a large number of biological experiments.
Results: To reduce the number of labels needed for training in the circRNA-binding site prediction task, this article proposes a self-supervised learning binding site identification algorithm named CircSI-SSL. According to the authors' survey, this is unprecedented in the research field. Specifically, CircSI-SSL first combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (the self-supervised task) to learn mutual information from the multi-view data; it is then fine-tuned with only a few sample labels. Comprehensive experiments on six widely used circRNA datasets indicate that CircSI-SSL achieves excellent performance in comparison to previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, a transfer experiment on six linear RNA (linRNA) datasets, run without network modification or hyperparameter adjustment, shows that CircSI-SSL has good scalability. In summary, the self-supervised prediction algorithm proposed in this article is expected to replace previous supervised algorithms and to have more extensive application value.
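To make the multi-view setup and the 1:9 split concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the feature coding schemes, so the two views used here (one-hot and k-mer index encoding) are assumptions chosen for illustration, and the function names `one_hot_view`, `kmer_view`, and `split_one_to_nine` are hypothetical.

```python
import random

NUCLEOTIDES = "ACGU"


def one_hot_view(seq):
    """Assumed view 1: one 4-element one-hot vector per nucleotide."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def kmer_view(seq, k=3):
    """Assumed view 2: base-4 index of the k-mer starting at each position."""
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    ids = []
    for start in range(len(seq) - k + 1):
        value = 0
        for base in seq[start:start + k]:
            value = value * 4 + index[base]
        ids.append(value)
    return ids


def split_one_to_nine(samples, seed=0):
    """Shuffle and split so that training : test is 1 : 9."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) // 10
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    seq = "ACGUACGU"
    # Two views of the same sequence; a cross-view pretext task would
    # train a model to predict one view from the other, without labels.
    print(one_hot_view(seq)[:2])
    print(kmer_view(seq))
    train, test = split_one_to_nine(range(100))
    print(len(train), len(test))
```

In a setup like this, only the small training split would need binding site labels for fine-tuning, while the pretext task can use every sequence because it needs no labels.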
Availability and implementation: The source code and data are available at https://github.com/cc646201081/CircSI-SSL.
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
None declared.