Bioinformatics. 2024 Jan 2;40(1):btae004. doi: 10.1093/bioinformatics/btae004.

CircSI-SSL: circRNA-binding site identification based on self-supervised learning

Chao Cao et al.

Abstract

Motivation: In recent years, circular RNAs (circRNAs), a particular form of RNA with a closed-loop structure, have attracted widespread attention because of their physiological significance (they can directly bind proteins), leading to the development of numerous protein-binding site identification algorithms. Unfortunately, these studies are supervised and require the vast majority of training samples to be labeled in order to achieve superior performance, yet acquiring sample labels requires a large number of biological experiments and is therefore difficult.

Results: To address the problem that the circRNA-binding site prediction task requires a large number of labeled samples for training, this article proposes a self-supervised binding site identification algorithm named CircSI-SSL. To the best of our knowledge, this is the first such approach in the field. Specifically, CircSI-SSL first combines multiple feature-encoding schemes and employs RNA_Transformer for cross-view sequence prediction (the self-supervised task) to learn mutual information from the multi-view data, and then fine-tunes with only a few labeled samples. Comprehensive experiments on six widely used circRNA datasets show that CircSI-SSL achieves excellent performance compared with previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, transplantation experiments on six linRNA datasets, performed without network modification or hyperparameter adjustment, show that CircSI-SSL has good scalability. In summary, the self-supervised prediction algorithm proposed in this article is expected to replace previous supervised algorithms and to have broader application value.

Availability and implementation: The source code and data are available at https://github.com/cc646201081/CircSI-SSL.
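
As a rough illustration of the two-stage scheme described in the Results (cross-view self-supervised pretraining with a Transformer encoder, followed by fine-tuning on a few labels), the PyTorch sketch below predicts one feature view of a sequence from another and then reuses the shared encoder for classification. All names, dimensions, and layer choices here are illustrative assumptions, not the authors' actual implementation; see the GitHub repository above for the real code.

    # Minimal sketch, assuming a one-hot view and a k-mer-embedding view of each sequence.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    SEQ_LEN, ONEHOT_DIM, KMER_DIM, D_MODEL = 101, 4, 64, 128  # toy sizes (assumed)

    class CrossViewEncoder(nn.Module):
        """Transformer encoder shared by pretraining and fine-tuning."""
        def __init__(self):
            super().__init__()
            self.onehot_proj = nn.Linear(ONEHOT_DIM, D_MODEL)
            layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                               batch_first=True, norm_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            # Self-supervised head: predict the k-mer view from the one-hot view.
            self.cross_view_head = nn.Linear(D_MODEL, KMER_DIM)
            # Supervised head: binding site vs. background, used when fine-tuning.
            self.cls_head = nn.Linear(D_MODEL, 2)

        def forward(self, onehot):
            return self.encoder(self.onehot_proj(onehot))  # (B, L, D_MODEL)

    def pretrain_step(model, onehot, kmer_view, opt):
        """Self-supervised task: reconstruct one view from the other (no labels)."""
        loss = F.mse_loss(model.cross_view_head(model(onehot)), kmer_view)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    def finetune_step(model, onehot, labels, opt):
        """Fine-tune the shared encoder with the few available labels."""
        logits = model.cls_head(model(onehot).mean(dim=1))  # pool over positions
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    if __name__ == "__main__":
        model = CrossViewEncoder()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        onehot = torch.rand(8, SEQ_LEN, ONEHOT_DIM)   # toy one-hot view
        kmer = torch.rand(8, SEQ_LEN, KMER_DIM)       # toy k-mer embedding view
        labels = torch.randint(0, 2, (8,))
        print("pretrain loss:", pretrain_step(model, onehot, kmer, opt))
        print("finetune loss:", finetune_step(model, onehot, labels, opt))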


Conflict of interest statement

None declared.

Figures

Figure 1. CircSI-SSL framework.
Figure 2. RNA_Transformer structure.
Figure 3. AUC discrimination performance obtained by eight existing supervised algorithms.
Figure 4. AUC performance obtained by the latest three supervised learning algorithms on six datasets.
Figure 5. Performance comparison between CircSI-SSL and the latest three supervised algorithms in four indicators.
Figure 6. Average AUC performance comparison between CircSI-SSL and the latest three supervised algorithms on six datasets.
Figure 7. Average AUC performance with and without SSL across six datasets.
Figure 8. Comparison of transplant performance on linRNA datasets.

