Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection
- PMID: 28913654
- DOI: 10.1007/s00438-017-1372-7
Abstract
As non-coding RNAs, circular RNAs (cirRNAs) and long non-coding RNAs (lncRNAs) have attracted increasing attention. Both have been shown to participate in many biological processes, including transcriptional regulation, regulation of protein-coding genes, and binding to RNA-associated proteins. The differences between these two types of non-coding RNAs have not yet been fully uncovered, and it remains difficult to distinguish cirRNAs from other lncRNAs using simple techniques. In this study, we investigated these two types of non-coding RNAs using several computational methods. The aim was to extract the features that best separate cirRNAs from other lncRNAs and to build an effective classification model. First, we collected cirRNAs, lncRNAs and their representations from a previous study, in which each cirRNA or lncRNA was represented by 188 features derived from its graph representation, sequence and conservation properties. Second, these features were analyzed by the minimum redundancy maximum relevance (mRMR) method. The resulting mRMR feature list, the incremental feature selection method and the hierarchical extreme learning machine (H-ELM) algorithm were then used to build an optimal classification model, which achieved a sensitivity of 0.703, a specificity of 0.850, an accuracy of 0.789 and a Matthews correlation coefficient of 0.561. Finally, we analyzed the 16 most important features. Among them, features describing the sequence and structure of the RNA molecule ranked highest, suggesting that they are potential indicators of the differences between cirRNAs and other lncRNAs. Features describing evolutionary conservation and sequence consecution were also important.
Keywords: Hierarchical extreme learning machine algorithm; Minimum redundancy maximum relevance; cirRNAs; lncRNAs.
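The evaluation measures and the incremental feature selection step named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy feature names and the toy scores are hypothetical, and the choice of the Matthews correlation coefficient as the selection criterion is an assumption, since the abstract does not state which measure drove the selection. The `score_subset` callback stands in for training and evaluating a classifier (H-ELM in the paper) on a given feature subset.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when a marginal total is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sn": sn, "sp": sp, "acc": acc, "mcc": mcc}


def incremental_feature_selection(
    ranked: Sequence[str],
    score_subset: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Evaluate the top-1, top-2, ..., top-N ranked features and keep the best prefix.

    `ranked` is a feature list ordered from most to least important
    (for example, an mRMR ranking). `score_subset` is a hypothetical
    callback that returns a quality score (assumed here: MCC) for a
    classifier trained on the given features.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = score_subset(ranked[:k])
        if score > best_score:  # ties keep the smaller feature set
            best_k, best_score = k, score
    return list(ranked[:best_k]), best_score


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    print(binary_metrics(tp=70, fn=30, tn=85, fp=15))

    # Toy scores keyed by subset size; a real run would train a classifier.
    toy_scores = {1: 0.40, 2: 0.56, 3: 0.50, 4: 0.56}
    ranking = ["f3", "f1", "f2", "f0"]
    print(incremental_feature_selection(ranking, lambda fs: toy_scores[len(fs)]))
```

Keeping the smaller subset on ties is a design choice of this sketch; it favors the more compact model when two prefixes of the ranking score equally.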