LncRNA-ID: Long non-coding RNA IDentification using balanced random forests
- PMID: 26315901
- DOI: 10.1093/bioinformatics/btv480
Abstract
Motivation: Long non-coding RNAs (lncRNAs), which are non-coding RNAs longer than 200 nucleotides, perform important biological functions such as gene expression regulation. To fully reveal the functions of lncRNAs, a fundamental step is to annotate them in various species. However, because lncRNAs often contain one or more open reading frames, it is not trivial to distinguish these long non-coding transcripts from protein-coding genes in transcriptomic data.
Results: In this work, we design a new tool that calculates the coding potential of a transcript using a machine learning model (random forest) based on multiple features including sequence characteristics of putative open reading frames, translation scores based on ribosomal coverage, and conservation against characterized protein families. The experimental results show that our tool competes favorably with existing coding potential computation tools in lncRNA identification.
Availability and implementation: The scripts and data can be downloaded at https://github.com/zhangy72/LncRNA-ID.
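To make the idea of "sequence characteristics of putative open reading frames" concrete, the following is a minimal sketch of how two such characteristics could be computed for a transcript. It is not taken from the LncRNA-ID scripts: the function names `longest_orf` and `orf_features`, the choice of ORF length and ORF coverage as features, and the restriction to the forward strand are all assumptions made for illustration. In a full pipeline, values like these would be combined with the translation and conservation features and passed to a random forest classifier.

```python
# Illustrative sketch only; names and feature choices are assumptions,
# not the LncRNA-ID implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. Returns (0, 0) when no
    complete ORF is found. RNA input (with U) is converted to DNA letters.
    """
    seq = seq.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being scanned
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # resume the search after this stop codon
    return best


def orf_features(seq):
    """Compute two hypothetical ORF features: length and transcript coverage."""
    start, end = longest_orf(seq)
    orf_len = end - start
    coverage = orf_len / len(seq) if seq else 0.0
    return {"orf_length": orf_len, "orf_coverage": coverage}


if __name__ == "__main__":
    print(orf_features("CCATGAAATGACC"))
```

Protein-coding transcripts would generally be expected to score higher on both values than lncRNAs, although short or incidental ORFs in lncRNAs are exactly why a single feature like this is not sufficient on its own.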
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.