LncRAnalyzer: a robust workflow for long non-coding RNA discovery using RNA-Seq
- PMID: 41103112
- DOI: 10.1111/tpj.70509
Abstract
Long non-coding RNAs (lncRNAs) are a major category of transcripts that lack protein-coding capacity, are relatively low in abundance, and show complex expression patterns. Distinguishing lncRNAs from protein-coding genes is a complex process involving multiple filtering steps. We developed an automated pipeline named LncRAnalyzer featuring retrained models for 60 species. By combining eight distinct approaches, the workflow aims to reduce the likelihood of retaining protein-coding or partially protein-coding transcripts during lncRNA identification. We compared the retrained sorghum models and training sets against their standard counterparts and other approaches by 10-fold cross-validation, using real-world RNA-Seq datasets and known lncRNA and CDS sequences of sorghum. The results showed that the retrained sorghum models and training sets outperformed the standard ones. The pipeline output comprises UpSet plots illustrating the number of lncRNAs and novel protein-coding transcripts (NPCTs) identified by each approach, the commonly identified lncRNAs and their classes, the NPCTs, and expression count tables. A feature-level comparison and benchmarking analysis of LncRAnalyzer against four existing pipelines, namely LncPipe, LncEvo, lncRNA-Annotation, and Plant-LncPipe, demonstrated that LncRAnalyzer is more comprehensive, easier to implement, and more accurate in lncRNA prediction. The workflow also ascertains lncRNA origins from various transposable elements (TEs) in plants using TE annotations from APTEdb [http://apte.cp.utfpr.edu.br/]. LncRAnalyzer is publicly available on GitLab [https://gitlab.com/nikhilshinde0909/LncRAnalyzer.git] for academic users.
Keywords: LncRAnalyzer; RNA‐Seq; Sorghum bicolor; genomics; long non‐coding RNA.
© 2025 Society for Experimental Biology and John Wiley & Sons Ltd.
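As a rough illustration of the "commonly identified lncRNA" idea in the abstract, the sketch below keeps only transcripts that several approaches agree are non-coding. It is not LncRAnalyzer's code: the abstract does not state how the eight approaches are combined, so the function name `consensus_noncoding`, the `min_support` parameter, and the approach and transcript names are all hypothetical.

```python
from collections import Counter


def consensus_noncoding(calls, min_support=None):
    """Return transcript IDs called non-coding by at least `min_support` approaches.

    `calls` maps an approach name to the set of transcript IDs that approach
    labelled non-coding. With min_support=None, every approach must agree
    (a strict intersection). Assumption: this consensus rule is illustrative
    only and is not taken from the LncRAnalyzer source.
    """
    if not calls:
        return set()
    if min_support is None:
        min_support = len(calls)
    # Count how many approaches support each transcript.
    support = Counter(tid for ids in calls.values() for tid in set(ids))
    return {tid for tid, n in support.items() if n >= min_support}


# Hypothetical calls from three approaches (placeholder names and IDs).
calls = {
    "approach_a": {"tx1", "tx2", "tx3"},
    "approach_b": {"tx2", "tx3", "tx4"},
    "approach_c": {"tx3", "tx4"},
}

strict = consensus_noncoding(calls)                   # all approaches agree
relaxed = consensus_noncoding(calls, min_support=2)   # at least two agree
print(sorted(strict), sorted(relaxed))
```

A strict intersection favours precision, since a transcript is dropped as soon as one approach flags coding potential, while a lower `min_support` trades some of that precision for recall.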
