Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning
- PMID: 34020552
- DOI: 10.1093/bib/bbaa184
Abstract
Motivation: The genome-wide discovery of microRNAs (miRNAs) involves identifying, among all the possible sequences in a complete genome, those most likely to be novel miRNA precursors (pre-miRNAs). The known pre-miRNAs are usually few compared with the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs from the sequenced genome alone. The task is infeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find a predictor that is accurate and has a low false positive rate in this genome-wide context. Although many tools are available, they have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data.
Results: In this work, we review six recent methods for tackling this problem with machine learning. We compare the models on five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster and Homo sapiens. The models were designed for the pre-miRNA prediction task, in which the class of interest (the known pre-miRNAs) is significantly underrepresented with respect to a very large number of unlabeled samples. For the smaller genomes and smaller imbalances, all methods perform similarly. However, for larger datasets such as the H. sapiens genome, deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives.
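To illustrate why a low false positive rate matters so much under high class imbalance, the following minimal sketch computes the expected precision of a classifier as the number of unlabeled candidates grows. All counts and rates are hypothetical values chosen for illustration; they are not taken from the paper, and `expected_precision` is a helper name introduced here.

```python
def expected_precision(n_pos, n_neg, sensitivity, false_positive_rate):
    """Expected fraction of predicted positives that are true positives.

    n_pos: number of true pre-miRNAs (hypothetical count)
    n_neg: number of non-pre-miRNA candidates (hypothetical count)
    sensitivity: fraction of true pre-miRNAs the classifier recovers
    false_positive_rate: fraction of negatives wrongly flagged as positive
    """
    true_positives = sensitivity * n_pos
    false_positives = false_positive_rate * n_neg
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0  # nothing was predicted positive
    return true_positives / predicted_positives


if __name__ == "__main__":
    # Hypothetical scenario: 1,000 known pre-miRNAs, a classifier with
    # 90% sensitivity and a 1% false positive rate, and a growing number
    # of negative candidates.
    for n_neg in (10_000, 1_000_000, 50_000_000):
        p = expected_precision(1_000, n_neg, 0.90, 0.01)
        print(f"negatives={n_neg:>11,d}  expected precision={p:.4f}")
```

With the positive class and the error rates held fixed, precision falls as the number of negatives increases, which is why a predictor that looks accurate on a balanced benchmark can still return mostly false positives on a whole genome.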
Availability: The source code to reproduce these results is available at http://sourceforge.net/projects/sourcesinc/files/gwmirna and the datasets are freely available at https://sourceforge.net/projects/sourcesinc/files/mirdata.
Keywords: deep-learning; genome-wide; pre-miRNA prediction.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.