Tiberius: end-to-end deep learning with an HMM for gene prediction
- PMID: 39558581
- PMCID: PMC11645249
- DOI: 10.1093/bioinformatics/btae685
Abstract
Motivation: For more than 25 years, learning-based eukaryotic gene predictors were driven by hidden Markov models (HMMs) that took a DNA sequence directly as input. Recently, Holst et al. demonstrated with their program Helixer that the accuracy of ab initio eukaryotic gene prediction can be improved by combining deep learning layers with a separate HMM postprocessor.
Results: We present Tiberius, a novel deep learning-based ab initio gene predictor that integrates convolutional and long short-term memory layers with a differentiable HMM layer in a single end-to-end model. Tiberius uses a custom gene prediction loss; it was trained for prediction in mammalian genomes and evaluated on human and two other genomes. It significantly outperforms existing ab initio methods, achieving a gene-level F1 score of 62% on the human genome, compared to 21% for the next best ab initio method. In de novo mode, Tiberius predicts the exon-intron structure of two out of three human genes without error. Remarkably, even Tiberius's ab initio accuracy matches that of BRAKER3, which uses RNA-seq data and a protein database. Tiberius's highly parallelized model is the fastest state-of-the-art gene prediction method, processing the human genome in under 2 hours.
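For readers unfamiliar with the metric, the following is a minimal sketch of how a gene-level F1 score can be computed. It assumes the common convention that a predicted gene counts as correct only when its complete exon-intron structure matches an annotated gene exactly; the abstract does not spell out the evaluation procedure. The function name, the tuple representation of a gene, and the toy coordinates are hypothetical and are not taken from the Tiberius code base.

```python
def gene_level_f1(predicted, annotated):
    """Gene-level F1 under an exact-match criterion (assumed convention).

    Each gene is a hashable tuple (chrom, strand, exons), where exons is a
    tuple of (start, end) coordinates. A prediction is a true positive only
    if an identical tuple appears in the annotation.
    """
    pred = set(predicted)
    ref = set(annotated)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)  # fraction of predictions that are correct
    recall = true_pos / len(ref)      # fraction of annotated genes recovered
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): three annotated genes, three predictions.
annotated = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "-", ((1000, 1200),)),
    ("chr2", "+", ((50, 80), (120, 160), (210, 260))),
]
predicted = [
    ("chr1", "+", ((100, 200), (300, 400))),            # exact match
    ("chr1", "-", ((1000, 1200),)),                     # exact match
    ("chr2", "+", ((50, 80), (120, 165), (210, 260))),  # one exon boundary off
]

f1 = gene_level_f1(predicted, annotated)
```

In this toy case two of three predictions match exactly, so precision and recall are both 2/3 and the F1 score is 2/3. A single shifted exon boundary is enough to make a gene count as wrong, which is why gene-level scores are much stricter than exon-level or base-level ones.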
Availability and implementation: https://github.com/Gaius-Augustus/Tiberius.
© The Author(s) 2024. Published by Oxford University Press.
