Balrog: A universal protein model for prokaryotic gene prediction
- PMID: 33635857
- PMCID: PMC7946324
- DOI: 10.1371/journal.pcbi.1008727
Abstract
Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog.
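As a rough illustration of what "fitting a temporal convolutional network to amino-acid sequences" involves, the sketch below one-hot encodes a protein string and applies a single dilated causal 1D convolution, the basic building block of temporal convolutional networks. It is not Balrog's architecture, weights, or code: the function names (`one_hot`, `dilated_causal_conv`, `receptive_field`), the 20-letter alphabet, the causal padding, and the toy kernel are all assumptions made for illustration.

```python
# Hypothetical sketch: one dilated causal convolution over a one-hot protein.
# Nothing here is taken from the Balrog source code.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def one_hot(protein):
    """Encode a protein string as a list of 20-dimensional one-hot vectors."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vectors = []
    for aa in protein:
        vec = [0.0] * len(AMINO_ACIDS)
        if aa in index:  # unknown residues are left as all-zero vectors
            vec[index[aa]] = 1.0
        vectors.append(vec)
    return vectors


def dilated_causal_conv(inputs, kernel, dilation):
    """Apply a causal dilated 1D convolution with one output channel.

    inputs: list of T vectors, each of length C.
    kernel: list of K weight vectors of length C; kernel[k] is applied
            to the input at position t - k * dilation.
    """
    outputs = []
    for t in range(len(inputs)):
        total = 0.0
        for k, weights in enumerate(kernel):
            src = t - k * dilation
            if src < 0:
                continue  # positions before the sequence start count as zero
            total += sum(w * x for w, x in zip(weights, inputs[src]))
        outputs.append(total)
    return outputs


def receptive_field(kernel_size, dilations):
    """Number of input positions seen by a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy kernel that fires when 'A' occurs two positions after 'M'.
x = one_hot("MKA")
kernel = [one_hot("A")[0], one_hot("M")[0]]
activations = dilated_causal_conv(x, kernel, dilation=2)
```

Stacking such layers with growing dilations (for example 1, 2, 4) widens the receptive field quickly, which is why this family of models can pick up long-range patterns along a protein sequence; a real model would learn many kernels per layer rather than use one hand-written kernel.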
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes. BMC Bioinformatics. 2021 Jan 6;22(1):11. doi: 10.1186/s12859-020-03940-5. PMID: 33407081. Free PMC article.
- A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies. BMC Bioinformatics. 2016 Jun 30;17(1):260. doi: 10.1186/s12859-016-1142-2. PMID: 27363390. Free PMC article.
- Prokaryotic Genome Annotation. Methods Mol Biol. 2022;2349:193-214. doi: 10.1007/978-1-0716-1585-0_10. PMID: 34718997.
- Comparative Genomics for Prokaryotes. Methods Mol Biol. 2018;1704:55-78. doi: 10.1007/978-1-4939-7463-4_3. PMID: 29277863. Review.
- Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 2008 Dec;36(21):6688-719. doi: 10.1093/nar/gkn668. Epub 2008 Oct 23. PMID: 18948295. Free PMC article. Review.
Cited by
- Unifying the known and unknown microbial coding sequence space. Elife. 2022 Mar 31;11:e67667. doi: 10.7554/eLife.67667. PMID: 35356891. Free PMC article.
- A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J. 2021 Nov 23;19:6301-6314. doi: 10.1016/j.csbj.2021.11.028. eCollection 2021. PMID: 34900140. Free PMC article. Review.
- Adaptive lifestyle of bacteria determines phage-bacteria interaction. Front Microbiol. 2022 Dec 6;13:1056388. doi: 10.3389/fmicb.2022.1056388. eCollection 2022. PMID: 36560945. Free PMC article.
- Accurate and fast graph-based pangenome annotation and clustering with ggCaller. Genome Res. 2023 Sep;33(9):1622-1637. doi: 10.1101/gr.277733.123. Epub 2023 Aug 24. PMID: 37620118. Free PMC article.
- PSAURON: a tool for assessing protein annotation across a broad range of species. NAR Genom Bioinform. 2025 Jan 7;7(1):lqae189. doi: 10.1093/nargab/lqae189. eCollection 2025 Mar. PMID: 39781514. Free PMC article.