This is a preprint.
GALBA: Genome Annotation with Miniprot and AUGUSTUS
- PMID: 37090650
- PMCID: PMC10120627
- DOI: 10.1101/2023.04.10.536199
Update in
- Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinformatics. 2023 Aug 31;24(1):327. doi: 10.1186/s12859-023-05449-z. PMID: 37653395. Free PMC article.
Abstract
The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes still lack annotation of protein-coding genes. In addition, no transcriptome data are available for some genomes. Various gene annotation tools have been developed, but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that uses miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a previously unannotated land plant. GALBA is fully open source and available as a Docker image for easy execution with Singularity in high-performance computing environments. Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.