NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa108.
doi: 10.1093/nargab/lqaa108. eCollection 2021 Mar.

BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database

Tomáš Brůna et al. NAR Genom Bioinform.

Abstract

The task of eukaryotic genome annotation remains challenging. Only a few genomes can serve as annotation standards, and these were achieved through a tremendous investment of human curation effort. Even in the best-annotated genomes, the correctness of all alternative isoforms merits further investigation. The new BRAKER2 pipeline generates external protein support and integrates it into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, in which self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. Unlike other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. Under equal conditions, it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should help harmonize the annotation of protein-coding genes across genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies, as well as in algorithmic development, to reach the goal of highly accurate annotation of eukaryotic genomes.

Figures

Figure 1. Flowchart of the BRAKER2 pipeline. Input, intermediate and output data are shown by ovals. The tools and processes of the ProtHint pipeline are shown in orange; other components of BRAKER2 are shown in blue.
Figure 2. Evidence integration in BRAKER2. (A) Target proteins; (B) Introns, gene start and stop sites defined by spliced alignments of target proteins to the genome; (C) CDSpart chains; (D) Genome sequence; (E) Genes predicted by GeneMark-EP+ at a given iteration, with the high-confidence hints enforced (red arrows); (F) Anchored sites: the splice sites and gene ends predicted ab initio and corroborated by protein hints; (G) Anchored introns and intergenic sequences bounded by anchored gene ends, selected for training the non-coding sequence model of GeneMark-EP+; (H) Anchored multi-exon and single-exon genes predicted by GeneMark-EP+ and selected for training AUGUSTUS; (I) Transcripts predicted by AUGUSTUS with support of external evidence.
Figure 3. Exon level Sn and Sp determined for each genome in the three runs of BRAKER2 with protein support, the run of BRAKER1 with RNA-seq support and the run of GeneMark-ES. BRAKER2 was run with support of proteins from OrthoDB excluding proteins (i) of the same species, (ii) of all species of the same taxonomic family, (iii) of all species of the same taxonomic order.
Figure 4. Gene level Sn and Sp determined in the tests described in the legend for Figure 3.
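
The legends for Figures 3 and 4 report Sn and Sp without defining them on this page. The sketch below assumes the convention common in gene-prediction evaluation, where Sn = TP / (TP + FN) and Sp = TP / (TP + FP), and assumes a predicted exon counts as correct only when both of its boundaries match a reference exon exactly. The function name `exon_sn_sp` and the tuple layout are hypothetical, chosen for illustration; this is not the evaluation code used by the authors.

```python
from typing import Iterable, Tuple

# Hypothetical exon record: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_sn_sp(predicted: Iterable[Exon], reference: Iterable[Exon]) -> Tuple[float, float]:
    """Return (Sn, Sp) at the exon level.

    Assumption: an exon is a true positive only if its sequence id,
    both coordinates and strand are identical in prediction and reference.
    """
    pred = set(predicted)
    ref = set(reference)
    true_positives = len(pred & ref)
    # Sn: fraction of reference exons that were predicted exactly.
    sn = true_positives / len(ref) if ref else 0.0
    # Sp: fraction of predicted exons that are in the reference.
    sp = true_positives / len(pred) if pred else 0.0
    return sn, sp


# Toy data, invented for illustration only.
reference_exons = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 500, 600, "+"),
    ("chr2", 50, 150, "-"),
]
predicted_exons = [
    ("chr1", 100, 200, "+"),   # exact match
    ("chr1", 300, 400, "+"),   # exact match
    ("chr1", 500, 610, "+"),   # wrong end coordinate
    ("chr2", 50, 150, "-"),    # exact match
    ("chr2", 900, 950, "-"),   # not in the reference
]

sn, sp = exon_sn_sp(predicted_exons, reference_exons)
print(f"exon Sn = {sn:.2f}, exon Sp = {sp:.2f}")
```

A gene-level version under the same assumptions would compare whole gene structures (for example, the ordered tuple of all coding exons of a transcript) instead of single exons, so one wrong boundary makes the whole gene a miss.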
