Mol Biol Evol. 2021 Sep 27;38(10):4647-4654.
doi: 10.1093/molbev/msab199.

BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes

Mosè Manni et al. Mol Biol Evol. 2021.

Abstract

Methods for evaluating the quality of genomic and metagenomic data are essential to aid genome assembly procedures and to correctly interpret the results of subsequent analyses. BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Here, we present new functionalities and major improvements of the BUSCO software, as well as the renewal and expansion of the underlying data sets in sync with the OrthoDB v10 release. Among the major novelties, BUSCO now enables phylogenetic placement of the input sequence to automatically select the most appropriate BUSCO data set for the assessment, allowing the analysis of metagenome-assembled genomes of unknown origin. A newly introduced genome workflow improves efficiency and reduces runtimes, especially on large eukaryotic genomes. BUSCO is the only tool capable of assessing both eukaryotic and prokaryotic species, and can be applied to various data types, from genome assemblies and metagenomic bins to transcriptomes and gene sets.
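As a rough illustration of how completeness and redundancy can be expressed from marker counts, the sketch below turns counts of single-copy, duplicated, fragmented, and missing markers into percentages. It assumes completeness is the share of markers recovered as complete (single-copy plus duplicated) and redundancy is the share recovered more than once. The `BuscoCounts` container and `summarize` function are hypothetical names introduced here for illustration and are not part of the BUSCO software.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Marker counts by status (hypothetical container, for illustration only)."""
    single_copy: int  # complete markers found exactly once
    duplicated: int   # complete markers found more than once
    fragmented: int   # markers only partially recovered
    missing: int      # markers not found

    @property
    def total(self) -> int:
        return self.single_copy + self.duplicated + self.fragmented + self.missing


def summarize(counts: BuscoCounts) -> dict:
    """Return percentages per status, rounded to one decimal place."""
    n = counts.total
    if n == 0:
        raise ValueError("the data set contains no markers")

    def pct(k: int) -> float:
        return round(100.0 * k / n, 1)

    return {
        # Assumption: completeness = single-copy + duplicated markers.
        "complete": pct(counts.single_copy + counts.duplicated),
        "single_copy": pct(counts.single_copy),
        # Assumption: redundancy = markers recovered more than once.
        "duplicated": pct(counts.duplicated),
        "fragmented": pct(counts.fragmented),
        "missing": pct(counts.missing),
        "n": n,
    }


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    print(summarize(BuscoCounts(single_copy=900, duplicated=50, fragmented=30, missing=20)))
```

The counts in the usage line are invented for the example and do not correspond to any assessment reported in the article.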

Keywords: completeness; eukaryotes; genome; metagenomes; microbes; prokaryotes; quality assessment; transcriptome; viruses.

Figures

Fig. 1
Comparison of the number of complete BUSCOs obtained by running BUSCO v5 and v3 with BUSCO odb_10 and odb_9 data sets on (a) bacterial, (b) fungal, and (c) metazoan gene sets.
Fig. 2
(a) Comparison of BUSCO scores obtained on a set of fungal genomes using the two available workflows for eukaryotic species. The percentage on the y axis corresponds to the complete BUSCOs for the BUSCO_MetaEuk (orange) and BUSCO_Augustus (white) workflows. Assessments on gene sets are also displayed for comparison (green). Genomes were assessed using the most specific available data sets, which are displayed at the top of each subpanel. The newly introduced BUSCO_MetaEuk workflow allows faster assessments; see supplementary figure 3a, Supplementary Material online, for the differences in runtimes. (b and c) Effect of using different MetaEuk sensitivity values on BUSCO_MetaEuk runtimes and completeness estimation for 112 arthropod genomes evaluated with their most specific BUSCO data set. The default values are set at s = 4.5 and s = 6 for the first and the second MetaEuk runs, respectively. For the analyses, the same sensitivity value displayed on the y axis was used for both MetaEuk runs. The axis corresponding to runtimes (in seconds) is log-transformed.
Fig. 3
BUSCO assessment on microbial data and comparison with CheckM. (a) Accuracy in the choice of data set produced by the auto-lineage mode when analyzing bacterial and archaeal assemblies (n = 436). For a given assembly, there can be between one and four suitable data sets to choose from (x axis), ranging from the most general root data set down to the most specific one. The selected data set is considered “correct” when it is the most lineage-specific one available for the genome; “suboptimal” when a parent lineage is selected; and “in disagreement with the NCBI” when the selected lineage is not part of the NCBI taxonomic annotation of that genome. Such disagreement might indicate an error; however, 12 out of 19 genomes in this category are annotated by NCBI as “unclassified,” while sharing a parent lineage with the BUSCO-selected data set; e.g., assembly GCF_000153385.1 is an unclassified Flavobacteria and was assigned to the flavobacteriales_odb10 data set (see also supplementary table 7, Supplementary Material online). When supported by a high BUSCO score, this suggests that the data set selected by BUSCO was appropriate. (b and c) Comparison of BUSCO and CheckM completeness (blue) and redundancy (red) scores on a set of 436 genomes. For clarity, the two scatterplots are zoomed in on the areas of highest density. n represents the number of data points displayed in the zoomed area. (d) Memory requirements for running BUSCO with the auto-lineage workflow on a set of bacterial and fungal genomes.
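The three categories used in panel (a) can be read as a simple decision rule. The sketch below is one possible encoding of that rule, not the authors' evaluation code. It assumes the suitable data sets for a genome are given as a list ordered from the root data set to the most specific one. The function name `classify_selection` is hypothetical, and the intermediate data set names in the example lineage path are illustrative placeholders; only flavobacteriales_odb10 is taken from the caption.

```python
from typing import Sequence


def classify_selection(selected: str, suitable: Sequence[str]) -> str:
    """Classify an automatically selected data set.

    `suitable` lists the data sets compatible with the genome's NCBI
    taxonomy, ordered from the most general (root) to the most specific.
    """
    if not suitable:
        raise ValueError("at least one suitable data set is required")
    if selected == suitable[-1]:
        # The most lineage-specific data set available was chosen.
        return "correct"
    if selected in suitable:
        # A parent lineage of the most specific data set was chosen.
        return "suboptimal"
    # The chosen lineage is not part of the genome's taxonomic annotation.
    return "in disagreement with the NCBI"


if __name__ == "__main__":
    # Illustrative lineage path; intermediate names are assumptions.
    path = [
        "bacteria_odb10",
        "example_phylum_odb10",
        "example_class_odb10",
        "flavobacteriales_odb10",
    ]
    for choice in ("flavobacteriales_odb10", "bacteria_odb10", "other_lineage_odb10"):
        print(choice, "->", classify_selection(choice, path))
```

Note that this rule alone cannot distinguish a genuine error from the “unclassified” cases described in the caption, where the selected data set shares a parent lineage with the NCBI annotation.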
Fig. 4
Benchmarking BUSCO estimates on artificially depleted genomes and gene sets of Drosophila melanogaster assessed with the diptera_odb10 data set. (a) Artificial depletion was applied to the full gene set. (b) Artificial depletion was applied exclusively to genes matching BUSCO markers. For both panels, solid red lines indicate the expected missing values. Five randomly depleted versions were used for each level of depletion. (c) Precision of the predictions for the analyses of panel (b).
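The depletion design in panel (a) can be sketched as follows: remove a random fraction of genes, then measure which marker genes are no longer present, repeating this five times per depletion level. This is a toy simulation under stated assumptions, not the benchmarking pipeline used in the article. Genes are plain identifiers, a marker counts as missing when its gene was removed, and the function names (`deplete`, `missing_fraction`, `replicate_depletion`) are hypothetical. Under uniform random removal, the expected missing fraction equals the depletion level.

```python
import random
from typing import List, Sequence


def deplete(genes: Sequence[str], fraction: float, rng: random.Random) -> List[str]:
    """Return the genes left after removing a random `fraction` of them."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_remove = round(len(genes) * fraction)
    removed = set(rng.sample(list(genes), n_remove))
    return [g for g in genes if g not in removed]


def missing_fraction(remaining: Sequence[str], markers: Sequence[str]) -> float:
    """Fraction of marker genes absent from the remaining gene set."""
    present = set(remaining)
    n_missing = sum(1 for m in markers if m not in present)
    return n_missing / len(markers)


def replicate_depletion(genes: Sequence[str], markers: Sequence[str],
                        fraction: float, replicates: int = 5,
                        seed: int = 0) -> List[float]:
    """Observed missing fractions for several random depletions at one level."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [missing_fraction(deplete(genes, fraction, rng), markers)
            for _ in range(replicates)]


if __name__ == "__main__":
    # Toy gene set; sizes are arbitrary and not taken from the paper.
    genes = [f"gene_{i}" for i in range(1000)]
    markers = genes[:200]
    for level in (0.1, 0.3, 0.5):
        observed = replicate_depletion(genes, markers, level)
        print(f"depletion={level:.1f} expected_missing={level:.1f} observed={observed}")
```

Comparing the observed values against the expected level mirrors the role of the solid red lines in the figure, although a real assessment would score the depleted data with BUSCO rather than by identifier lookup.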
