Nat Commun. 2025 Mar 24;16(1):2865.
doi: 10.1038/s41467-025-57957-6.

Benchmarking metagenomic binning tools on real datasets across sequencing platforms and binning modes

Haitao Han et al. Nat Commun. 2025.

Abstract

Metagenomic binning is a culture-free approach that facilitates the recovery of metagenome-assembled genomes by grouping genomic fragments. However, there remains a lack of a comprehensive benchmark to evaluate the performance of metagenomic binning tools across various combinations of data types and binning modes. In this study, we benchmark 13 metagenomic binning tools using short-read, long-read, and hybrid data under co-assembly, single-sample, and multi-sample binning, respectively. The benchmark results demonstrate that multi-sample binning exhibits optimal performance across short-read, long-read, and hybrid data. Moreover, multi-sample binning outperforms other binning modes in identifying potential antibiotic resistance gene hosts and near-complete strains containing potential biosynthetic gene clusters across diverse data types. This study also recommends three efficient binners across all data-binning combinations, as well as high-performance binners for each combination.

Conflict of interest statement

Competing interests: All authors declare no competing interests.

Figures

Fig. 1. Number of NC and HQ MAGs recovered from five real datasets.
a, c, e, g, i The number of NC MAGs recovered from marine, cheese, human gut I, human gut II and activated sludge datasets respectively. b, d, f, h, j The number of HQ MAGs recovered from marine, cheese, human gut I, human gut II and activated sludge datasets respectively. The description of seven data-binning combinations can be seen in Table 2. “nan” denotes that the corresponding binner failed to complete execution within two weeks based on the computational resources we used.
Fig. 2. Overall ranking scores (see Section Evaluation metrics and ranking score), runtime, and memory usage for each binner across seven data-binning combinations.
a Overall ranking scores of each binner across the seven data-binning combinations. The rankings of the binners, based on their overall ranking scores, are annotated. “nan” denotes that CONCOCT, MaxBin 2, MetaBinner, and COMEBin (GPU) could not complete execution within two weeks on the activated sludge short-read co-assembly data (see Section Computational resources for computational resources). Consequently, these binners are ranked last on the activated sludge short-read co-assembly data (see Supplementary Fig. 3e). b, c Runtime and memory usage for each binner in the activated sludge dataset. According to the recommendations in the referenced study, COMEBin was used in GPU environments. In the short_co data-binning combination, there is a single assembly, with each bar representing the runtime or memory usage for each binner. In the remaining six data-binning combinations, there are 23 assemblies (N = 23) per combination, with each bar showing the average runtime or memory usage for each binner. The error bars indicate the standard deviation.
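The bar heights and error bars described for Fig. 2b, c can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `summarize_runtimes`, the input layout, and the use of the sample standard deviation are all assumptions, since the caption does not say which estimator was used. A binner with no completed runs maps to `None`, loosely mirroring the “nan” entries.

```python
from statistics import mean, stdev


def summarize_runtimes(runtimes_by_binner):
    """Return {binner: (mean, standard deviation)} for per-assembly measurements.

    Hypothetical helper: `runtimes_by_binner` maps a binner name to a list of
    runtimes (or memory readings), one per assembly.
    """
    summary = {}
    for binner, runs in runtimes_by_binner.items():
        if not runs:
            # No completed run: comparable to a "nan" entry in the figure.
            summary[binner] = None
            continue
        avg = mean(runs)
        # Assumption: sample standard deviation; a single assembly
        # (as in short_co) has no spread, so report 0.0.
        sd = stdev(runs) if len(runs) > 1 else 0.0
        summary[binner] = (avg, sd)
    return summary
```

With 23 assemblies per combination, each list would hold 23 values; in the short_co combination it would hold exactly one.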
Fig. 3. Comparison of bin-refinement tool performance across seven data-binning combinations.
a, d, g, j, m The number of MQ MAGs recovered from marine, cheese, human gut I, human gut II and activated sludge datasets, respectively. b, e, h, k, n The number of NC MAGs recovered from marine, cheese, human gut I, human gut II and activated sludge datasets, respectively. c, f, i, l, o The number of HQ MAGs recovered from marine, cheese, human gut I, human gut II and activated sludge datasets, respectively.
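The MQ, NC, and HQ labels used in these captions are not defined on this page. The sketch below assigns them using thresholds that follow the commonly used MIMAG-style convention; every cutoff, the function name `classify_mag`, and its parameters are assumptions for illustration and should be checked against the paper's Methods.

```python
def classify_mag(completeness, contamination, has_rrna=False, trna_count=0):
    """Return the quality tiers a MAG satisfies, from loosest to strictest.

    Assumed thresholds (not stated in the source text):
      MQ: completeness > 50% and contamination < 10%
      NC: completeness > 90% and contamination < 5%
      HQ: NC, plus 5S/16S/23S rRNA genes present and at least 18 tRNAs
    """
    tiers = []
    if completeness > 50 and contamination < 10:
        tiers.append("MQ")
    if completeness > 90 and contamination < 5:
        tiers.append("NC")
        # HQ is treated here as a stricter subset of NC.
        if has_rrna and trna_count >= 18:
            tiers.append("HQ")
    return tiers
```

Under these assumed cutoffs the tiers are nested, so every HQ MAG is also counted as NC and MQ, which would explain why MQ counts are never smaller than NC counts for the same binner.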
Fig. 4. Number of species and strains recovered from the marine dataset.
a, b Number of species or strains after dereplication of refined NC MAGs across seven data-binning combinations. c, d Number of species or strains after dereplication of refined HQ MAGs across seven data-binning combinations.
