Nat Commun. 2023 Nov 11;14(1):7318.
doi: 10.1038/s41467-023-43000-z.

A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources

Bin Ma et al. Nat Commun.

Erratum in

Abstract

Soil harbors a vast expanse of unidentified microbes, termed microbial dark matter, which represent an untapped reservoir of microbial biodiversity and genetic resources that has yet to be fully explored. In this study, we conduct a large-scale excavation of soil microbial dark matter by reconstructing 40,039 metagenome-assembled genome bins (the SMAG catalogue) from 3304 soil metagenomes. We identify 16,530 of 21,077 species-level genome bins (SGBs) as unknown SGBs (uSGBs), which expand archaeal and bacterial diversity across the tree of life. We also illustrate the pivotal role of uSGBs in augmenting the soil microbiome's functional landscape and intra-species genome diversity, providing large proportions of the 43,169 biosynthetic gene clusters and 8545 CRISPR-Cas genes. Additionally, we determine that uSGBs contributed 84.6% of previously unexplored viral-host associations from the SMAG catalogue. The SMAG catalogue provides a useful genomic resource for further studies investigating soil microbial biodiversity and genetic resources.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Recovery of genomes from globally distributed soil metagenomes.
a A total of 40,039 MAGs were recovered from 3304 soil metagenomes. b Geographic distribution of metagenomes within each habitat. c Distribution of quality metrics across the MAGs. d Comparison of the current dataset with the published MAG catalogue across different environments; UHGG (Unified Human Gastrointestinal Genome).
Fig. 2. The SMAG substantially expands the diversity of soil microbes.
a 16,530 genomes (41%) from SMAG (40,039 MAGs) were assigned to the uSGBs. b uSGBs improve mappability for soil metagenomes. c The rarefaction curve is clearly unsaturated at the species rank in the SMAG dataset. d A phylogenetic tree was built for 21,077 SGBs based on the concatenated 400 conserved universal PhyloPhlAn marker genes. e Comparison of the number of genomes across phyla between kSGBs and uSGBs. f The biosphere distribution of SGBs across metagenomic samples.
Fig. 3. Functional landscape and intraspecies genomic variation analyses within the soil microbiome.
a Differential distribution of functional category enrichment between the kSGBs and uSGBs of the 5184 high-quality MAGs. uSGBs substantially expanded the functional landscape of most phyla in the SMAG catalogue. b The abundance of core genes for kSGBs and uSGBs across phyla. c Proportion of core and accessory genes (n = 2200 species) classified with various annotation schemes. A two-tailed Wilcoxon rank-sum test was performed to compare the classification between the core and accessory genes (*P < 0.05), eggNOG (*P = 0.054), KEGG (***P = 0.0005), GO (*P = 0.041). d Comparison of the KEGG pathways between the core and accessory genes. e Comparison of the COG categories between the core and accessory genes. f Total number of SNVs detected as a function of the number of species; more SNVs were detected in uSGBs than in kSGBs. g The density of SNVs for kSGBs and uSGBs across dominant phyla (n = 2448 species). h The pN/pS ratios for kSGBs and uSGBs across dominant phyla (n = 2448 species). Data in (g) and (h) are presented as mean values +/− standard deviation (SD).
Fig. 4. Biosynthetic gene clusters recovered from the SMAG catalogue.
a BGCs of the SMAG between kSGBs and uSGBs. All the BGCs were separated into eight BiG-SCAPE classes: non-ribosomal peptide synthetase (NRPS), ribosomally synthesized and post-translationally modified peptide (RiPPs), polyketide synthase (PKS I), terpene, PKS–NRPS hybrid, PKS other, saccharides, and others. b The relative frequency of BGC types across dominant phyla. BGC genes are predominantly identified in Proteobacteria, Actinobacteriota, Acidobacteriota and Bacteroidota, and are highly variable across phyla. c Number and types of BGCs identified from the SMAG. d Encoding the most remarkable number of BGC clusters including 111 NRPS or PKS modules and with clear colinear module chains. e The single largest BGC region found in a soil-derived bacterium from the Acidobacteria phylum and UBA5704 family. f Distribution and KO assignment of the two largest BGCs from SMAG and GEM.
Fig. 5. The profile of spacers and Cas-associated proteins in the genomes of the SMAG catalogue.
a Most MAGs possessed only one spacer sequence. b, e The number of spacer sequences and Cas-associated genes did not increase with genome size, either for kSGBs or uSGBs. c, f Spacer sequence (n = 662 MAGs) and Cas-associated gene (n = 563 MAGs) loads differed significantly across phyla. Data in (c) and (f) are presented as mean values +/− SD. d The count of Cas-associated genes processed among MAGs. g The top 10 number of different Cas proteins processed from the SMAG catalogue. Most (6934) of the predicted Cas genes were uncertain. h The profile of Cas-associated genes processed for kSGBs and uSGBs across phyla. uSGBs expanded the profiles of Cas-associated genes.
Fig. 6. The SMAG resolves virus-host connectivity.
a The virus-host association counts across phyla. b The virus-host associations for kSGBs and uSGBs predicted by prophages. c The host phylogenetic ranges of viruses. GSV_66726, GSV_527, and GSV_39462 were previously unidentified viruses from the in-house Global Soil Virome (GSV) dataset.

