Nat Commun. 2019 Aug 26;10(1):3848.
doi: 10.1038/s41467-019-11658-z.

Uncovering the biosynthetic potential of rare metagenomic DNA using co-occurrence network analysis of targeted sequences

Vincent Libis et al. Nat Commun. 2019.

Abstract

Sequencing of DNA extracted from environmental samples can provide key insights into the biosynthetic potential of uncultured bacteria. However, the high complexity of soil metagenomes, which can contain thousands of bacterial species per gram of soil, poses significant challenges to exploring secondary metabolites potentially produced by rare members of the soil microbiome. Here, we develop a targeted sequencing workflow termed CONKAT-seq (co-occurrence network analysis of targeted sequences) that detects physically clustered biosynthetic domains, a hallmark of bacterial secondary metabolism. Following targeted amplification of conserved biosynthetic domains in a highly partitioned metagenomic library, CONKAT-seq evaluates amplicon co-occurrence patterns across library subpools to identify chromosomally clustered domains. We show that a single soil sample can contain more than a thousand uncharacterized biosynthetic gene clusters, most of which originate from low-frequency genomes that are practically inaccessible through untargeted sequencing. CONKAT-seq allows scalable exploration of largely untapped biosynthetic diversity across multiple soils and can guide the discovery of novel secondary metabolites from rare members of the soil microbiome.
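
The co-occurrence idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the published CONKAT-seq implementation: it assumes amplicon detection has already been reduced to presence or absence per subpool, it uses a one-sided Fisher's exact test as the association statistic, and it groups significant pairs into networks by connected components. The function names, the `alpha` cutoff, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_right_tail(both, only_a, only_b, neither):
    """One-sided Fisher's exact p-value for two amplicons co-occurring
    in at least `both` subpools (hypergeometric upper tail)."""
    n = both + only_a + only_b + neither
    with_a = both + only_a  # subpools containing amplicon A
    with_b = both + only_b  # subpools containing amplicon B
    upper = min(with_a, with_b)
    tail = sum(
        comb(with_a, k) * comb(n - with_a, with_b - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(n, with_b)


def cooccurrence_networks(presence, n_subpools, alpha=1e-3):
    """Group domain amplicons into networks of significantly co-occurring pairs.

    presence: dict mapping an amplicon name to the set of subpool indices
              in which it was detected (assumed input format).
    alpha:    hypothetical significance cutoff, with no multiple-testing
              correction applied in this sketch.
    """
    # Union-find structure to collect connected components.
    parent = {name: name for name in presence}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(presence), 2):
        both = len(presence[a] & presence[b])
        only_a = len(presence[a]) - both
        only_b = len(presence[b]) - both
        neither = n_subpools - both - only_a - only_b
        if fisher_right_tail(both, only_a, only_b, neither) < alpha:
            parent[find(a)] = find(b)  # link the two amplicons

    groups = {}
    for name in presence:
        groups.setdefault(find(name), set()).add(name)
    # Singletons carry no clustering evidence, so they are dropped.
    return [members for members in groups.values() if len(members) > 1]


# Toy data: 100 subpools, two amplicons that always appear together
# and one unrelated amplicon.
toy_presence = {
    "AD_1": {1, 2, 3, 4, 5},
    "KS_1": {1, 2, 3, 4, 5},
    "KS_2": {50, 60, 70, 80, 90},
}
networks = cooccurrence_networks(toy_presence, n_subpools=100)
print(networks)
```

In the toy data, `AD_1` and `KS_1` share all five subpools and end up in one network, while `KS_2` never overlaps with them and is left out. A real analysis would need a multiple-testing correction and a way to handle sequencing noise, both of which this sketch omits.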

Conflict of interest statement

S.F.B. is the founder of LODO Therapeutics. The remaining authors declare no competing interests.

Figures

Fig. 1
CONKAT-seq enables the exploration of rare biosynthetic gene clusters in complex metagenomes. a Untargeted methods to explore the biosynthetic potential of low-frequency organisms in the soil metagenome are limited by the high coverage depth they require and by the computationally challenging de novo assembly process. PCR-based methods are extremely sensitive but do not capture the functional clustering of biosynthetic domains and are therefore information-poor. b CONKAT-seq uses the highly partitioned structure of metagenomic cosmid libraries to reconstruct the chromosomal organization of biosynthetic domains based on PCR amplicon data.
Fig. 2
A single metagenome can contain hundreds of previously uncharacterized, low-frequency BGCs that are practically inaccessible through untargeted sequencing. a Visualization of clustered domain networks predicted by using CONKAT-seq on a soil metagenomic library. Hexagons represent networks, with size proportional to the number of domains and color indicating their similarity score to known BGCs. Only networks with between 3 and 30 domains are presented. A subset of 60 networks (squares) is presented in detail, where nodes correspond to AD and KS domain variants and edges link variants that are predicted to be physically co-clustered. Nodes are colored based on the similarity score of the network. b Domain network predictions were validated using long-read sequencing of library subpools or by the recovery and sequencing of library clones encoding BGCs. Each bar is associated with a specific domain network (Supplementary Data 1). Based on assembled contigs from sequenced subpools (left) or recovered clones (middle and right), the number of experimentally validated domain clustering predictions is presented (blue) in comparison to the total number of domains in the network (gray). Black bars represent the number of false clustering predictions (i.e., domains in the network that were not present in the metagenomic insert). In cases where the length of a BGC exceeds the size of the metagenomic insert, only a subset of the domains could be validated from a single cosmid (middle, gray vs. blue bars). We demonstrate that in such cases all of the domain network can be recovered from multiple overlapping clones (right). c Sample of 12 annotated BGCs that were recovered based on CONKAT-seq predictions. For each BGC we indicate the "closest" BGC based on BiG-SCAPE analysis (Supplementary Table 3, Supplementary Fig. 4). Ticks represent actual domain positions and circle markers indicate the domains predicted by the network. Large BGCs that were recovered from multiple overlapping clones (VII and IX) are marked by underlines. d Sequencing output (in terabases) required to reach a depth of coverage of 20X of the recovered BGCs (black) and of ≈3000 representative metagenomic inserts found in two subpools of the library (gray).
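
The quantity plotted in panel d rests on simple coverage arithmetic, sketched here under stated assumptions: reads are sampled uniformly, and a target region of length L makes up a fraction f of all sequenced bases, so the output needed for a given depth is roughly depth × L / f. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def required_output_terabases(depth, target_length_bp, target_fraction):
    """Estimate total sequencing output (in terabases) needed to cover a
    target region at `depth`, assuming uniform sampling of reads.

    target_fraction: share of all sequenced bases that fall on the target
                     (a hypothetical input; in practice it must be estimated).
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    total_bases = depth * target_length_bp / target_fraction
    return total_bases / 1e12  # 1 terabase = 1e12 bases


# Hypothetical example: a 40 kb region that accounts for one base in
# every hundred million sequenced, covered at 20X.
print(required_output_terabases(20, 40_000, 1e-8))  # about 80 terabases
```

The inverse dependence on f is the point of the comparison: halving the abundance of a genome doubles the untargeted sequencing effort needed to reach the same depth.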
Fig. 3
Comparing domain networks from different samples enables discovery of common yet unknown families of natural products. a 3591 domain networks obtained from four soil samples were compared to each other and to sequences of BGCs found in databases. For each soil sample, the outer ring (red and blue) depicts the proportions of networks that display similarity to sequences of BGCs found in databases. The fraction of networks found in each soil sharing relatives with a median identity higher than 90% in other soil samples is represented by the ratio of the width of the ribbons relative to the corresponding part of the inner ring (gray). Each ribbon represents the fraction of networks shared by combinations of 2, 3, or all 4 soils analyzed. Colored ribbons represent networks with close relatives in Oregon, Arizona and New Mexico (magenta) or Hawaii, Arizona and Oregon (cyan), out of which one representative family was physically recovered and depicted in b and c. b Detailed view of the pairwise comparison of biosynthetic domains belonging to related networks that form a family of common yet unknown BGCs. Recovery of metagenomic clones encoding this BGC family from geographically distant soils revealed novel and highly similar gene content and architecture. Colored shading covers the portion of metagenomic DNA showing high homology between related BGCs, while gray highlights genes lacking significant homology that likely belong to independent chromosomal contexts outside of BGC boundaries. c CONKAT-seq networks and BGCs for members of a family of common yet unknown BGCs identified in Hawaii and Arizona soil samples. d The corresponding overlapping cosmids isolated from E. coli clones were assembled by transformation-associated recombination in S. cerevisiae into a bacterial artificial chromosome (BAC), which was later integrated in the chromosome of Streptomyces albus. e Heterologous expression of the two related metagenomic BGCs and analysis of the associated crude extracts by HPLC/MS led to the detection of new peaks absent in the control extract obtained from the host native background (S. albus). f Both BGCs led to the isolation of the same major product, omnipeptin, a novel 11-residue cyclic depsipeptide.
Fig. 4
CONKAT-seq identifies BGCs encoding small molecules with desired chemical features in the soil metagenome. a Using a collection of two or more conserved genes targeted by degenerate primers, CONKAT-seq reconstructs the chromosomal association of multiple biosynthetic domains. We designed three primer pairs and performed PCR amplification of conserved enzymes in the biosynthesis of molecular building blocks for secondary metabolites: MppR (enduracididine), VioD (capreomycidine), and RifK (3-amino-5-hydroxybenzoic acid, AHBA). By analyzing the co-occurrence frequencies of the resulting amplicons with amplicons of other biosynthetic domains (e.g., AD or KS domains), CONKAT-seq identifies BGCs that specifically incorporate these molecular building blocks in their molecular product. b Examples of three BGCs that were recovered from the soil metagenome based on CONKAT-seq predictions.
