Front Microbiol. 2020 Dec 7:11:585398.
doi: 10.3389/fmicb.2020.585398. eCollection 2020.

Discovery of Novel Biosynthetic Gene Cluster Diversity From a Soil Metagenomic Library

Alinne L R Santana-Pereira et al. Front Microbiol.

Abstract

Soil microorganisms have historically been a rich resource for natural product discovery, yet the majority of these microbes remain uncultivated and their biosynthetic capacity remains underexplored. To identify the biosynthetic potential of soil microorganisms using a culture-independent approach, we constructed a large-insert metagenomic library in Escherichia coli from topsoil sampled from the Cullars Rotation (Auburn, AL, United States), a long-term crop rotation experiment. Library clones were screened for biosynthetic gene clusters (BGCs) using either PCR or a next-generation sequencing (NGS) multiplexed pooling strategy, coupled with bioinformatic analysis to identify the contigs associated with each metagenomic clone. A total of 1,015 BGCs were detected from 19,200 clones, identifying 223 clones (1.2%) that carry a polyketide synthase (PKS) and/or a non-ribosomal peptide synthetase (NRPS) cluster, a dramatically improved hit rate compared to PCR screening that targeted type I polyketide ketosynthase (KS) domains. The NRPS and PKS clusters identified by NGS were distinct from known BGCs in the MIBiG database and from the PKS clusters identified by PCR. Likewise, 16S rRNA gene sequences obtained by NGS of the library included many representatives that were not recovered by PCR, consistent with the amplification bias observed in KS amplicon screening. This study provides novel resources for natural product discovery and circumvents amplification bias, allowing annotation of a soil metagenomic library for a more complete picture of its functional and phylogenetic diversity.

Keywords: biases; biosynthetic ability; metagenome; next-generation sequencing; soil.
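
As a quick check on the hit rate quoted in the abstract, the short sketch below recomputes the percentage of clones carrying a PKS and/or NRPS cluster from the two counts given there (223 of 19,200). The function name `hit_rate_percent` is illustrative only and does not come from the study.

```python
def hit_rate_percent(hits: int, total: int) -> float:
    """Return the percentage of screened clones that were hits."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


# Counts taken from the abstract: 223 PKS/NRPS-carrying clones out of 19,200.
rate = hit_rate_percent(223, 19_200)
print(f"{rate:.1f}%")  # prints "1.2%"
```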

Conflict of interest statement

ML and DM are the cofounders of the Varigen Biosciences Corporation. A licensing agreement between Auburn University and the Varigen Biosciences Corporation has been established for commercial development of the Cullars soil metagenomic library described in this manuscript. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Library construction and 3D pooling strategies. (A) Library construction strategy. Superscript letters correspond to citations: a: (Liles et al., 2008); b: (Kakirde et al., 2011); c: (Nasrin et al., 2018). (B) 3D Library Pooling strategy: Plate, Column, and Row are barcoded before sequencing. (C) Contig to clone deconvolution and in silico mining strategy. Reads from each pool are assembled separately into pool contigs. Highly homologous contigs from each dimension (Plate, Column, and Row) form triplets. Since their pool of origin is known, they form a coordinate system that points to their clone of origin. Contigs were analyzed bioinformatically for rational identification of candidates for functional screening.
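
The coordinate idea in panel (C) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes the homology grouping has already been done, so that each group of highly homologous pool contigs carries a hypothetical label (`family`) together with the plate, column, and row pools in which it was assembled. All pool names below are invented for the example.

```python
from itertools import product


def deconvolve(pool_hits):
    """Assign contig families to clone coordinates.

    pool_hits maps a family label to a dict with keys "plate", "column",
    and "row", each holding the set of pools where the family was seen.
    A family seen in exactly one pool per dimension points to one clone;
    anything else is reported with all candidate coordinates.
    """
    resolved, ambiguous = {}, {}
    for family, dims in pool_hits.items():
        plates, cols, rows = dims["plate"], dims["column"], dims["row"]
        if len(plates) == len(cols) == len(rows) == 1:
            # One pool in every dimension: a unique (plate, column, row) triplet.
            resolved[family] = (min(plates), min(cols), min(rows))
        else:
            # Several (or zero) pools in some dimension: list every candidate.
            ambiguous[family] = list(
                product(sorted(plates), sorted(cols), sorted(rows))
            )
    return resolved, ambiguous


# Hypothetical pool labels, for illustration only.
hits = {
    "contigA": {"plate": {"P07"}, "column": {"C03"}, "row": {"RE"}},
    "contigB": {"plate": {"P01", "P02"}, "column": {"C05"}, "row": {"RA"}},
}
resolved, ambiguous = deconvolve(hits)
print(resolved)   # contigA resolves to a single clone coordinate
print(ambiguous)  # contigB has two candidate clones
```

Families that appear in more than one pool of a dimension are kept apart rather than guessed at, since the caption does not say how such cases were handled.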
FIGURE 2
Maximum likelihood tree of KS domains recovered from the soil metagenomic library. Clades in magenta are formed uniquely by domains recovered by NGS. (A) KS domains recovered either by PCR or by NGS. (B) KS domains recovered from the soil metagenomic library and from the MIBiG database showing branch lengths. (C) KS domains recovered from the soil metagenomic library and from the MIBiG database showing topology only to facilitate visualization. Circles in the tree represent branch bootstrap support >70%.
FIGURE 3
Maximum likelihood tree of A-domains recovered from NRPS clusters identified from the soil metagenomic library and the MIBiG database. (A) ML tree showing branch lengths. (B) ML tree ignoring branch lengths to facilitate visualization of the topology. Circles in the tree represent branch bootstrap support >70%.
FIGURE 4
Clustering of PKS (A) and NRPS (B) pathways against the MIBiG database using BiG-SCAPE. Library pathways are shown in orange and database pathways in blue. Circle size is proportional to the number of connections, and line width is proportional to the similarity between pathways.
FIGURE 5
Analysis of the clones containing viable PKS and NRPS pathways reconstructed and validated from the Cullars metagenomic library. (A) Viable PKS and/or NRPS clusters recovered from the library compared by size (number of domains) and divergence from known BGCs (mean branch length and mean domain divergence). Each plot point represents a clone containing a viable BGC. Mean branch length for each clone is calculated as the average branch length of all of that clone’s KS and/or A domains, as extracted from the ML trees against the MIBiG database. Mean domain divergence for each clone is calculated as the average domain divergence of all of that clone’s KS and/or A domains. Domain divergence is the complement of the domain’s % identity to the NCBI nr database (if a domain has a % identity of 60, then its domain divergence is 40%). (B) Density distribution of branch length, domain divergence and number of domains across the viable PKS and NRPS BGCs. (C) Annotation of three representative BGCs, the longest of each type (PKS, NRPS, PKS-NRPS) to be recovered successfully in silico from the metagenomic library NGS contigs and validated upon clone resequencing. The annotation of each library contig was compared to the annotation of its corresponding validated insert sequence. ORFs are depicted by arrows and the PKS and/or NRPS domains within each ORF are represented by colored stripes.
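
The divergence metric defined in panel (A) is simple arithmetic, sketched below under the definition given in the caption: divergence is 100 minus the % identity, and a clone's mean domain divergence is the average over its KS and/or A domains. The function names and the example identity values are illustrative assumptions, not data from the study.

```python
from statistics import mean


def domain_divergence(percent_identity: float) -> float:
    """Complement of % identity: 60% identity gives 40% divergence."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must lie between 0 and 100")
    return 100.0 - percent_identity


def clone_mean_divergence(domain_identities) -> float:
    """Average divergence over all KS and/or A domains of one clone."""
    return mean(domain_divergence(p) for p in domain_identities)


print(domain_divergence(60))                # 40.0, the caption's own example
print(clone_mean_divergence([60, 80, 70]))  # 30.0 for three made-up domains
```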
FIGURE 6
Taxonomic diversity of the Cullars metagenomic library and of the original soil sample used for library construction. Soil PCR: 16S rRNA gene PCR amplification of DNA isolated from the original Cullars soil sample used to construct the library; Library PCR: 16S rRNA gene PCR amplification of the DNA template from all metagenomic library clones pooled together; Library NGS: in silico mining for 16S rRNA genes from the metagenomic library NGS contigs. The phyla contained in the section “Other phyla” are: RCP2-54, SAR324, Spirochaetes, Sumerlaeota, WPS-2; all with relative frequencies below 0.05%.
