Nat Chem Biol. 2014 Nov;10(11):963-8.
doi: 10.1038/nchembio.1659. Epub 2014 Sep 28.

A roadmap for natural product discovery based on large-scale genomics and metabolomics

James R Doroghazi et al. Nat Chem Biol. 2014 Nov.

Abstract

Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics, we developed a method for the global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF network, comprising 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence or absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of new natural products using large data sets. Extrapolation from the 830-genome data set reveals that Actinobacteria encode hundreds of thousands of future drug leads, and the strong correlation between phylogeny and GCFs frames a roadmap to efficiently access them.

Figures

Figure 1. Similarity metrics for NPGC comparisons
Three similarity metrics were created for comparison of NPGCs. (a) The number of orthologous genes shared by the two clusters divided by the total number of genes in both clusters. Each gene is scored only once. (b) The total amount of each cluster involved in a PROmer alignment. (c) For the core biosynthetic domains or genes described in the Materials and Methods, corresponding domains/genes from GC_1 are found in GC_2 based on whether they are clustered together with the program uclust at clustering thresholds that increase in steps of 10%. Red indicates that A1-A4 from GC_1 are clustered together with an adenylation domain from GC_2 at a given clustering threshold. Gray indicates that there is no corresponding adenylation domain from GC_2 at a given clustering threshold. The third score used is the highest clustering threshold in which half of the domains/genes in GC_1 have a corresponding domain/gene in GC_2. The arrow indicates the maximum score for GC_1 and GC_2 of 70%, or 0.7, where half of the GC_1 A-domains are present in GC_2.
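Metrics (a) and (c) from this legend can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each gene has already been assigned an ortholog-group ID, that metric (a) counts each ortholog group once and divides by the distinct groups across both clusters (one possible reading of "each gene is scored only once"), and that the uclust results have been reduced beforehand to the highest clustering threshold, in percent, at which each GC_1 domain still co-clusters with a GC_2 domain. The function names and input shapes are hypothetical.

```python
def shared_ortholog_score(cluster1, cluster2):
    """Metric (a), under the assumptions above: shared ortholog groups
    divided by the distinct ortholog groups found in either cluster."""
    groups1, groups2 = set(cluster1), set(cluster2)
    union = groups1 | groups2
    if not union:
        return 0.0
    return len(groups1 & groups2) / len(union)


def domain_threshold_score(best_match_pct):
    """Metric (c): the highest clustering threshold (steps of 10%) at which
    at least half of the GC_1 domains have a corresponding GC_2 domain.

    best_match_pct maps each GC_1 domain name to the highest threshold, in
    percent, at which it co-clusters with a GC_2 domain (0 if it never does).
    """
    n = len(best_match_pct)
    if n == 0:
        return 0.0
    best = 0
    for threshold in range(10, 101, 10):
        matched = sum(1 for pct in best_match_pct.values() if pct >= threshold)
        if 2 * matched >= n:  # at least half of the GC_1 domains are matched
            best = threshold
    return best / 100


# Hypothetical inputs shaped like the legend's example with domains A1-A4.
print(shared_ortholog_score(["og1", "og2", "og3"], ["og2", "og3", "og4"]))
print(domain_threshold_score({"A1": 90, "A2": 70, "A3": 50, "A4": 0}))
```

With these made-up inputs, two of the four A-domains are still matched at the 70% threshold and only one at 80%, so the second call reproduces the 0.7 score that the arrow marks in the legend.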
Figure 2. Genomic NPGC content and extrapolation
(a) A phylogenetic tree for the sampled organisms is shown surrounded by natural product gene cluster content of each genome. Blue shading indicates genomes sequenced for this project. Concentric rings, from the inside out, show counts of NRPS, type I PKS, type II PKS, NISiderophore, lanthipeptide, and TOMM gene clusters. The names of the most abundant taxonomic families are shown in the outer ring. (b) Extrapolation of the number of GCFs encoded by Actinobacteria, with 95% confidence intervals indicated as the grey area inside of dashed lines. Extrapolation was performed out to 15,000 genomes. Filled circles indicate the current extent of our sampling.
Figure 3. GCF conservation over genetic distance
For every pair of genomes highlighted that span the Streptomycetales to the Pseudonocardiales in Fig. 2a, the proportion of GCFs shared between them is plotted against their genetic distance. NRPS conservation plotted against (a) ribosomal protein distance, (b) rpoB gene fragment. Conservation of type I PKS clusters (c) and NRPS-independent siderophores (d) plotted against ribosomal protein distance. The density of points across both axes is shown beside all plots.
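The pairwise comparison behind these plots can be sketched roughly as follows. The legend does not state how "proportion of GCFs shared" is normalized, so this sketch assumes shared GCFs divided by the GCFs present in either genome. The function name, the input dictionaries and the distance values are all hypothetical.

```python
from itertools import combinations


def gcf_sharing_vs_distance(gcfs_by_genome, distance):
    """Return one (genetic distance, proportion of GCFs shared) point per genome pair.

    gcfs_by_genome maps a genome name to the set of GCF IDs it encodes.
    distance maps frozenset({genome1, genome2}) to a precomputed genetic distance.
    Assumption: proportion shared = shared GCFs / GCFs found in either genome.
    """
    points = []
    for g1, g2 in combinations(sorted(gcfs_by_genome), 2):
        a, b = gcfs_by_genome[g1], gcfs_by_genome[g2]
        union = a | b
        shared = len(a & b) / len(union) if union else 0.0
        points.append((distance[frozenset((g1, g2))], shared))
    return points


# Made-up example: two closely related genomes and one distant genome.
gcfs = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}}
dist = {
    frozenset(("A", "B")): 0.05,
    frozenset(("A", "C")): 0.40,
    frozenset(("B", "C")): 0.42,
}
for d, shared in gcf_sharing_vs_distance(gcfs, dist):
    print(d, shared)
```

Each returned point corresponds to one dot in a panel; the distance could come from ribosomal proteins or an rpoB fragment, depending on the panel being reproduced.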
Figure 4. MS/GCF correlations
(a) The density distribution of the correlation scores for every GCF compound is shown for NRPS and type II PKS classes along with scores for selected known compound-gene cluster pairs. (b) Desertomycin and oasamycin compounds with the highest correlation scores (196) to a novel type I PKS gene cluster (PKS_I_18) are shown. Additional details are shown in Supplementary Fig. 6.

