Nat Commun. 2020 Nov 27;11(1):6058.
doi: 10.1038/s41467-020-19986-1.

Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences

Michael A Skinnider et al. Nat Commun. 2020.

Abstract

Novel antibiotics are urgently needed to address the looming global crisis of antibiotic resistance. Historically, the primary source of clinically used antibiotics has been microbial secondary metabolism. Microbial genome sequencing has revealed a plethora of uncharacterized natural antibiotics that remain to be discovered. However, the isolation of these molecules is hindered by the challenge of linking sequence information to the chemical structures of the encoded molecules. Here, we present PRISM 4, a comprehensive platform for prediction of the chemical structures of genomically encoded antibiotics, including all classes of bacterial antibiotics currently in clinical use. The accuracy of chemical structure prediction enables the development of machine-learning methods to predict the likely biological activity of encoded molecules. We apply PRISM 4 to chart secondary metabolite biosynthesis in a collection of over 10,000 bacterial genomes from both cultured isolates and metagenomic datasets, revealing thousands of encoded antibiotics. PRISM 4 is freely available as an interactive web application at http://prism.adapsyn.com.

Conflict of interest statement

N.A.M. is a founder of Adapsyn Bioscience. M.A.S. and C.W.J. are or were at one time consultants to Adapsyn Bioscience. M.G., N.J.M., A.M.K., R.J.M., H.L., A.P., N.S., D.P.W., and C.A.D. are or were at one time employed by Adapsyn Bioscience. The remaining authors declare no competing interests.

Figures

Fig. 1. A comprehensive platform for genome-guided prediction of secondary metabolite chemical structures.
a Schematic overview of PRISM 4. Microbial genome sequences are annotated using a library of 1,772 HMMs, and secondary metabolite BGCs are identified using a rule-based approach. Combinatorial, graph-based chemical structure prediction is effected using a library of 618 virtual tailoring reactions. b Total number of HMMs, virtual tailoring reactions, substrates, and sugars incorporated in PRISM 4. c Examples of predicted chemical structures generated by PRISM 4 for newly added families of secondary metabolites. Source data are provided as a Source Data file.
Fig. 2. PRISM 4 generates highly accurate chemical structure predictions.
a Number of BGCs within a manually curated gold standard set (n = 1,281; dotted line) identified by PRISM 4, antiSMASH 5, and NP.searcher. b Number of BGCs within the gold standard set with at least one structure predicted by each program. c Median Tanimoto coefficient between true and predicted structures for the subset of gold standard BGCs with at least one predicted structure generated by all four programs (n = 385). d Jensen–Shannon divergence between functional group content of true and predicted structures for each program. Error bars show standard deviation of bootstrap resampling. e Median and maximum Tanimoto coefficients between true and predicted structures generated by PRISM 4 for the gold standard set, by biosynthetic family, and compared to the median Tanimoto coefficient between predicted structures and non-matched BGCs (“random pairs”). Top, statistical significance of the comparison between median and random Tanimoto coefficients (***p < 0.001; **p < 0.01; *p < 0.05, two-sided t-test). Bottom, number of BGCs from each family in the gold standard set (n). Box plots show median (horizontal line), interquartile range (hinges), and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) throughout. Source data are provided as a Source Data file.
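
The two similarity measures named in this caption can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and functional group content as count vectors; the function names and the base-2 logarithm are choices made here for illustration, not details taken from the paper, and the fingerprint type used by the authors is not stated in this record.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def jensen_shannon_divergence(p, q) -> float:
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two count vectors."""
    if len(p) != len(q):
        raise ValueError("count vectors must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each count vector must have a positive total")
    # Normalize counts to probability distributions.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution M = (P + Q) / 2.
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0.
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Hypothetical fingerprints and functional group counts, for illustration only.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(jensen_shannon_divergence([3, 1, 0], [1, 1, 2]))
```

A Tanimoto coefficient of 1 means the two fingerprints share every on bit, and a divergence of 0 means the two functional group distributions are identical.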
Fig. 3. PRISM 4 reveals secondary metabolite biosynthesis in 3,759 complete bacterial genomes.
a, b Number of BGCs with at least one chemical structure predicted by PRISM 4, antiSMASH 5, or both methods in a collection of 3,759 dereplicated complete bacterial genomes, by biosynthetic family (a) and phylum of producing organisms (b), as classified in the Genome Taxonomy Database (GTDB). c–g Structural features of n = 4,220 pairs of predicted secondary metabolites from BGCs with products predicted by both PRISM 4 and antiSMASH 5. c Percent of predicted structures in Lipinski rule of five space. Error bars show the standard error of the sample proportion. d Molecular weight of predicted structures. e Bertz topological complexity index of predicted structures. f Internal diversity of predicted structures, as quantified by median Tanimoto coefficient to all other predicted structures in the set. g Similarity of predicted structures to known natural products, as quantified by the median Tanimoto coefficient to the set of known natural products in the Natural Products Atlas. Box plots show median (horizontal line), interquartile range (hinges), and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) throughout. Source data are provided as a Source Data file.
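
Panel c reports a proportion with its standard error, which can be sketched as follows. This is an illustrative example only: the `Descriptors` class and its field names are hypothetical, the thresholds are the standard Lipinski cutoffs, and the assumption that a structure must have zero violations to count as "in rule of five space" is made here because the record does not state the criterion the authors used.

```python
import math
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float      # daltons
    log_p: float           # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of the four standard Lipinski criteria."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def in_rule_of_five_space(d: Descriptors, max_violations: int = 0) -> bool:
    """Assumption: zero violations allowed by default; adjust max_violations as needed."""
    return lipinski_violations(d) <= max_violations


def proportion_with_se(flags):
    """Return (p, SE) where SE = sqrt(p * (1 - p) / n) for a list of booleans."""
    n = len(flags)
    if n == 0:
        raise ValueError("need at least one observation")
    p = sum(flags) / n
    return p, math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    structures = [
        Descriptors(350.4, 2.1, 2, 5),
        Descriptors(1200.9, 6.3, 12, 20),
        Descriptors(480.0, 4.9, 5, 10),
        Descriptors(760.2, 1.0, 8, 14),
    ]
    flags = [in_rule_of_five_space(s) for s in structures]
    p, se = proportion_with_se(flags)
    print(f"{100 * p:.1f}% +/- {100 * se:.1f}%")
```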
Fig. 4. Quantitative predicted structure-activity relationship (QPSAR) modeling reveals thousands of genomically encoded antibiotics.
a Receiver operating characteristic (ROC) curves for support vector machine (SVM) models trained on Pfam domains found within biosynthetic gene clusters or chemical fingerprints of PRISM predicted structures. b Distribution of BGCs predicted to produce secondary metabolites with antibacterial, antitumor, immunomodulatory, antifungal, antiviral, multiple, or no biological activities in a collection of 10,121 complete or metagenome-assembled prokaryotic genomes, by biosynthetic family (left) or producing organism phylum (right), as classified in the Genome Taxonomy Database (GTDB). c, d Visualization of predicted structure chemical space by uniform manifold approximation and projection (UMAP), colored by biological activity (c) or genome origin (d). e Enrichment or depletion of secondary metabolites by predicted biological activity in metagenome-assembled genomes (MAGs), relative to complete bacterial genomes. Source data are provided as a Source Data file.
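
For readers unfamiliar with the ROC curves in panel a, the sketch below shows how such a curve and its area can be computed from classifier scores. It does not reproduce the SVM models described here; the labels and scores are made up, and the function names are chosen for this example.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping the decision threshold from high to low.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier outputs where higher means "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


if __name__ == "__main__":
    # Hypothetical labels (1 = active) and model scores, for illustration only.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    fpr, tpr = roc_curve(labels, scores)
    print(auc(fpr, tpr))
```

An area of 0.5 corresponds to a classifier that ranks examples no better than chance, and 1.0 to one that ranks every positive above every negative.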
