Nucleic Acids Res. 2015 Nov 16;43(20):9645-62.

doi: 10.1093/nar/gkv1012. Epub 2015 Oct 5.

Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

Michael A Skinnider¹, Chris A Dejong¹, Philip N Rees¹, Chad W Johnston¹, Haoxin Li¹, Andrew L H Webster¹, Morgan A Wyatt¹, Nathan A Magarvey²

Affiliations

¹ Departments of Biochemistry and Biomedical Sciences and Chemistry and Chemical Biology, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, L8S 4K1, Canada.
² Departments of Biochemistry and Biomedical Sciences and Chemistry and Chemical Biology, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, L8S 4K1, Canada. magarv@mcmaster.ca

PMID: 26442528
PMCID: PMC4787774
DOI: 10.1093/nar/gkv1012

Michael A Skinnider et al. Nucleic Acids Res. 2015.

Abstract

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/.

Figures

**Figure 1.**
PRISM workflow for genomic prediction of secondary metabolomes. Open reading frames are analyzed with a library of hundreds of hidden Markov models and curated BLAST databases, and identified biosynthetic domains are grouped into clusters. *Trans*-acting adenylation and acyltransferase domains are accounted for as a list of biosynthetically plausible open reading frame permutations is generated. For each ordered list of open reading frames, a set of combinatorial plans is generated based on potential deoxysugar combinations, tailoring reactions and macrocyclization patterns. The resulting combinatorial scaffold library is cheminformatically dereplicated with reference to a database of 49 860 known natural products while the cluster is dereplicated using a multilocus sequence typing-style algorithm with reference to a database of 587 known biosynthetic gene clusters. Unknown natural products are identified by the process of elimination.

**Figure 2.**
Identification of biosynthetic gene clusters in PRISM. Biosynthetic domains are grouped into clusters using a greedy algorithm. Putative thiotemplated clusters are discarded if they do not contain at least one substrate-activating (A, AL, AT) domain and one bond-forming (C, KS) domain. Putative type II polyketide clusters are discarded if they do not contain a ketosynthase α (KS_α) and a chain length factor (CLF).
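
The grouping and filtering rules in this caption can be sketched as follows. This is a minimal illustration and not PRISM's implementation: the `Domain` record, the 10 kb `max_gap` window and all function names are assumptions made for the example; only the two discard rules are taken from the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str   # e.g. "A", "AL", "AT", "C", "KS", "KS_alpha", "CLF"
    start: int  # hypothetical genomic coordinate (bp)
    end: int


SUBSTRATE_ACTIVATING = {"A", "AL", "AT"}
BOND_FORMING = {"C", "KS"}


def greedy_clusters(domains, max_gap=10_000):
    """Greedily extend the current cluster while the next domain starts
    within max_gap of the cluster's end (max_gap is an assumed value)."""
    clusters, cluster_end = [], None
    for d in sorted(domains, key=lambda d: d.start):
        if cluster_end is not None and d.start - cluster_end <= max_gap:
            clusters[-1].append(d)
            cluster_end = max(cluster_end, d.end)
        else:
            clusters.append([d])
            cluster_end = d.end
    return clusters


def is_thiotemplated(cluster):
    """At least one substrate-activating and one bond-forming domain."""
    names = {d.name for d in cluster}
    return bool(names & SUBSTRATE_ACTIVATING) and bool(names & BOND_FORMING)


def is_type_ii_pks(cluster):
    """Both a ketosynthase alpha and a chain length factor are present."""
    return {"KS_alpha", "CLF"} <= {d.name for d in cluster}


def keep(cluster):
    return is_thiotemplated(cluster) or is_type_ii_pks(cluster)


domains = [
    Domain("A", 0, 1_500), Domain("C", 2_000, 3_300),
    Domain("KS", 50_000, 51_200),  # isolated KS: no substrate-activating domain
    Domain("KS_alpha", 100_000, 101_200), Domain("CLF", 101_300, 102_500),
]
clusters = greedy_clusters(domains)
kept = [c for c in clusters if keep(c)]
```

In this toy input the isolated KS domain forms its own putative cluster and is discarded, while the A/C pair and the KS_α/CLF pair are retained.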

**Figure 3.**
Scaffold open reading frame permutation in PRISM structure prediction. When scaffold open reading frames are in the same frame but biosynthesis in this frame is implausible, or scaffold open reading frames are not in the same frame, all biosynthetically plausible permutations of scaffold open reading frames are generated.
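
One way to picture this permutation step is to enumerate every ordering of the scaffold open reading frames and keep only those that pass a plausibility check. The sketch below is illustrative only: the caption does not state PRISM's plausibility criteria, so the rule used here (start with an ORF carrying a starter module, end with one carrying a thioesterase) and all names are assumptions.

```python
from itertools import permutations


def plausible_orderings(orfs, is_plausible):
    """Return every ordering of scaffold ORFs accepted by is_plausible."""
    return [list(p) for p in permutations(orfs) if is_plausible(p)]


def starts_with_starter_ends_with_te(order):
    # Hypothetical rule, not taken from the source: the first ORF carries
    # a starter module and the last ORF carries a thioesterase (TE).
    return "starter" in order[0]["features"] and "TE" in order[-1]["features"]


orfs = [
    {"name": "orfA", "features": {"starter", "C", "A", "T"}},
    {"name": "orfB", "features": {"C", "A", "T"}},
    {"name": "orfC", "features": {"C", "A", "T"}},
    {"name": "orfD", "features": {"C", "A", "T", "TE"}},
]
orderings = plausible_orderings(orfs, starts_with_starter_ends_with_te)
ordering_names = [[orf["name"] for orf in order] for order in orderings]
```

Of the 24 possible orderings of four ORFs, only the two that begin with `orfA` and end with `orfD` survive under this assumed rule.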

**Figure 4.**
A recursive algorithm enables the identification of *trans*-acyltransferase modules split across two open reading frames in PRISM. Canonical *cis*-acyltransferase type I polyketide modules contain an integrated acyltransferase domain in each module (left). In a hypothetical *trans*-acyltransferase polyketide cluster (right), modules lack integrated acyltransferase domains and may be split across open reading frames. A similar insertion process occurs when *trans*-acting adenylation domains are identified.
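
The caption does not spell out the recursion, so the following is only a rough sketch of the idea under stated assumptions: a module is taken to end at a carrier (T) domain, a module left open at the end of one open reading frame is carried into the next, and a *trans*-acting acyltransferase is then inserted after the KS of any module lacking an integrated AT. The domain labels and function names are hypothetical.

```python
def split_modules(orfs, current=None):
    """Recursively walk ORFs in order, closing a module at each T domain.

    A module still open at the end of one ORF is passed to the recursive
    call, so a single module may span two ORFs.
    """
    if not orfs:
        return [current] if current else []
    modules = []
    current = list(current or [])
    for domain in orfs[0]:
        current.append(domain)
        if domain == "T":
            modules.append(current)
            current = []
    return modules + split_modules(orfs[1:], current)


def insert_trans_at(modules):
    """Insert a trans-acting AT after the KS of modules lacking an AT."""
    result = []
    for module in modules:
        if "KS" in module and "AT" not in module:
            i = module.index("KS") + 1
            module = module[:i] + ["AT(trans)"] + module[i:]
        result.append(module)
    return result


# Second module is split: its KS sits on the first ORF, DH and T on the second.
orfs = [["KS", "KR", "T", "KS"], ["DH", "T"]]
modules = split_modules(orfs)
completed = insert_trans_at(modules)
```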

**Figure 5.**
Combinatorial scaffold library generation for a hypothetical biosynthetic gene cluster in PRISM. Biosynthetic modules are identified and the linear scaffold is constructed based on the predicted substrate of the adenylation, acyl-adenylating or acyltransferase domain within each module. Potential deoxysugar combinations and sites of attachment, tailoring reaction substrates, and macrocyclization patterns are identified and combinatorialized. In this hypothetical cluster, two deoxysugars are predicted as potential substrates of the lone glycosyltransferase. The deoxysugar in each combinatorial plan can be added at any of three potential glycosylation sites. Three potential macrolactone cyclizations are identified in addition to the linear carboxylic acid, and a single chlorination site is predicted, producing a total of 24 combinatorial plans. Execution of each combinatorial plan produces a single predicted structure. Combinatorial plans which fail to execute (e.g. when macrocyclization and glycosylation take place at the same free hydroxyl) are discarded.
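
The count of 24 in this caption is the Cartesian product of the independent choices: 2 deoxysugars × 3 glycosylation sites × 4 cyclization states (3 macrolactones plus the linear acid) × 1 chlorination site. The sketch below reproduces that arithmetic. The site labels are placeholders, and the assumption that each macrolactone uses one of the three glycosylation hydroxyls is made purely to illustrate how conflicting plans could be discarded; the caption does not say which hydroxyls overlap.

```python
from itertools import product

sugars = ["sugar_1", "sugar_2"]                  # two candidate deoxysugars
glycosylation_sites = ["OH_a", "OH_b", "OH_c"]   # three candidate hydroxyls
cyclizations = ["linear_acid", "lactone_OH_a", "lactone_OH_b", "lactone_OH_c"]
chlorination_sites = ["C_x"]                     # single predicted site

# 2 * 3 * 4 * 1 = 24 combinatorial plans
plans = list(product(sugars, glycosylation_sites, cyclizations, chlorination_sites))


def executes(plan):
    """A plan fails when glycosylation and macrocyclization need the same
    free hydroxyl (assumed overlap, for illustration only)."""
    _, glyco_site, cyclization, _ = plan
    return cyclization != f"lactone_{glyco_site}"


valid_plans = [plan for plan in plans if executes(plan)]
```

Under the assumed overlap, each sugar and glycosylation site pairing loses exactly one cyclization option, so `valid_plans` is shorter than `plans`; with a different overlap the number of discarded plans would differ.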

**Figure 6.**
Genetic and chemical dereplication distinguishes known and unknown natural products in PRISM. Biosynthetic domains are placed in an arbitrary but consistent order and concatenated into a single artificial open reading frame, which is compared to artificial open reading frames generated for a database of 587 known clusters. Chemical graphs of predicted structures are decomposed into a one-dimensional bit set with the ECFP6 and FCFP6 algorithms and compared to a database of 49 680 known natural products.
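
Both halves of this dereplication step reduce to simple operations once the inputs exist. The sketch below assumes fingerprints are already available as sets of on-bit indices (in practice a cheminformatics toolkit would produce them; ECFP6 corresponds to a circular fingerprint of diameter 6) and uses alphabetical order of domain names as the "arbitrary but consistent" order. The toy fingerprints, sequences and function names are illustrative assumptions, not PRISM data or code.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query, database):
    """Return (name, score) for the known compound most similar to query."""
    return max(
        ((name, tanimoto(query, fp)) for name, fp in database.items()),
        key=lambda item: item[1],
    )


def artificial_orf(domain_sequences):
    """Concatenate domain sequences in a fixed order (alphabetical by domain
    name here, as one possible 'arbitrary but consistent' choice)."""
    return "".join(seq for _, seq in sorted(domain_sequences.items()))


known = {
    "compound_A": {1, 4, 9, 16, 25},
    "compound_B": {2, 4, 8, 16, 32},
}
predicted = {1, 4, 9, 16, 36}
name, score = best_match(predicted, known)

concatenated = artificial_orf({"KS": "MSKV", "A": "MTLE", "C": "MPQR"})
```

A predicted structure whose best score is close to 1 is likely a known compound, while a low best score suggests a structure not yet represented in the reference database.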

**Figure 7.**
Comparison of PRISM, NP.searcher and antiSMASH 3.0 structure predictions across seven biosynthetic classes of natural products, as quantified by the Tanimoto coefficient. Sample sizes are as follows: beta-lactams, n = 3; cyclic/branched peptides, n = 25; glycopeptides, n = 12; lipopeptides, n = 31; macrolides, n = 28; *trans*-acyltransferase polyketides, n = 22; type II polyketides, n = 46. Biosynthetic gene clusters, in FASTA format, and all predicted structures, in SMILES format, are available at http://magarveylab.ca/Skinnider_etal/accuracy/. *P < 0.05, **P < 0.01, ***P < 0.001 (two-tailed Student's t-test).

**Figure 8.**
Comparison of PRISM, NP.searcher and antiSMASH 3.0 structure predictions across eight microbial natural product producer phylotypes, as quantified by the Tanimoto coefficient. Sample sizes are as follows: cyanobacteria, n = 18; firmicutes, n = 25; myxobacteria, n = 19; pseudomonads, n = 17; other Gram-negative bacteria, including *Xenorhabdus*, *Burkholderia*, *Vibrio* and *Serratia*, n = 16; streptomycetes, n = 89; other actinomycetes, n = 17. Biosynthetic gene clusters, in FASTA format, and all predicted structures, in SMILES format, are available at http://magarveylab.ca/Skinnider_etal/phylo/. *P < 0.05, **P < 0.01, ***P < 0.001, n.s. not significant (two-tailed Student's t-test).

Grants and funding

Canadian Institutes of Health Research/Canada
