Nucleic Acids Res. 2015 Nov 16;43(20):9645-62.
doi: 10.1093/nar/gkv1012. Epub 2015 Oct 5.

Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

Michael A Skinnider et al. Nucleic Acids Res. 2015.

Abstract

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/.

Figures

Figure 1.
PRISM workflow for genomic prediction of secondary metabolomes. Open reading frames are analyzed with a library of hundreds of hidden Markov models and curated BLAST databases, and identified biosynthetic domains are grouped into clusters. Trans-acting adenylation and acyltransferase domains are accounted for as a list of biosynthetically plausible open reading frame permutations is generated. For each ordered list of open reading frames, a set of combinatorial plans is generated based on potential deoxysugar combinations, tailoring reactions and macrocyclization patterns. The resulting combinatorial scaffold library is cheminformatically dereplicated with reference to a database of 49 860 known natural products while the cluster is dereplicated using a multilocus sequence typing-style algorithm with reference to a database of 587 known biosynthetic gene clusters. Unknown natural products are identified by the process of elimination.
Figure 2.
Identification of biosynthetic gene clusters in PRISM. Biosynthetic domains are grouped into clusters using a greedy algorithm. Putative thiotemplated clusters are discarded if they do not contain at least one substrate-activating (A, AL, AT) domain and one bond-forming (C, KS) domain. Putative type II polyketide clusters are discarded if they do not contain a ketosynthase α (KSα) and a chain length factor (CLF).
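The grouping and filtering step described in this caption can be illustrated with a minimal sketch. The caption names a greedy algorithm and the two discard rules but not the grouping criterion, so the distance threshold `max_gap`, the `Domain` record, and the domain labels `"KSalpha"` and `"CLF"` are assumptions made for illustration, not PRISM's actual data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    name: str   # domain label, e.g. "A", "AL", "AT", "C", "KS", "KSalpha", "CLF"
    start: int  # hypothetical genomic coordinate of the domain

ACTIVATING = {"A", "AL", "AT"}   # substrate-activating domains named in the caption
BOND_FORMING = {"C", "KS"}       # bond-forming domains named in the caption

def greedy_clusters(domains, max_gap):
    """Walk domains in genomic order; open a new cluster whenever the gap
    to the previous domain exceeds max_gap (an assumed criterion)."""
    clusters = []
    for d in sorted(domains, key=lambda d: d.start):
        if clusters and d.start - clusters[-1][-1].start <= max_gap:
            clusters[-1].append(d)
        else:
            clusters.append([d])
    return clusters

def is_thiotemplated(cluster):
    """At least one substrate-activating and one bond-forming domain."""
    names = {d.name for d in cluster}
    return bool(names & ACTIVATING) and bool(names & BOND_FORMING)

def is_type_ii(cluster):
    """Both a ketosynthase alpha and a chain length factor are present."""
    return {"KSalpha", "CLF"} <= {d.name for d in cluster}

domains = [
    Domain("A", 100), Domain("C", 1500),            # plausible thiotemplated pair
    Domain("KS", 50000),                            # isolated KS, no activating domain
    Domain("KSalpha", 90000), Domain("CLF", 91000), # plausible type II pair
]
clusters = greedy_clusters(domains, max_gap=10000)
kept = [c for c in clusters if is_thiotemplated(c) or is_type_ii(c)]
```

With these toy coordinates the isolated KS domain forms its own cluster and is discarded, leaving two clusters.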
Figure 3.
Scaffold open reading frame permutation in PRISM structure prediction. When scaffold open reading frames are in the same frame but biosynthesis in this frame is implausible, or scaffold open reading frames are not in the same frame, all biosynthetically plausible permutations of scaffold open reading frames are generated.
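As a rough illustration of the permutation step, the sketch below enumerates orderings of scaffold open reading frames and keeps those that pass a plausibility check. The caption does not state what makes an ordering plausible, so the rule used here (the first open reading frame carries a starter module and the last carries a thioesterase) and the dictionary keys `has_starter` and `has_te` are purely hypothetical stand-ins.

```python
from itertools import permutations

def plausible_orders(orfs, is_plausible):
    """Return every ordering of the scaffold ORFs accepted by is_plausible."""
    return [list(p) for p in permutations(orfs) if is_plausible(p)]

def starter_first_te_last(order):
    """Assumed rule: biosynthesis begins at a starter module and ends at a
    thioesterase. This is an illustration, not PRISM's documented logic."""
    return order[0]["has_starter"] and order[-1]["has_te"]

orfs = [
    {"name": "orfA", "has_starter": True,  "has_te": False},
    {"name": "orfB", "has_starter": False, "has_te": False},
    {"name": "orfC", "has_starter": False, "has_te": False},
    {"name": "orfD", "has_starter": False, "has_te": True},
]
orders = plausible_orders(orfs, starter_first_te_last)
order_names = [[o["name"] for o in order] for order in orders]
```

Of the 24 possible orderings of four open reading frames, only the two that start with `orfA` and end with `orfD` survive under this assumed rule.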
Figure 4.
A recursive algorithm enables the identification of trans-acyltransferase modules split across two open reading frames in PRISM. Canonical cis-acyltransferase type I polyketide modules contain an integrated acyltransferase domain for each module (left). In a hypothetical trans-acyltransferase polyketide cluster (right), modules lack integrated acyltransferase domains and may be split across open reading frames. A similar insertion process occurs when trans-acting adenylation domains are identified.
Figure 5.
Combinatorial scaffold library generation for a hypothetical biosynthetic gene cluster in PRISM. Biosynthetic modules are identified and the linear scaffold is constructed based on the predicted substrate of the adenylation, acyl-adenylating or acyltransferase domain within each module. Potential deoxysugar combinations and sites of attachment, tailoring reaction substrates, and macrocyclization patterns are identified and combinatorialized. In this hypothetical cluster, two deoxysugars are predicted as potential substrates of the lone glycosyltransferase. The deoxysugar in each combinatorial plan can be added at any of three potential glycosylation sites. Three potential macrolactone cyclizations are identified in addition to the linear carboxylic acid, and a single chlorination site is predicted, producing a total of 24 combinatorial plans. Execution of each combinatorial plan produces a single predicted structure. Combinatorial plans which fail to execute (e.g. when macrocyclization and glycosylation take place at the same free hydroxyl) are discarded.
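The count of 24 plans in this hypothetical cluster follows from a Cartesian product: 2 deoxysugars × 3 glycosylation sites × 4 cyclization outcomes (three macrolactones plus the linear acid) × 1 chlorination site. The sketch below reproduces that arithmetic. The site names, and the choice of which hydroxyl is shared between glycosylation and macrocyclization, are assumptions made only to show how failing plans would be discarded.

```python
from itertools import product

sugars = ["sugar_1", "sugar_2"]                 # two candidate deoxysugars
glycosylation_sites = ["OH_a", "OH_b", "OH_c"]  # three candidate attachment sites
# None stands for the linear carboxylic acid; the others name the hydroxyl
# consumed by each macrolactone (hypothetical labels).
cyclization_sites = [None, "OH_a", "OH_d", "OH_e"]
chlorination_sites = ["C_x"]                    # single predicted chlorination site

plans = list(product(sugars, glycosylation_sites,
                     cyclization_sites, chlorination_sites))

def executes(plan):
    """A plan fails when glycosylation and macrocyclization would both
    consume the same free hydroxyl."""
    _sugar, glyco_site, cyc_site, _cl_site = plan
    return glyco_site != cyc_site

executable = [p for p in plans if executes(p)]
```

Under the assumption that exactly one hydroxyl (`OH_a`) is both a glycosylation site and a lactonization site, 2 of the 24 plans fail and 22 predicted structures remain.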
Figure 6.
Genetic and chemical dereplication distinguishes known and unknown natural products in PRISM. Biosynthetic domains are placed in an arbitrary but consistent order and concatenated into a single artificial open reading frame, which is compared to artificial open reading frames generated for a database of 587 known clusters. Chemical graphs of predicted structures are decomposed into a one-dimensional bit set with the ECFP6 and FCFP6 algorithms and compared to a database of 49 680 known natural products.
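The chemical half of this comparison can be sketched with plain sets of on-bit indices. The example assumes fingerprints have already been computed elsewhere and scores them with the Tanimoto coefficient, the similarity measure named in the accuracy comparisons on this page; the caption does not say which score PRISM uses for dereplication itself. The database entries and bit indices are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

def best_match(query, database):
    """Return (name, score) of the most similar known compound."""
    return max(((name, tanimoto(query, bits)) for name, bits in database.items()),
               key=lambda pair: pair[1])

# Toy fingerprints: each set holds the indices of the bits that are on.
known = {
    "known_A": {1, 5, 9, 12},
    "known_B": {1, 5, 20},
}
predicted = {1, 5, 9, 12}

name, score = best_match(predicted, known)
score_b = tanimoto(predicted, known["known_B"])
```

A predicted structure whose best score is close to 1 would be treated as a likely rediscovery; the cutoff for that decision is not given in the caption and is left out here.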
Figure 7.
Comparison of PRISM, NP.searcher and antiSMASH 3.0 structure predictions across seven biosynthetic classes of natural products, as quantified by the Tanimoto coefficient. Sample sizes are as follows: beta-lactams, n = 3; cyclic/branched peptides, n = 25; glycopeptides, n = 12; lipopeptides, n = 31; macrolides, n = 28; trans-acyltransferase polyketides, n = 22; type II polyketides, n = 46. Biosynthetic gene clusters, in FASTA format, and all predicted structures, in SMILES format, are available at http://magarveylab.ca/Skinnider_etal/accuracy/. *P < 0.05, **P < 0.01, ***P < 0.001 (two-tailed Student's t-test).
Figure 8.
Comparison of PRISM, NP.searcher and antiSMASH 3.0 structure predictions across eight microbial natural product producer phylotypes, as quantified by the Tanimoto coefficient. Sample sizes are as follows: cyanobacteria, n = 18; firmicutes, n = 25; myxobacteria, n = 19; pseudomonads, n = 17; other Gram-negative bacteria, including Xenorhabdus, Burkholderia, Vibrio and Serratia, n = 16; streptomycetes, n = 89; other actinomycetes, n = 17. Biosynthetic gene clusters, in FASTA format, and all predicted structures, in SMILES format, are available at http://magarveylab.ca/Skinnider_etal/phylo/. *P < 0.05, **P < 0.01, ***P < 0.001, n.s. not significant (two-tailed Student's t-test).

