Nat Chem Biol. 2015 Sep;11(9):639-48.

doi: 10.1038/nchembio.1884.

Computational approaches to natural product discovery

Marnix H Medema¹, Michael A Fischbach²

Affiliations

¹ Bioinformatics Group, Wageningen University, Wageningen, the Netherlands.
² [1] Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, USA. [2] California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, California, USA.

PMID: 26284671
PMCID: PMC5024737
DOI: 10.1038/nchembio.1884

Marnix H Medema et al. Nat Chem Biol. 2015 Sep.

Abstract

Starting with the earliest Streptomyces genome sequences, the promise of natural product genome mining has been captivating: genomics and bioinformatics would transform compound discovery from an ad hoc pursuit to a high-throughput endeavor. Until recently, however, genome mining has advanced natural product discovery only modestly. Here, we argue that the development of algorithms to mine the continuously increasing amounts of (meta)genomic data will enable the promise of genome mining to be realized. We review computational strategies that have been developed to identify biosynthetic gene clusters in genome sequences and predict the chemical structures of their products. We then discuss networking strategies that can systematize large volumes of genetic and chemical data and connect genomic information to metabolomic and phenotypic data. Finally, we provide a vision of what natural product discovery might look like in the future, specifically considering longstanding questions in microbial ecology regarding the roles of metabolites in interspecies interactions.

Conflict of interest statement

M.A.F. is on the scientific advisory boards of NGM Biopharmaceuticals and Warp Drive Bio.

Figures

**Figure 1. The role of computation in natural product discovery**
As shown in this overview schematic, which serves as an outline for the review, computational algorithms have been developed that enable or accelerate every key step in the natural product discovery pipeline: identifying BGCs from raw genomic and metagenomic sequence data, grouping BGCs into families, predicting the structure of a BGC’s small molecule product, and connecting gene cluster and molecular families using networking approaches.

**Figure 2. Strategies for identifying BGCs**
Several strategies have been designed for the genomic identification of BGCs. (a) The main high-confidence/low-novelty strategy is based on signature mining, using profile HMMs or BLAST searches to identify (combinations of) genes or protein domains that are specific for certain types of BGCs. (b) Recently, three high-novelty/low-confidence approaches have emerged that are focused on the identification of new BGC types: 1) pattern-based mining, based on the identification of genomic regions with protein domain frequencies that are generally indicative of involvement in specialized metabolism; 2) phylogenetic mining, based on the identification of functionally diverged paralogues of primary metabolic enzymes that have acquired functions in specialized metabolism during evolution; and 3) comparative genomic mining, which uses the identification of (horizontally or intra-chromosomally) transferred conserved syntenic blocks of enzyme-coding genes that belong to the accessory (pan) genome of a species to identify ‘mobile metabolic elements’ that are indicative of a role in specialized metabolism. Bullet points preceded by + and − at the bottom of the figure indicate advantages and disadvantages of a method, respectively. Tool(s) whose workflow corresponds to a column in the flowchart are listed at the bottom of each column.
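The signature-mining strategy in panel (a) can be illustrated with a minimal sketch. It assumes that domain annotations for each gene have already been produced by an upstream profile HMM or BLAST search, and it only applies co-occurrence rules to them. The rule set, the domain labels, the `window` parameter, and the function name `find_signature_regions` are hypothetical choices for illustration; they are not the rules of any tool named in the figure.

```python
# Hypothetical rules: BGC class -> domains that must co-occur in nearby genes.
# These are illustrative only, not the rule set of any published tool.
RULES = {
    "NRPS-like": {"Condensation", "AMP-binding"},
    "PKS-like": {"KS", "AT"},
}


def find_signature_regions(genes, rules=RULES, window=3):
    """Report gene ranges whose combined domains satisfy a rule.

    genes: ordered list of (gene_id, set_of_domain_names) along one contig.
    window: number of consecutive genes inspected together (an assumption).
    Returns a list of (bgc_class, first_gene_id, last_gene_id).
    """
    regions = []
    for bgc_class, required in rules.items():
        current = None  # [first_index, last_index] of the region being grown
        for start in range(len(genes)):
            chunk = genes[start:start + window]
            domains = set().union(*(doms for _, doms in chunk))
            if not required <= domains:
                continue
            end = start + len(chunk) - 1
            if current and start <= current[1] + 1:
                # Overlapping or adjacent window: extend the open region.
                current[1] = max(current[1], end)
            else:
                if current:
                    regions.append(
                        (bgc_class, genes[current[0]][0], genes[current[1]][0])
                    )
                current = [start, end]
        if current:
            regions.append(
                (bgc_class, genes[current[0]][0], genes[current[1]][0])
            )
    return regions


example_genes = [
    ("g1", {"ABC_tran"}),
    ("g2", {"Condensation"}),
    ("g3", {"AMP-binding", "PP-binding"}),
    ("g4", set()),
    ("g5", {"KS"}),
    ("g6", set()),
]
print(find_signature_regions(example_genes, window=2))
```

In this toy input only the NRPS-like rule is satisfied, because the lone `KS` domain has no `AT` partner nearby. This mirrors the high-confidence/low-novelty trade-off described in the caption: a rule can only find BGC types whose signatures are already known.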

**Figure 3. Big data challenges for biosynthesis**
(a) In network-based algorithms that enable small molecule structure elucidation, networks are constructed in which each node is a mass ion, and edges are drawn between mass ions that are related by a mass difference that indicates a common chemical transformation. Sub-networks represent a molecular species of interest. (b) In an alternative approach, two distinct networks – one in which nodes are molecules, and the other in which nodes are BGCs – can be co-analyzed to connect BGCs to the small molecules whose biosynthesis they encode, and vice versa.
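The network construction in panel (a) can be sketched as follows. This is a simplified illustration, not the implementation of any specific tool: it compares only precursor masses, the table of mass shifts is a small illustrative selection of approximate monoisotopic values, and the tolerance `tol` and the function names are assumptions made for the example.

```python
from itertools import combinations

# Approximate monoisotopic mass shifts (Da) for a few common transformations.
# An illustrative selection; a real analysis would use a much longer list.
KNOWN_SHIFTS = {
    "methylation (CH2)": 14.01565,
    "oxidation (O)": 15.99491,
    "hydration (H2O)": 18.01056,
}


def build_mass_network(masses, shifts=KNOWN_SHIFTS, tol=0.005):
    """Return edges (i, j, label) between ions whose mass difference
    matches a known transformation within +/- tol Da."""
    edges = []
    for (i, m1), (j, m2) in combinations(enumerate(masses), 2):
        delta = abs(m1 - m2)
        for label, shift in shifts.items():
            if abs(delta - shift) <= tol:
                edges.append((i, j, label))
    return edges


def sub_networks(n_nodes, edges):
    """Group node indices into connected components using union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        parent[find(i)] = find(j)

    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


masses = [300.0, 314.01565, 330.01056, 500.0]
edges = build_mass_network(masses)
print(edges)
print(sub_networks(len(masses), edges))
```

Each connected component plays the role of a sub-network in the caption: a set of ions linked by plausible chemical transformations, here the first three masses, with the fourth left as a singleton.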

**Figure for Box**
(a) Three algorithms have been developed recently to group biosynthetic gene clusters into families; see Box 1 for more details. (b) Chemical structures of 3-amino-5-hydroxybenzoic acid (AHBA) and rifamycin.
