Nat Chem Biol. 2015 Sep;11(9):639-48.
doi: 10.1038/nchembio.1884.

Computational approaches to natural product discovery

Marnix H Medema et al. Nat Chem Biol. 2015 Sep.

Abstract

Starting with the earliest Streptomyces genome sequences, the promise of natural product genome mining has been captivating: genomics and bioinformatics would transform compound discovery from an ad hoc pursuit to a high-throughput endeavor. Until recently, however, genome mining has advanced natural product discovery only modestly. Here, we argue that the development of algorithms to mine the continuously increasing amounts of (meta)genomic data will enable the promise of genome mining to be realized. We review computational strategies that have been developed to identify biosynthetic gene clusters in genome sequences and predict the chemical structures of their products. We then discuss networking strategies that can systematize large volumes of genetic and chemical data and connect genomic information to metabolomic and phenotypic data. Finally, we provide a vision of what natural product discovery might look like in the future, specifically considering longstanding questions in microbial ecology regarding the roles of metabolites in interspecies interactions.

Conflict of interest statement

M.A.F. is on the scientific advisory boards of NGM Biopharmaceuticals and Warp Drive Bio.

Figures

Figure 1. The role of computation in natural product discovery
As shown in this overview schematic, which serves as an outline for the review, computational algorithms have been developed that enable or accelerate every key step in the natural product discovery pipeline: identifying BGCs from raw genomic and metagenomic sequence data, grouping BGCs into families, predicting the structure of a BGC’s small molecule product, and connecting gene cluster and molecular families using networking approaches.
Figure 2. Strategies for identifying BGCs
Several strategies have been designed for the genomic identification of BGCs. (a) The main high-confidence/low-novelty strategy is based on signature mining, using profile HMMs or BLAST searches to identify (combinations of) genes or protein domains that are specific for certain types of BGCs. (b) Recently, three high-novelty/low-confidence approaches have emerged that are focused on the identification of new BGC types: 1) pattern-based mining, based on the identification of genomic regions with protein domain frequencies that are generally indicative of involvement in specialized metabolism; 2) phylogenetic mining, based on the identification of functionally diverged paralogues of primary metabolic enzymes that have acquired functions in specialized metabolism during evolution; and 3) comparative genomic mining, which uses the identification of (horizontally or intra-chromosomally) transferred conserved syntenic blocks of enzyme-coding genes that belong to the accessory (pan) genome of a species to identify ‘mobile metabolic elements’ that are indicative of a role in specialized metabolism. Bullet points preceded by + and − at the bottom of the figure indicate advantages and disadvantages of a method, respectively. Tool(s) whose workflow corresponds to a column in the flowchart are listed at the bottom of each column.
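The signature-mining strategy in panel (a) can be illustrated with a minimal sketch. It assumes that protein domains have already been assigned to each gene (for example by a profile HMM search, which is not performed here) and then scans a sliding window of neighbouring genes for domain combinations that mark a BGC type. The rule table, the short domain labels (KS, AT, C, A), the window size, and the function name are illustrative assumptions, not the rules used by any published tool.

```python
# Hypothetical signature rules: BGC class -> domains that must co-occur
# within one window of neighbouring genes. Labels are placeholders:
# KS = ketosynthase, AT = acyltransferase (polyketide synthases);
# C = condensation, A = adenylation (nonribosomal peptide synthetases).
RULES = {
    "polyketide": {"KS", "AT"},
    "nonribosomal peptide": {"C", "A"},
}


def find_signature_clusters(genes, window=3):
    """Return (bgc_class, gene_ids) for each window matching a rule.

    genes: list of (gene_id, set_of_domain_labels) in chromosomal order.
    window: number of adjacent genes inspected together (assumed value).
    """
    hits = []
    last_start = max(1, len(genes) - window + 1)
    for start in range(last_start):
        block = genes[start:start + window]
        # Pool every domain annotated on the genes in this window.
        domains = set().union(*(doms for _, doms in block))
        for bgc_class, required in RULES.items():
            if required <= domains:  # all signature domains are present
                hits.append((bgc_class, [gene_id for gene_id, _ in block]))
    return hits


example_genes = [
    ("g1", {"kinase"}),
    ("g2", {"KS"}),
    ("g3", {"AT"}),
    ("g4", set()),
    ("g5", {"C", "A"}),
]
hits = find_signature_clusters(example_genes, window=2)
```

Because the rules only fire on known domain combinations, a sketch like this shares the trade-off named in the caption: confident calls for known BGC types, and no ability to detect new ones.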
Figure 3. Big data challenges for biosynthesis
(a) In network-based algorithms that enable small molecule structure elucidation, networks are constructed in which each node is a mass ion, and edges are drawn between mass ions that are related by a mass difference that indicates a common chemical transformation. Sub-networks represent a molecular species of interest. (b) In an alternative approach, two distinct networks – one in which nodes are molecules, and the other in which nodes are BGCs – can be co-analyzed to connect BGCs to small molecules they encode and vice versa.
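The network construction in panel (a) can be sketched as follows, under stated assumptions: each node is a mass value, and an edge is drawn when the difference between two masses matches a known transformation within a tolerance. The three transformations listed, the tolerance, and the function names are illustrative choices; the mass shifts themselves are the standard monoisotopic masses of CH2, O, and H2O.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) for a few common transformations.
# The choice of transformations is an illustrative assumption.
TRANSFORMATIONS = {
    "methylation": 14.01565,   # +CH2
    "oxidation": 15.99491,     # +O
    "hydration": 18.01056,     # +H2O
}


def build_mass_network(masses, tolerance=0.005):
    """Return edges (lighter, heavier, transformation) between mass ions."""
    edges = []
    for low, high in combinations(sorted(masses), 2):
        diff = high - low
        for name, shift in TRANSFORMATIONS.items():
            if abs(diff - shift) <= tolerance:
                edges.append((low, high, name))
    return edges


def subnetworks(masses, edges):
    """Group mass ions into connected components (candidate molecular species)."""
    neighbours = {m: set() for m in masses}
    for low, high, _ in edges:
        neighbours[low].add(high)
        neighbours[high].add(low)
    seen, components = set(), []
    for mass in sorted(neighbours):
        if mass in seen:
            continue
        stack, component = [mass], []
        seen.add(mass)
        while stack:  # depth-first walk over one component
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return components


example_masses = [300.0, 314.01565, 330.01056, 500.0]
example_edges = build_mass_network(example_masses)
example_groups = subnetworks(example_masses, example_edges)
```

In this toy input the first three masses form one sub-network linked by a methylation and an oxidation, while the fourth mass remains unconnected.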
Figure for Box
(a) Three algorithms have been developed recently to group biosynthetic gene clusters into families; see Box 1 for more details. (b) Chemical structures of 3-amino-5-hydroxybenzoic acid (AHBA) and rifamycin.
