PLoS Comput Biol. 2014 Dec 4;10(12):e1004016.

doi: 10.1371/journal.pcbi.1004016. eCollection 2014 Dec.

A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis

Marnix H Medema¹, Peter Cimermancic², Andrej Sali³, Eriko Takano⁴, Michael A Fischbach²

Affiliations

¹ Department of Microbial Physiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands; Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands.
² Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, United States of America; California Institute for Quantitative Biosciences, San Francisco, California, United States of America.
³ Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, United States of America; California Institute for Quantitative Biosciences, San Francisco, California, United States of America; Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, United States of America.
⁴ Manchester Institute of Biotechnology, Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom.

PMID: 25474254
PMCID: PMC4256081
DOI: 10.1371/journal.pcbi.1004016

Marnix H Medema et al. PLoS Comput Biol. 2014.

Erratum in

Correction: A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis.
PLOS Computational Biology Staff. PLoS Comput Biol. 2016 Mar 8;12(3):e1004767. doi: 10.1371/journal.pcbi.1004767. eCollection 2016 Mar. PMID: 26953826.

Abstract

Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities. 2) An important subset of polyketide synthases and nonribosomal peptide synthetases evolve through concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability. 3) Individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.

Conflict of interest statement

I have read the journal's policy and the authors of this manuscript have the following competing interests: MAF is on the scientific advisory board of Warp Drive Bio.

Figures

**Figure 1. The rapid and dynamic evolution of BGCs differs from the evolution of ribosomal gene clusters and primary metabolism.**
a, Distributions of the best-matching sequence homologs with respect to organism similarity (based on 16S rRNA) for predicted BGCs and histidine operons suggest significant differences in the ways they evolve. b, Number of detected rearrangements, indels and duplications plotted against the average percent identity in the aligned gene cluster pairs from which the events were deduced, for predicted BGCs (top) and ribosomal gene clusters (bottom). Ribosomal gene clusters were selected for comparison because of their relatively large sizes (∼10–15 kb) compared to primary metabolic operons; to obtain a fair comparison with BGCs, only gene clusters of 5–15 kb were taken into account. Counts are based on a systematic comparison of all gene clusters in our data set that share regions of >1000 bp with >70% identity; events were inferred from alignments of these 1000 bp blocks. Of the 10,096 BGC pairs meeting these criteria, 1,750 had a rearrangement, 1,140 had an indel, and 135 had a duplication; each of these event types was far more common than the corresponding events in gene clusters encoding the translation apparatus. Interestingly, while indels and rearrangements could be detected in ∼16% and ∼19% of BGCs of all sizes, duplications are found far more commonly in gene clusters larger than 40 kb (7.6%) than in gene clusters of 10–20 kb (0.3%), suggesting a possible role for duplication and divergence in the evolution of large gene clusters. c, Size distribution of inserted/deleted fragments during recent gene cluster evolution, based on the indel analysis.
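The pair-selection rule described for panel b amounts to a simple filter over local alignments, followed by counting how often each event type occurs among the retained pairs. The sketch below illustrates that logic in Python under stated assumptions; it is not the authors' pipeline. The `AlignedBlock` record, the function names and the toy cluster IDs are hypothetical, and it assumes the local alignments between clusters have already been computed by some other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedBlock:
    """One local alignment between two gene clusters (hypothetical record type)."""
    cluster_a: str
    cluster_b: str
    length_bp: int
    identity: float  # fractional identity in [0, 1]


def qualifying_pairs(blocks, min_length_bp=1000, min_identity=0.70):
    """Unordered cluster pairs sharing at least one block that is longer than
    min_length_bp and more identical than min_identity (strict '>' as in the caption)."""
    pairs = set()
    for b in blocks:
        if b.length_bp > min_length_bp and b.identity > min_identity:
            # sort the two IDs so (x, y) and (y, x) count as the same pair
            pairs.add(tuple(sorted((b.cluster_a, b.cluster_b))))
    return pairs


def event_fractions(pairs, events):
    """Fraction of qualifying pairs showing each event type.

    events maps a sorted pair to the set of inferred event labels
    (e.g. 'rearrangement', 'indel', 'duplication')."""
    counts = {}
    for pair in pairs:
        for label in events.get(pair, set()):
            counts[label] = counts.get(label, 0) + 1
    return {label: n / len(pairs) for label, n in counts.items()}


def in_size_window(size_kb, low_kb=5.0, high_kb=15.0):
    """Size filter applied before comparing BGCs with ribosomal clusters (5-15 kb)."""
    return low_kb <= size_kb <= high_kb


blocks = [
    AlignedBlock("bgc1", "bgc2", 1500, 0.82),  # passes both thresholds
    AlignedBlock("bgc2", "bgc3", 1200, 0.65),  # identity too low
    AlignedBlock("bgc3", "bgc4", 900, 0.95),   # block too short
]
pairs = qualifying_pairs(blocks)
print(pairs)  # {('bgc1', 'bgc2')}
print(event_fractions(pairs, {("bgc1", "bgc2"): {"rearrangement"}}))  # {'rearrangement': 1.0}
```

Applied to the real data set, the same ratio would correspond to, for example, 1,750 rearrangements among 10,096 qualifying pairs.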

**Figure 2. Complex BGC architectures evolve through new combinations of sub-clusters that are shared between multiple gene cluster types.**
a, Network of sub-clusters shared among 34 known BGCs. Nodes represent BGCs, and node size indicates the number of sub-clusters in each gene cluster that are shared with other BGCs in the network. Edges represent shared sub-clusters and are color-coded. The sharing pattern indicates that many sub-clusters are regularly transferred between BGCs of different types. When interpreting this analysis, note that in rare cases different biosynthetic routes (and hence different sub-clusters) lead to the same moiety. b, A sub-network from panel a showing the shared sub-clusters among the BGCs for rubradirin, rifamycin, simocyclinone, everninomicin, and polyketomycin, together with the chemical moieties encoded by the sub-clusters.
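The network in panel a can be derived from a table listing which sub-clusters each BGC contains. The sketch below assumes such a table already exists (identifying the sub-clusters is the hard part and is not shown). The function name is hypothetical, and labels such as `sub_A` are placeholders, not the sub-clusters from the figure.

```python
from collections import defaultdict
from itertools import combinations


def shared_subcluster_network(bgc_subclusters):
    """Build a sub-cluster sharing network.

    bgc_subclusters: dict mapping a BGC name to the set of sub-cluster labels it contains.
    Returns (edges, node_size):
      edges: unordered BGC pair -> set of sub-clusters the two BGCs share
      node_size: BGC -> number of its sub-clusters shared with at least one other BGC
    """
    edges = {}
    shared_by = defaultdict(set)
    for a, b in combinations(sorted(bgc_subclusters), 2):
        common = bgc_subclusters[a] & bgc_subclusters[b]
        if common:
            edges[(a, b)] = common
            shared_by[a] |= common
            shared_by[b] |= common
    node_size = {bgc: len(shared_by[bgc]) for bgc in bgc_subclusters}
    return edges, node_size


# Hypothetical toy input; labels are placeholders, not real sub-clusters.
toy = {
    "BGC_W": {"sub_F"},
    "BGC_X": {"sub_A", "sub_B", "sub_C"},
    "BGC_Y": {"sub_A", "sub_D"},
    "BGC_Z": {"sub_B", "sub_D", "sub_E"},
}
edges, node_size = shared_subcluster_network(toy)
for pair, subs in edges.items():
    print(pair, sorted(subs))
print(node_size)
```

Node size here follows the caption's definition (sub-clusters shared with any other BGC), so a BGC whose sub-clusters are all unique, like the toy `BGC_W`, gets size 0 and no edges.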

**Figure 3. Unexpected evolutionary relationships within the rapamycin family.**
a, Distinct scaffolds produced by pathways from related BGCs. The scatter plot shows the relationship between the sequence homology of a pair of BGCs (x-axis) and the structural homology of their small-molecule products (y-axis), relative to rapamycin and its BGC. Each circle represents a gene cluster and its small-molecule product. Meridamycin and FK520 are closely related to rapamycin, as are their BGCs. While the pladienolide BGC is closely related to the rapamycin BGC, the structure of pladienolide itself is not very similar to that of rapamycin. In particular, pladienolide has a much smaller macrocycle and lacks shikimate- or pipecolate-derived moieties, and as a result binds to a distinct protein target. Structural similarity is estimated by the Tanimoto coefficient using linear-path fingerprints (FP2) from Open Babel, while sequence homology is represented as the Jaccard index defined on pairs of Pfam domains whose sequence identities fall within the top 10th percentile of all-pair sequence identities. The number of domain pairs with sequence identities in the top 10th percentile and the sequence identity of all domain pairs are shown as point sizes and colors, respectively. b, The role of concerted evolution in homogenizing domains within a BGC. Phylogenetic trees of KS and AT domains from the rapamycin, FK520, meridamycin, and pladienolide BGCs are shown (for detailed trees with accession numbers and bootstrap values, see **Figure S11**). The KS and AT sequences largely cluster into BGC-specific clades; for the AT domains, this holds even for two different clusters encoding the same compound (meridamycin), showing the ability of concerted evolution to homogenize domains within a BGC. c, Chemical structures of rapamycin, meridamycin, FK520 and pladienolide. The sub-structure shared among rapamycin, meridamycin and FK520 is colored red, and the domains responsible for its biosynthesis in each molecule are indicated with red circles in b.
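Both similarity measures in panel a are set-overlap ratios. The Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number set in either. For the domain-based Jaccard index, the caption does not say how domains from two BGCs are paired. The sketch below therefore assumes a greedy one-to-one matching of same-family domains whose identity is at or above the top-10th-percentile threshold, with matched pairs acting as the intersection. Treat it as one plausible reading rather than the paper's exact procedure (see the paper's Methods). Fingerprints are assumed to be precomputed (for example, with Open Babel's FP2) and passed in as sets of on-bit indices. All function names and toy identities are hypothetical.

```python
import numpy as np


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def identity_threshold(all_pair_identities, top_fraction=0.10):
    """Identity value above which the top `top_fraction` of all pairwise identities lie."""
    return float(np.percentile(all_pair_identities, 100 * (1 - top_fraction)))


def domain_jaccard(domains_a, domains_b, identity, threshold):
    """Jaccard-style index between two BGCs (one plausible reading; an assumption).

    domains_a, domains_b: lists of (domain_id, pfam_family) tuples.
    identity: callable (id_a, id_b) -> percent identity of two same-family domains.
    Same-family pairs at or above `threshold` are matched one-to-one, greedily
    from the highest identity down; matched pairs play the role of the intersection.
    """
    candidates = []
    for id_a, fam_a in domains_a:
        for id_b, fam_b in domains_b:
            if fam_a == fam_b:
                pid = identity(id_a, id_b)
                if pid >= threshold:
                    candidates.append((pid, id_a, id_b))
    used_a, used_b, matches = set(), set(), 0
    for _, id_a, id_b in sorted(candidates, reverse=True):
        if id_a not in used_a and id_b not in used_b:
            used_a.add(id_a)
            used_b.add(id_b)
            matches += 1
    union = len(domains_a) + len(domains_b) - matches
    return matches / union if union else 0.0


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5

pid_table = {("a1", "b1"): 85.0, ("a2", "b2"): 60.0}  # toy identities
bgc_a = [("a1", "KS"), ("a2", "AT")]
bgc_b = [("b1", "KS"), ("b2", "AT"), ("b3", "KR")]
print(domain_jaccard(bgc_a, bgc_b, lambda x, y: pid_table.get((x, y), 0.0), threshold=80.0))  # 0.25
```

In the toy call, only the KS pair clears the threshold, so one match over a "union" of 2 + 3 − 1 = 4 domains gives 0.25.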

**Figure 4. Qualitative model for the evolution of NRPS/PKS domains.**
After modules are duplicated, they may become 'trapped' in a cycle in which small sequence divergences are counterbalanced by internal recombination events that drive concerted evolution. Through strong diversifying selection (or sufficient drift), domains may break out of this cycle toward sequences that are protected from concerted evolution, either by functional divergence followed by stabilizing selection on the new function, or by reduced internal recombination rates due to larger sequence differences between the domains. This sequence divergence may arise through cumulative mutation or through recombination with other gene clusters (or with other modules within the same gene cluster).

**Figure 5. Diverse and distinct modes of evolution for PKS and NRPS BGCs.**
a, Scatter plot showing the first two principal components from a principal component analysis (PCA) of different evolutionary characteristics of BGCs encoding different classes of NRPs and PKs. The first two principal components explain 63% of the variance. BGCs encoding members of the same family (e.g., lipopeptides, glycopeptides or macrolides) tend to cluster together, suggesting that their family members evolve in similar ways, while different families cluster apart from each other, suggesting distinct modes of evolution. Colors indicate distinct classes of BGCs. b, Scatter plot of the two BGC features that, of the 25 measured, underlie most of the variation: the internal similarity index and the vertical evolution index. The internal similarity index indicates how similar domains in a BGC are to other domains within the same BGC. The vertical evolution index indicates how closely related a BGC is to the BGCs harboring the closest relatives of its constituent domains (see Methods for more details). Colors indicate distinct classes of BGCs, as in panel a. **c–f**, Domain architecture plots of PKSs and NRPSs show distinct modes of evolution: c, internal duplication with concerted evolution; d, N-terminal additions by module duplication and recombination; e, domain swapping with other BGCs; and f, mixed evolution. Geometric shapes indicate domain types (see legend). Domain colors indicate the internal homology p-value, that is, the similarity of each domain to its closest relative within the same gene cluster, evaluated against the distribution of all similarities between domains of the same type in the entire data set; hence, domains colored red are most similar, while domains colored blue are most dissimilar.
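The "63% of the variance" in panel a is the explained-variance ratio of the first two principal components. The minimal NumPy sketch below shows how that figure and the plotted coordinates arise. It assumes one row per BGC and one column per evolutionary feature (the caption mentions 25), and it z-scores each column before the decomposition; whether the authors standardized their features is an assumption here. The function name and toy data are hypothetical.

```python
import numpy as np


def pca(features, n_components=2):
    """Principal component analysis via SVD of the standardized feature matrix.

    features: (n_bgcs, n_features) array, one row per BGC, one column per
    evolutionary characteristic. Columns are z-scored first (an assumption) so
    features measured on different scales contribute comparably.
    Returns (scores, explained_ratio) for the first n_components components.
    """
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    X = X / std
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # coordinates on the PCs
    variance = S ** 2
    explained_ratio = variance / variance.sum()
    return scores, explained_ratio[:n_components]


rng = np.random.default_rng(0)
toy = rng.normal(size=(40, 25))  # 40 hypothetical BGCs x 25 features
scores, ratio = pca(toy)
print(scores.shape, ratio.sum())  # share of variance captured by PC1 + PC2
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring points by BGC class would give a panel-a-style scatter. With uncorrelated random toy data, the first two components capture only a modest share of the variance; a value as high as 63% reflects correlated structure among the measured features.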
