Sci Rep. 2018 Dec 18;8(1):17957.
doi: 10.1038/s41598-018-36561-3.

Uncovering secondary metabolite evolution and biosynthesis using gene cluster networks and genetic dereplication

Sebastian Theobald et al. Sci Rep.

Abstract

The increased interest in secondary metabolites (SMs) has driven a number of genome sequencing projects to elucidate their biosynthetic pathways. As a result, studies have revealed that secondary metabolite gene clusters (SMGCs) greatly outnumber the compounds detected so far, challenging current methods to dereplicate and categorize this many gene clusters on a larger scale. Here, we present an automated workflow for the genetic dereplication and analysis of secondary metabolism genes in fungi. Focusing on the secondary-metabolite-rich genus Aspergillus, we use network analysis to categorize SMGCs across genomes into SMGC families. Our method elucidates the diversity and dynamics of secondary metabolism in section Nigri, showing that SMGC diversity within the section is of the same magnitude as within the genus. Using our genome analysis, we were able to predict the gene cluster responsible for the biosynthesis of malformin, a potentiator of anti-cancer drugs, in 18 strains. To prove the general validity of our predictions, we developed genetic engineering tools in Aspergillus brasiliensis and subsequently verified the genes for malformin biosynthesis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Workflow of the bioinformatic pipeline. Prior to data analysis, gene annotation, InterPro and SMURF data are combined. SMGCs are compared using protein BLAST of cluster members, and the percent identity values of the alignments are aggregated into cluster similarity scores, which are used to create a gene cluster network. Additionally, known gene clusters from the MIBiG database are annotated in the dataset by identifying an exact match. Random walk clustering is performed on the network using the cluster_walktrap function of igraph to obtain SMGC families. To identify candidate SMGCs for metabolites of interest, lists of metabolite-producing organisms are compared with lists of organisms containing SMGCs of the same family. Candidate SMGC families are then filtered by InterPro annotations and by criteria such as NRPS size.
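The network step of this workflow can be illustrated with a small pure-Python sketch. It assumes a hypothetical aggregation rule (`cluster_similarity`: summed best-hit percent identities normalised by the larger cluster), which is not necessarily the rule used in the pipeline, and it re-implements the walktrap idea of Pons and Latapy (repeatedly merge the adjacent communities whose short random walks look most alike, then keep the partition with the highest modularity) instead of calling igraph. Self-loop handling, tie-breaking and other details may therefore differ from igraph's `cluster_walktrap`; all function names are illustrative.

```python
from itertools import combinations


def cluster_similarity(best_identities, size_a, size_b):
    """HYPOTHETICAL aggregation rule: sum the best BLAST percent identity found
    for each gene of cluster A in cluster B, normalised by the larger cluster."""
    return sum(best_identities) / max(size_a, size_b)


def walktrap(n, edges, t=4):
    """Walktrap-style clustering of vertices 0..n-1 of a weighted, undirected
    graph given as (u, v, weight) edges; returns one family label per vertex."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
    # Random-walk graph: a unit self-loop per vertex keeps every row non-empty.
    walk = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in walk]
    step = [[walk[i][j] / deg[i] for j in range(n)] for i in range(n)]
    # pt[i][j]: probability that a t-step random walk starting at i ends at j.
    pt = [row[:] for row in step]
    for _ in range(t - 1):
        pt = [[sum(pt[i][k] * step[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]

    # community id -> (member set, average walk-probability row of its members)
    comms = {i: ({i}, pt[i][:]) for i in range(n)}

    def delta_sigma(a, b):
        (ma, pa), (mb, pb) = comms[a], comms[b]
        r2 = sum((pa[k] - pb[k]) ** 2 / deg[k] for k in range(n))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * r2 / n

    def adjacent(a, b):
        return any(adj[i][j] > 0 for i in comms[a][0] for j in comms[b][0])

    two_m = sum(map(sum, adj))  # every edge weight counted from both ends

    def modularity():
        q = 0.0
        for members, _ in comms.values():
            inner = sum(adj[i][j] for i in members for j in members)
            strength = sum(sum(adj[i]) for i in members)
            q += inner / two_m - (strength / two_m) ** 2
        return q

    best_q = modularity()
    best = [set(m) for m, _ in comms.values()]
    next_id = n
    while len(comms) > 1:
        pairs = [p for p in combinations(comms, 2) if adjacent(*p)]
        if not pairs:
            break  # remaining communities are not connected to each other
        a, b = min(pairs, key=lambda p: delta_sigma(*p))
        (ma, pa), (mb, pb) = comms.pop(a), comms.pop(b)
        size = len(ma) + len(mb)
        merged_row = [(len(ma) * x + len(mb) * y) / size for x, y in zip(pa, pb)]
        comms[next_id] = (ma | mb, merged_row)
        next_id += 1
        q = modularity()
        if q > best_q:
            best_q, best = q, [set(m) for m, _ in comms.values()]

    labels = [0] * n
    for label, members in enumerate(sorted(best, key=min)):
        for v in members:
            labels[v] = label
    return labels


# Hypothetical similarity scores between six SMGCs: two tight groups, one weak link.
edges = [(0, 1, 92.0), (0, 2, 88.0), (1, 2, 90.0),
         (3, 4, 85.0), (3, 5, 91.0), (4, 5, 87.0),
         (2, 3, 12.0)]
print(walktrap(6, edges))  # -> [0, 0, 0, 1, 1, 1]
```

The walk length `t` controls how local the resulting families are: short walks rarely cross the weak link between the two groups, so their members end up in separate families.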
Figure 2
Heatmap of shared SMGC families and gene clusters linked to compounds. This heatmap shows the phylogeny of the strains used, shared SMGC families and metabolite-linked gene clusters based on MIBiG entries. The row dendrogram represents a whole-genome phylogeny. The column dendrogram was generated by creating a distance matrix of shared SMGC families by organism and running hierarchical clustering with Euclidean distance (part of the heatmap.2 function). (a) Relative amounts of shared SMGC families between species, in percent. Here, the presence of SMGC families resulting from our pipeline was compared across all species. Percentages are indicated as a color gradient in bins of 10%, from grey cells (0–10%, not present in dataset) to red cells (90–100%), as shown by the color key. Additionally, a histogram shows how often each level of shared SMGC families occurs, i.e., how many comparisons result in low or high similarity. Species self-comparison always results in values of 100%. The column dendrogram represents a hierarchical clustering of organisms by the percentage of shared SMGCs, so strains that cluster together share a high proportion of SMGCs. (b) Identification of compound-linked gene clusters based on MIBiG entries. Best hits for MIBiG entries were identified inside families using protein BLAST (red dot). Aculinic acid and emodin gene clusters were confirmed by sequence identifier. Using a guilt-by-association approach, the whole family of gene clusters is considered to be responsible for the production of a similar metabolite. The heatmap column dendrogram is clustered hierarchically based on the presence of compound-linked gene clusters. Dereplicated gene clusters that do not show related gene clusters in other species were removed. 4,4′-piperazine-2,5-diyldimethyl-bis-phenol is abbreviated as piparazine*.
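Panel (a) can be outlined in a few lines. This sketch rests on an assumed definition: cell (a, b) is taken to be the percentage of species a's SMGC families that also occur in species b, which yields the 100% diagonal described above, but the pipeline's exact definition is not stated here. The species names and family sets are made up, and the hierarchical clustering itself was done by heatmap.2, so the sketch stops at the colour bins and the Euclidean distances between percentage profiles that such clustering would take as input.

```python
import math


def shared_family_percent(families_by_species):
    """HYPOTHETICAL definition: cell (a, b) is the percentage of species a's
    SMGC families that also occur in species b, so the diagonal is 100%."""
    species = sorted(families_by_species)
    matrix = {}
    for a in species:
        fam_a = families_by_species[a]
        for b in species:
            shared = fam_a & families_by_species[b]
            matrix[a, b] = 100.0 * len(shared) / len(fam_a) if fam_a else 0.0
    return species, matrix


def colour_bin(percent):
    """Index of the 10% colour bin: 0 for 0-10%, ..., 9 for 90-100%."""
    return min(int(percent // 10), 9)


def row_distance(species, matrix, a, b):
    """Euclidean distance between the percentage profiles of species a and b."""
    return math.dist([matrix[a, s] for s in species], [matrix[b, s] for s in species])


# Hypothetical family memberships for three species.
species, pct = shared_family_percent({
    "sp1": {1, 2, 3, 4},
    "sp2": {1, 2, 3, 5},
    "sp3": {6, 7},
})
print(pct["sp1", "sp2"], colour_bin(pct["sp1", "sp2"]))  # 75.0 7
print(round(row_distance(species, pct, "sp1", "sp2"), 2))
```

Note that this definition is asymmetric when species carry different numbers of families, which is one reason the real matrix definition matters before clustering.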
Figure 3
Predicted SMGC family for malformin-producing gene clusters. InterPro annotations are indicated by color. The predicted SMGC family contains gene clusters with an NRPS gene of at least 12,000 bp. SMURF-predicted gene clusters are shown in full; the predicted malformin gene clusters are highlighted. Tailoring and accessory genes encode proteins such as major facilitator superfamily transporters and transcription factors, as well as enzymes involved in disulphide bond formation.
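The candidate search behind this family (producer lists compared with family membership, then a size filter on the NRPS) might be sketched as follows. All of the following are assumptions for illustration: each cluster record carries its organism, family and NRPS gene lengths; a family qualifies only if its organisms exactly match the producer set and every member cluster has an NRPS gene of at least 12,000 bp; InterPro-based filtering is omitted. `GeneCluster` and `candidate_families` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GeneCluster:
    organism: str
    family: int             # SMGC family assigned by network clustering
    nrps_lengths_bp: list   # lengths (bp) of NRPS genes in the cluster (hypothetical field)


def candidate_families(clusters, producers, min_nrps_bp=12_000):
    """Families whose organisms match the known producers exactly (assumed rule)
    and whose every member cluster carries an NRPS gene of >= min_nrps_bp."""
    organisms = defaultdict(set)
    has_long_nrps = defaultdict(lambda: True)
    for c in clusters:
        organisms[c.family].add(c.organism)
        long_enough = any(length >= min_nrps_bp for length in c.nrps_lengths_bp)
        has_long_nrps[c.family] = has_long_nrps[c.family] and long_enough
    return sorted(f for f, orgs in organisms.items()
                  if orgs == set(producers) and has_long_nrps[f])


# Hypothetical data: strain_a and strain_b are the known producers.
clusters = [
    GeneCluster("strain_a", 1, [13_500]),
    GeneCluster("strain_b", 1, [12_100]),
    GeneCluster("strain_c", 2, [9_000]),
    GeneCluster("strain_a", 2, []),
    GeneCluster("strain_c", 3, [15_000]),
]
print(candidate_families(clusters, {"strain_a", "strain_b"}))  # [1]
```

An exact-match rule is strict; with incomplete producer lists a real search would likely score families by overlap instead.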
Figure 4
Classification of condensation domains in the NRPS predicted to be responsible for malformin synthesis. (a) Approximate maximum likelihood phylogeny of condensation domain amino acid sequences. Sequences of condensation domains with known activities from fungisporin (FG), fumiquinazolines (FQ), fumitremorgin (FR) and penicillin (PE) were used to infer the activities of condensation domains in the predicted malformin (MA) producing NRPS. The tree was generated from 60% of conserved aligned columns and bootstrapped 1,000 times. Bootstrap values over 70 are shown next to their nodes. The analysis shows distinct clusters corresponding to condensation domain functions, supported by high bootstrap values. (b) Schematic of the NRPS proteins used. Condensation domains are highlighted according to their function as depicted in the legend (NA: not available). Adenylation and PCP domains are represented by white cells.
Figure 5
Extracted ion chromatograms (EICs) for malformin-overexpressing (mlfAΔ, mlfA-Oex) and malformin knock-out (mlfAΔ) strains. (a and b) MS spectra of the detected adducts [M+H]⁺, [M+NH₄]⁺ and [M+Na]⁺ for the peaks displayed in (c), which shows merged EICs of the six adducts (±0.005 Da) in the reference strain (akuA Δ::AFLpyrG), mlfAΔ, mlfA-Oex (mlfAΔ IS1::PgdpA-mlfA) and the mlfA deletion strain (mlfAΔ). (a) The peak at RT 8.9 min contains ions at calc. m/z 516.2310, 533.2582 and 538.2131, corresponding to adducts of low-mass malformins, e.g. A2 (d). (b) The two peaks at RT 9.4–9.7 min contain the adducts of high-mass malformins, calc. m/z 530.2465, 547.2734 and 552.2298; the largest peak, at RT 9.7 min, represents malformin C, as determined by comparison with a reference standard of malformin C (d). The small peak at RT 9.4 min denotes another high-mass malformin (e.g. malformin A1, B1, B3 or B4). In (c), the vertical axis showing MS counts is omitted; the intensity of the tallest peak is approximately 2 × 10⁶.
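The calculated adduct m/z values in this caption follow from simple mass arithmetic. The sketch below derives [M+NH₄]⁺ and [M+Na]⁺ from each reported [M+H]⁺ value using standard monoisotopic ion masses (electron mass subtracted) and checks that the caption's values fall inside the ±0.005 Da extraction window. The constants are standard values assumed here, not taken from the paper, so small differences in the fourth decimal are expected depending on the mass conventions used.

```python
# Standard monoisotopic masses (Da) of the charge carriers, electron mass removed.
PROTON = 1.007276
AMMONIUM = 18.033826
SODIUM = 22.989221
CH2 = 14.015650


def adducts_from_mh(mz_mh):
    """Derive [M+NH4]+ and [M+Na]+ m/z values from a measured [M+H]+ value."""
    neutral = mz_mh - PROTON
    return {"[M+H]+": mz_mh, "[M+NH4]+": neutral + AMMONIUM, "[M+Na]+": neutral + SODIUM}


def within_window(observed, expected, tol=0.005):
    """True if an observed m/z lies inside the +/- tol (Da) extraction window."""
    return abs(observed - expected) <= tol


# [M+H]+, [M+NH4]+ and [M+Na]+ values as reported in the Figure 5 caption.
caption = {
    "low-mass (RT 8.9 min)": (516.2310, 533.2582, 538.2131),
    "high-mass (RT 9.4-9.7 min)": (530.2465, 547.2734, 552.2298),
}

for label, (mh, nh4, na) in caption.items():
    expected = adducts_from_mh(mh)
    ok = within_window(nh4, expected["[M+NH4]+"]) and within_window(na, expected["[M+Na]+"])
    print(f"{label}: NH4 {expected['[M+NH4]+']:.4f}, Na {expected['[M+Na]+']:.4f}, in window: {ok}")

# The two malformin series differ by about one CH2 unit.
print(f"series difference: {530.2465 - 516.2310:.4f} Da (CH2 = {CH2:.4f} Da)")
```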
