Proc Natl Acad Sci U S A. 2021 May 11;118(19):e2020230118.
doi: 10.1073/pnas.2020230118.

An interpreted atlas of biosynthetic gene clusters from 1,000 fungal genomes

Matthew T Robey et al. Proc Natl Acad Sci U S A.

Abstract

Fungi are prolific producers of natural products, compounds which have had a large societal impact as pharmaceuticals, mycotoxins, and agrochemicals. Despite the availability of over 1,000 fungal genomes and several decades of compound discovery efforts from fungi, the biosynthetic gene clusters (BGCs) encoded by these genomes and the associated chemical space have yet to be analyzed systematically. Here, we provide detailed annotation and analyses of fungal biosynthetic and chemical space to enable genome mining and discovery of fungal natural products. Using 1,037 genomes from species across the fungal kingdom (e.g., Ascomycota, Basidiomycota, and non-Dikarya taxa), 36,399 predicted BGCs were organized into a network of 12,067 gene cluster families (GCFs). Anchoring these GCFs with reference BGCs enabled automated annotation of 2,026 BGCs with predicted metabolite scaffolds. We performed parallel analyses of the chemical repertoire of fungi, organizing 15,213 fungal compounds into 2,945 molecular families (MFs). The taxonomic landscape of fungal GCFs is largely species specific, though select families such as the equisetin GCF are present across vast phylogenetic distances with parallel diversifications in the GCF and MF. We compare these fungal datasets with a set of 5,453 bacterial genomes and their BGCs and 9,382 bacterial compounds, revealing dramatic differences between bacterial and fungal biosynthetic logic and chemical space. These genomics and cheminformatics analyses reveal the large extent to which fungal and bacterial sources represent distinct compound reservoirs. With a >10-fold increase in the number of interpreted strains and annotated BGCs, this work better regularizes the biosynthetic potential of fungi for rational compound discovery.

Keywords: biosynthesis; fungi; genome mining; natural products; secondary metabolism.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Organizing BGCs from 1,037 fungal genomes. (A) Exploring fungal diversity using networks of GCFs and MFs. A GCF is a collection of similar BGCs aggregated into a network and predicted to use a similar chemical scaffold and create a family of related metabolites. An MF is a collection of metabolites that likewise represent chemical variations around a chemical scaffold. This networking approach enables hierarchical analysis of BGCs and their encoded metabolite scaffolds from large numbers of interpreted genomes. (B) Distribution of BGCs across the fungal kingdom. The BGC content of fungal genomes varies dramatically with phylogeny. Organisms within Pezizomycotina have more BGCs per genome and a greater diversity of biosynthetic types than organisms in Basidiomycota and non-Dikarya phyla.
Fig. 2.
The distribution of 12,067 GCFs across the fungal kingdom. (A) Heatmap of GCFs across Fungi. The phylogram to the left shows a Neighbor Joining species tree based on 290 shared orthologous genes across 1,037 genomes; horizontal shaded regions across the heatmap correspond to each labeled taxonomic group. The order of GCF columns is the result of hierarchical clustering based on the GCF presence/absence matrix. Across Fungi, the distribution of GCFs largely follows phylogenetic trends, with most GCFs confined to a specific genus or species. (B) Relationship between genetic distance and GCF content. The dotted lines indicate median genetic distance values for organisms within the same species, genus, order, class, or phylum. Each point in the scatterplot represents a pair of genomes and the fraction of the pair’s GCFs that are shared. (C) Relationship between taxonomic rank and shared GCF content across the fungal kingdom. Violin plots show the fraction of GCFs shared between all pairs of organisms within our 1,000-genome dataset, with each pair classified based on the lowest taxonomic rank shared between the two organisms.
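The pairwise comparison in panels B and C can be sketched in a few lines of Python. This is only an illustration: the caption does not define the "fraction of the pair's GCFs that are shared", so the sketch assumes a Jaccard-style ratio (shared GCFs divided by all GCFs found in either genome), and the genome and GCF names are hypothetical.

```python
from itertools import combinations


def shared_gcf_fraction(gcfs_a, gcfs_b):
    """Assumed definition: shared GCFs / GCFs present in either genome."""
    union = gcfs_a | gcfs_b
    if not union:
        return 0.0  # two genomes without any GCFs share nothing
    return len(gcfs_a & gcfs_b) / len(union)


# Hypothetical presence/absence data: genome -> set of GCF identifiers.
genome_gcfs = {
    "genome_A": {"GCF_1", "GCF_2", "GCF_3"},
    "genome_B": {"GCF_2", "GCF_3", "GCF_4"},
    "genome_C": {"GCF_5"},
}

# One value per genome pair, i.e. one point in a scatterplot like panel B.
pairwise = {
    (a, b): shared_gcf_fraction(genome_gcfs[a], genome_gcfs[b])
    for a, b in combinations(sorted(genome_gcfs), 2)
}

for pair, fraction in pairwise.items():
    print(pair, fraction)
```

Grouping these values by the lowest taxonomic rank shared by each pair would give the per-rank distributions summarized by violin plots such as those in panel C.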
Fig. 3.
Large-scale analysis of fungal genome-encoded and known metabolite scaffolds. (A) Colliding large-scale collections of fungal genetic content (Left) and fungal natural products (Right) using a network of GCFs interpreted from 1,037 genomes (Left) and 15,213 metabolites arranged into 2,945 molecular families based on their Tanimoto similarity score (Right). Note that 92% of these 12,067 GCFs remain unassigned to their metabolite products. (B) Variations in adenylation domain substrate-binding residues and tailoring enzyme composition facilitate modifications to the equisetin GCF (Left) and MF (Right). The phylogram to the left represents a maximum likelihood tree based on the hybrid NRPS–PKS backbone enzyme. All branches in this tree have >50% bootstrap support.
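As a rough illustration of how a Tanimoto similarity score can group compounds into molecular families, the sketch below computes the score on sets of fingerprint "on" bits and links any two compounds whose score meets a cutoff. The fingerprints, the cutoff of 0.5, and the single-linkage grouping are all assumptions made for the example; the caption states only that Tanimoto similarity was used, not how families were delimited.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def molecular_families(fingerprints, threshold):
    """Group compounds by single linkage: link pairs scoring >= threshold."""
    names = sorted(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the family root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
                parent[find(a)] = find(b)

    families = {}
    for name in names:
        families.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical fingerprints: compound -> indices of bits that are set.
fingerprints = {
    "cmpd_1": {1, 2, 3, 4},
    "cmpd_2": {1, 2, 3, 5},
    "cmpd_3": {10, 11, 12},
}

families = molecular_families(fingerprints, threshold=0.5)
print(families)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the cutoff would need to be chosen and validated for the dataset at hand.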
Fig. 4.
Fungal BGCs are distinct from their canonical bacterial counterparts. (A) PCA of 36,399 fungal and 24,024 bacterial BGCs, with points sized according to the number of BGCs analyzed. Fungal and bacterial taxonomic groups occupy distinct regions of this biosynthetic space. (B) Fungal and bacterial BGCs differ in backbone enzyme composition, with fungal NRPS and PKS clusters typically encoding only a single backbone enzyme, compared with the multiple backbone enzymes found in bacterial BGCs. (C) Fungal and bacterial NRPS BGCs differ dramatically in their use of termination domains for release of peptide intermediates. (D) Fungal NRPS logic is distinct from bacterial canon. Most fungal NRPS pathways involve a single NRPS enzyme that utilizes a terminal condensation domain to produce a cyclic peptide. In contrast, bacterial NRPS pathways contain multiple NRPS enzymes that operate in a colinear fashion and typically utilize thioesterase domains to produce linear or cyclic peptides.
Fig. 5.
Bacteria and fungi are distinct sources for natural product scaffolds. (A) PCA of 24,595 known bacterial and fungal compounds, with points sized according to the number of compounds. Fungal and bacterial taxonomic groups occupy distinct regions in this representation of chemical space for natural products. (B) Quantitative comparison of structural classifications in bacterial versus fungal compounds. (C) Bacteria and fungi represent distinct pools for bioactive compounds and scaffolds. Selected chemical moieties enriched and characteristic of each taxonomic group are highlighted in yellow. The fold enrichment of the chemical moiety is indicated in green, with P values from a chi-squared test indicated.
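The fold enrichment and chi-squared P value described for panel C can be illustrated with a small worked example. The sketch assumes a 2x2 table (compounds with and without a moiety, fungal versus bacterial) and a Pearson chi-squared test with one degree of freedom and no continuity correction; the caption does not say which variant was used. The compound totals are taken from the abstract, while the moiety counts are hypothetical.

```python
import math


def fold_enrichment(with_a, total_a, with_b, total_b):
    """Ratio of the moiety's frequency in group A to its frequency in group B."""
    return (with_a / total_a) / (with_b / total_b)


def chi_squared_2x2(with_a, total_a, with_b, total_b):
    """Pearson chi-squared statistic and P value (1 df) for a 2x2 table."""
    a, b = with_a, total_a - with_a  # group A: with / without the moiety
    c, d = with_b, total_b - with_b  # group B: with / without the moiety
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # The survival function of chi-squared with 1 df is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Totals from the abstract; the counts carrying the moiety are hypothetical.
FUNGAL_TOTAL, BACTERIAL_TOTAL = 15213, 9382
fungal_with, bacterial_with = 300, 50

enrichment = fold_enrichment(fungal_with, FUNGAL_TOTAL, bacterial_with, BACTERIAL_TOTAL)
chi2, p_value = chi_squared_2x2(fungal_with, FUNGAL_TOTAL, bacterial_with, BACTERIAL_TOTAL)
print(f"fold enrichment = {enrichment:.2f}, chi2 = {chi2:.1f}, P = {p_value:.2e}")
```

When many moieties are tested at once, a multiple-testing correction would normally be applied to the resulting P values; whether the authors did so is not stated here.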
