Metab Eng Commun. 2019 Nov 15:10:e00107.
doi: 10.1016/j.mec.2019.e00107. eCollection 2020 Jun.

Genomic and proteomic biases inform metabolic engineering strategies for anaerobic fungi

St Elmo Wilken et al. Metab Eng Commun.

Abstract

Anaerobic fungi (Neocallimastigomycota) are emerging non-model hosts for biotechnology due to their wealth of biomass-degrading enzymes, but tools to engineer these fungi have not yet been established. Here, we show that the anaerobic gut fungi have the most GC-depleted genomes among 443 sequenced organisms in the fungal kingdom, which has ramifications for heterologous expression of genes as well as for emerging CRISPR-based genome engineering approaches. Comparative genomic analyses suggest that anaerobic fungi may contain cellular machinery to aid in sexual reproduction, yet a complete mating pathway was not identified. Predicted proteomes of the anaerobic fungi also contain an unusually large fraction of proteins with homopolymeric amino acid runs consisting of five or more identical consecutive amino acids. In particular, threonine runs are especially enriched in anaerobic fungal carbohydrate active enzymes (CAZymes) and this, together with a high abundance of predicted N-glycosylation motifs, suggests that gut fungal CAZymes are heavily glycosylated, which may impact heterologous production of these biotechnologically useful enzymes. Finally, we present a codon optimization strategy to aid in the development of genetic engineering tools tailored to these early-branching anaerobic fungi.

Keywords: Amino acid distribution; Anaerobe; Codon optimization; Fungi; Genome sequencing; Neocallimastigomycota.

Figures

Fig. 1
Neocallimastigomycota are characterized by extremely GC-depleted genomes and proteomes. GC content within the predicted proteome of 443 fungal genomes is plotted as a function of fungal clade, and varies significantly across the fungal kingdom. The number of species analyzed per clade is indicated in parentheses on the x-axis. The box-and-whisker plots show outliers as points, minima and maxima as whiskers, and the inter-quartile ranges inside the boxes.
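As an illustration of the quantity plotted in Fig. 1, the following minimal sketch computes the GC fraction of a nucleotide sequence. The function name `gc_fraction` and the choice to ignore ambiguous bases are assumptions made for this example; the caption does not state how the authors handled them.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    An empty or fully ambiguous sequence yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def mean_gc(sequences) -> float:
    """Average the per-sequence GC fraction over a collection of sequences."""
    values = [gc_fraction(s) for s in sequences]
    return sum(values) / len(values) if values else 0.0
```

Applied to the coding sequences of one genome, `mean_gc` would give a single per-organism value of the kind that could then be grouped by clade.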
Fig. 2
GC-depleted fungal proteomes are enriched in lysine, isoleucine and asparagine. Average predicted amino acid abundance per clade, ordered in decreasing GC content, is shown across the fungal kingdom. GC-rich fungal phyla are enriched in alanine, glycine, arginine, proline and valine. Asparagine is particularly enriched in Neocallimastigomycota, similar to P. falciparum, another extremely GC-depleted organism.
Fig. 3
Proteins with asparagine runs constitute an unusually large fraction of the Neocallimastigomycota proteome. The average fraction of proteins with amino acid runs (five or more of the same amino acid consecutively in a protein) is shown per clade, ordered in decreasing GC content, across the fungal kingdom. Hydrophobic (valine, leucine, methionine and isoleucine) and bulky (phenylalanine, tyrosine and tryptophan) amino acids are noticeably absent in runs, while small (alanine) and uncharged, polar (serine, threonine, proline, glutamine) amino acids are frequently found in runs.
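The run definition used in this caption (five or more identical consecutive amino acids) can be sketched with a back-referencing regular expression. The names `find_runs` and `fraction_with_runs` are hypothetical and introduced only for this example; this is not the authors' pipeline.

```python
import re


def find_runs(protein: str, min_len: int = 5):
    """Return (residue, start_index, run_length) for each homopolymeric run.

    A run is min_len or more identical consecutive one-letter residues.
    """
    # ([A-Z]) captures one residue; \1{n,} requires at least n more copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in pattern.finditer(protein.upper())
    ]


def fraction_with_runs(proteins, min_len: int = 5) -> float:
    """Fraction of proteins that contain at least one run."""
    proteins = list(proteins)
    if not proteins:
        return 0.0
    hits = sum(1 for p in proteins if find_runs(p, min_len))
    return hits / len(proteins)
```

For example, `find_runs("MKNNNNNNSTTTTTA")` reports one asparagine run of length 6 and one threonine run of length 5.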
Fig. 4
Neocallimastigomycota have significantly more CAZymes with amino acid repeat runs than other fungal clades. Here the distribution of the fraction of CAZymes with runs relative to all the CAZymes in each fungus, grouped by clade, is plotted. The number in parentheses is the number of fungi included in each clade. Statistically significant differences in the distributions between Neocallimastigomycota and all the other clades are indicated by * using the two-sample Kolmogorov-Smirnov test (P < 0.05). The distribution of the fraction of CAZymes with runs in each clade is shown in the blue violin plots overlaid by orange box-and-whisker plots where outliers are shown as points, minima and maxima as whiskers, and the inter-quartile ranges inside the boxes. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
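The two-sample Kolmogorov-Smirnov test compares two empirical cumulative distribution functions (ECDFs) through their largest vertical gap. The sketch below computes only that gap, the D statistic, in plain Python; it does not compute the P value, for which an established routine such as `scipy.stats.ks_2samp` would normally be used. The function name `ks_statistic` and the sample values in the usage note are illustrative and are not data from the paper.

```python
from bisect import bisect_right


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(a), sorted(b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # share of sample a that is <= x
        fb = bisect_right(b, x) / len(b)  # share of sample b that is <= x
        d = max(d, abs(fa - fb))
    return d
```

With made-up per-fungus fractions such as `[0.1, 0.2, 0.3, 0.4]` for one clade and `[0.3, 0.4, 0.5, 0.6]` for another, the largest gap between the two ECDFs is 0.5.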
Fig. 5
(A) CAZymes in Neocallimastigomycota have more N-glycosylation motifs relative to other industrially important cellulolytic fungi. The fraction of CAZymes with a specified number of N-glycosylation motifs (N-X-S/T where X is not proline), given on the x-axis, is shown for N. californiae, A. robustus and P. finnis (grouped as Neocallimastigomycota here; the other members are not shown due to their lower quality genomes), T. reesei, and A. niger. Neocallimastigomycota has a higher proportion of CAZymes with 2 or more N-glycosylation motifs than either T. reesei or A. niger. (B) Threonine is disproportionately abundant in the linker region of CAZymes in Neocallimastigomycota, suggesting O-glycosylation sites may be abundant. The amino acid fraction in all proteins with at least one CAZyme domain is divided into three groups: domains, linker regions of proteins with runs (five or more of the same amino acid consecutively in a protein), and linker regions of proteins without runs. Linker regions are defined as the inter-domain regions of proteins. Serines, and especially threonines, are highly enriched in the inter-domain regions of CAZymes with runs and without runs.
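The N-glycosylation motif named in panel (A), N-X-S/T where X is not proline, can be counted with a lookahead regular expression so that overlapping motifs are each counted. This is a minimal sketch; the names `count_sequons` and `fraction_with_at_least` are hypothetical, and whether the authors counted overlapping motifs is not stated in the caption.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps the
# match zero-width past the N, so overlapping motifs (e.g. "NNTS") all count.
SEQUON = re.compile(r"N(?=[^P][ST])")


def count_sequons(protein: str) -> int:
    """Count N-X-S/T motifs (X != P) in a one-letter protein sequence."""
    return len(SEQUON.findall(protein.upper()))


def fraction_with_at_least(proteins, k: int) -> float:
    """Fraction of proteins carrying k or more motifs."""
    proteins = list(proteins)
    if not proteins:
        return 0.0
    return sum(1 for p in proteins if count_sequons(p) >= k) / len(proteins)
```

Calling `fraction_with_at_least(cazymes, 2)` on a list of CAZyme sequences would give the kind of "2 or more motifs" proportion compared across species in panel (A).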
