J Biol Chem. 2020 Oct 30;295(44):14826-14839.
doi: 10.1074/jbc.RA120.013528. Epub 2020 Aug 21.

Global analysis of adenylate-forming enzymes reveals β-lactone biosynthesis pathway in pathogenic Nocardia

Serina L Robinson et al. J Biol Chem.

Abstract

Enzymes that cleave ATP to activate carboxylic acids play essential roles in primary and secondary metabolism in all domains of life. Class I adenylate-forming enzymes share a conserved structural fold but act on a wide range of substrates to catalyze reactions involved in bioluminescence, nonribosomal peptide biosynthesis, fatty acid activation, and β-lactone formation. Despite their metabolic importance, the substrates and functions of the vast majority of adenylate-forming enzymes are unknown without tools available to accurately predict them. Given the crucial roles of adenylate-forming enzymes in biosynthesis, this also severely limits our ability to predict natural product structures from biosynthetic gene clusters. Here we used machine learning to predict adenylate-forming enzyme function and substrate specificity from protein sequences. We built a web-based predictive tool and used it to comprehensively map the biochemical diversity of adenylate-forming enzymes across >50,000 candidate biosynthetic gene clusters in bacterial, fungal, and plant genomes. Ancestral phylogenetic reconstruction and sequence similarity networking of enzymes from these clusters suggested divergent evolution of the adenylate-forming superfamily from a core enzyme scaffold most related to contemporary CoA ligases toward more specialized functions including β-lactone synthetases. Our classifier predicted β-lactone synthetases in uncharacterized biosynthetic gene clusters conserved in >90 different strains of Nocardia. To test our prediction, we purified a candidate β-lactone synthetase from Nocardia brasiliensis and reconstituted the biosynthetic pathway in vitro to link the gene cluster to the β-lactone natural product, nocardiolactone. We anticipate that our machine learning approach will aid in functional classification of enzymes and advance natural product discovery.

Keywords: Nocardia; acetyl-CoA synthetase; adenylate-forming enzymes; bioinformatics; coenzyme A (CoA); enzyme catalysis; machine learning; natural product biosynthesis; substrate specificity; β-lactone synthetases.

Conflict of interest statement

M. H. M. is a co-founder of Design Pharmaceuticals and serves on the scientific advisory board of Hexagon Bio.

Figures

Figure 1.
Machine learning to predict substrate and function of adenylate-forming enzymes from protein sequences. A, the machine learning workflow includes extracting 34 active site residues (green) and FAAL-specific loop (red) residues and encoding them as a vector of physicochemical properties. Separate classifiers are trained to predict substrate specificity and enzyme function. B, hold-out test set accuracy scores for three different classification methods evaluated in this study. C, AUROC for substrate specificity predictions. Colors correspond to different substrates and macro (gray) and micro (black) AUROC averages. Red, aryl/biaryl acids; green, bulky/phenyl aa; blue, C13–C17 fatty acids; purple, C18+ fatty acids; orange, C2–C5 acids; tan, C6–C12 fatty acids; brown, succinylbenzoic acids; hot pink, cyclic aliphatic aa; dark blue, cysteine; fuchsia, luciferin; light pink, β-hydroxy acids; turquoise, polar and charged aa; goldenrod, small hydrophilic aa; deep pink, small hydrophobic aa; lavender, tiny aa. D, confusion matrix of predicted versus truth for ANL substrate specificity on hold-out test set. Predictions for functional class are presented in Fig. S2.
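The encoding step in panel A can be illustrated with a minimal sketch. It assumes each extracted residue is mapped to a fixed set of numeric properties and the results are concatenated into one feature vector. The two properties used here (Kyte-Doolittle hydropathy and an approximate side-chain charge at neutral pH) are stand-ins chosen for illustration; the property set actually used in the paper is not listed on this page, and the function name `encode_residues` is hypothetical.

```python
# Illustrative property tables only (assumption): the paper's own
# physicochemical property set is not given in this abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Approximate side-chain charge at neutral pH; residues not listed are 0.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

PROPERTIES_PER_RESIDUE = 2


def encode_residues(residues: str) -> list[float]:
    """Concatenate per-residue property values into one flat feature vector.

    Alignment gaps and unknown symbols are encoded as zeros so that every
    sequence yields a vector of the same length.
    """
    vector: list[float] = []
    for aa in residues.upper():
        if aa not in KYTE_DOOLITTLE:
            vector.extend([0.0] * PROPERTIES_PER_RESIDUE)
            continue
        vector.extend([KYTE_DOOLITTLE[aa], CHARGE.get(aa, 0.0)])
    return vector


# Hypothetical three-position example: alanine, aspartate, and a gap.
example_vector = encode_residues("AD-")
```

With 34 active site positions and two properties per position, this sketch would produce a 68-element vector per enzyme, which could then be passed to any standard classifier.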
Figure 2.
Maximum-likelihood phylogenetic tree of characterized protein sequences in the ANL superfamily. Tree was computed using the Jones–Taylor–Thornton matrix–based model of amino acid substitution and colored by functional enzyme class. Some enzyme classes such as BLS, NRPS, and LUC form monophyletic clades, whereas other sequences, i.e. the ARYL class, are dispersed throughout the tree, suggesting evolutionary divergence. Red, ARYL; green, BLS; dark blue, very-long-chain acyl-CoA synthetase; orange, LUC; light blue, FAAL; brown, NRPS; purple, LACS; pink, SACS; gold, MACS. Gray node circles represent bootstrap support >75% at branch points. Bar, 0.4 aa substitutions per site.
Figure 3.
Predicted functional distribution of adenylation enzymes encoded in 50,064 candidate biosynthetic gene clusters. A, sequence similarity network of all standalone AMP-binding pHMM hits extracted from candidate biosynthetic gene clusters identified in >24,000 bacterial, fungal, and plant genomes. Diamonds correspond to training set sequences, and circles represent AMP-binding hits extracted from biosynthetic gene clusters. The network was trimmed to a BLAST e-value threshold of 1 × 10⁻³⁶. Circles with a probability >0.6 are colored by their prediction, whereas sequences colored gray had “no confident prediction.” B, bar plot of relative counts for different functional classes of ANL enzymes within biosynthetic gene clusters (AdenylPred prediction probability > 0.6). VLACS, very-long-chain acyl-CoA synthetase.
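The probability cutoff described in this caption can be sketched as follows, assuming the classifier returns one probability per functional class and that the top class is reported only when its probability exceeds 0.6. The function name `call_prediction` and the dictionary input format are hypothetical and are not taken from AdenylPred itself.

```python
NO_CALL = "no confident prediction"


def call_prediction(class_probs: dict[str, float], threshold: float = 0.6) -> str:
    """Return the most probable class, or NO_CALL if it does not exceed the threshold.

    The comparison is strict (> threshold), matching the caption's
    "probability >0.6" wording.
    """
    if not class_probs:
        return NO_CALL
    best_class = max(class_probs, key=class_probs.get)
    if class_probs[best_class] > threshold:
        return best_class
    return NO_CALL


# Hypothetical probabilities for two sequences.
confident = call_prediction({"BLS": 0.82, "NRPS": 0.10, "FAAL": 0.08})
uncertain = call_prediction({"BLS": 0.45, "NRPS": 0.40, "FAAL": 0.15})
```

Under this rule the first sequence would be colored by its predicted class and the second would be drawn in gray.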
Figure 4.
Proposed nocardiolactone biosynthetic gene cluster. A, synteny between published bacterial cis-olefin and lipstatin gene clusters with the proposed nocardiolactone biosynthetic gene cluster. Percentages correspond to amino acid identity. B, representatives of the proposed nocardiolactone biosynthetic cluster in Nocardia. Maximum-likelihood phylogenetic tree is based on NltC amino acid sequence distance estimated using the Jones–Taylor–Thornton model of amino acid substitution. Sequences corresponding to Nocardia isolated from humans are designated by black circles.
Figure 5.
Biochemical characterization of nocardiolactone biosynthetic enzymes. A, time-course analysis of NltC activity with di-alkyl β-hydroxy acids with carbon backbones of length C20 (blue) and C14 (orange) compared with a no-substrate control (gray). NltC prefers longer chain β-hydroxy acids (C20) and shows no discernible activity with C14 β-hydroxy acids above the level of the no-substrate control. B, co-expressed NltA and NltB enzymes condense two myristoyl-CoAs to form 2-myristoyl-3-ketomyristic acid. The resulting ketone (14-heptacosanone) from the breakdown of 2-myristoyl-3-ketomyristic acid was observable by GC-MS. The enzymatic product of NltAB was identical to a 14-heptacosanone control produced by WT X. campestris OleA, but it was not observed with NltA or NltB enzymes purified individually. C, proposed biosynthetic pathway for nocardiolactone. R1, C18H37; R2, C13H27.
