Mol Biol Evol. 2022 Apr 10;39(4):msac057.
doi: 10.1093/molbev/msac057.

Phylogeny-Aware Chemoinformatic Analysis of Chemical Diversity in Lamiaceae Enables Iridoid Pathway Assembly and Discovery of Aucubin Synthase

Carlos E Rodríguez-López et al. Mol Biol Evol.

Abstract

Countless reports describe the isolation and structural characterization of natural products, yet this information remains disconnected and underutilized. Using a cheminformatics approach, we combine the reported observations of iridoid glucosides with the known phylogeny of a large iridoid-producing plant family (Lamiaceae) to generate a set of biosynthetic pathways that best explain the extant iridoid chemical diversity. We developed a pathway reconstruction algorithm that connects iridoid reports via reactions and prunes this solution space by considering phylogenetic relationships between genera. We formulate a model that emulates the evolution of iridoid glucosides to create a synthetic data set, which we use to select the parameters that best reconstruct the pathways; we then apply these parameters to the iridoid data set to generate pathway hypotheses. These computationally generated pathways were then used as the basis for selecting and screening biosynthetic enzyme candidates. Our model was successfully applied to discover a cytochrome P450 enzyme from Callicarpa americana that catalyzes the oxidation of bartsioside to aucubin, a reaction predicted by our model despite neither molecule having been observed in the genus. We also demonstrate aucubin synthase activity in orthologues from Vitex agnus-castus and the outgroup Paulownia tomentosa, further strengthening the hypothesis, enabled by our model, that the reaction was present in the ancestral biosynthetic pathway. This is the first systematic hypothesis on epi-iridoid glucoside biosynthesis in 25 years and sets the stage for streamlined work on the iridoid pathway. This work highlights how curation and computational analysis of widely available structural data can facilitate hypothesis-based gene discovery.

Keywords: chemical diversity; cheminformatics; comparative biochemistry; cytochrome P450; iridoids; pathway reconstruction.

Figures

Fig. 1.
Reported chemical diversity of the Lamiaceae. (a) Maximum likelihood tree of 19 selected genera from Lamiaceae and two Lamiales outgroups, as inferred by the Mint Evolutionary Genomics Consortium (2018), pruned of genera with no reports of iridoid glucosides, along with unique iridoid scaffold reports (after removing decorations) for each genus. Branches, labels, and bars are color-coded by clade, from top to bottom: Lamioideae in Persian green, Ajugoideae in tussock yellow, Scutellarioideae in pomegranate red, Premnoideae in plum violet, Viticoideae in copper rose, Nepetoideae in sapphire blue, Prostantheroideae in vermilion, Callicarpoideae in crail red, and the outgroups (Paulowniaceae and Orobanchaceae) in dark gray. (b) Examples of chemotaxonomically important iridoid glucosides in four selected clades.
Fig. 2.
Mathematical representation of iridoid glucosides. (a) Individual carbons are assigned a value according to their oxidation state. (b) Each carbon has a number assigned, and thus, each iridoid scaffold can be represented in a coordinate system. (c) Example of a codified molecule, a catalpol scaffold, shown in both graphical (top) and vectorial (bottom) forms. Note that carbon C11 has a value of 3: iridoid oxidase is known to oxidize C11 directly to an aldehyde without an alcohol intermediate, so for this carbon the aldehyde has a value of 1, the carboxylic acid has a value of 2, and decarboxylation has a value of 3.
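The encoding described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the carbon subset in `CARBONS`, the baseline value of 0 for an unoxidized carbon, the two example scaffolds, and the use of Manhattan distance as the "chemical distance" are all assumptions made here. Only the C11 scale (aldehyde 1, carboxylic acid 2, decarboxylation 3) comes from the caption.

```python
# Hypothetical subset of scaffold positions; the real numbering is given in the figure.
CARBONS = ("C6", "C7", "C8", "C10", "C11")

# C11 scale from the caption; the baseline of 0 is an assumption.
C11_STATES = {"unoxidized": 0, "aldehyde": 1, "carboxylic_acid": 2, "decarboxylated": 3}


def encode(oxidation_by_carbon):
    """Return a scaffold as a fixed-order vector of per-carbon oxidation values."""
    return tuple(oxidation_by_carbon.get(carbon, 0) for carbon in CARBONS)


def chemical_distance(a, b):
    """Manhattan distance between two scaffold vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Two made-up scaffolds, for illustration only (not real iridoids).
scaffold_a = encode({"C11": C11_STATES["carboxylic_acid"]})
scaffold_b = encode({"C10": 1, "C11": C11_STATES["decarboxylated"]})

print(scaffold_a, scaffold_b, chemical_distance(scaffold_a, scaffold_b))
```

Representing each scaffold as a fixed-length vector makes it straightforward to compare molecules coordinate by coordinate, which is what a distance-based clustering or pathway search needs.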
Fig. 3.
Overview of the strategy. (a) Through evolution, metabolism in an ancestral organism expands and differentiates into specialized pathways in different genera. In the case of iridoids, the sequence of biochemical steps in these pathways is unknown (dotted box). (b) Through natural product isolation and characterization, an incomplete subset of these molecules has been reported (colored circles) in different genera. (c) Our objective, through Algorithms S1 and S2, Supplementary Material online (explained in Section 1), is to accurately estimate the pathways represented in panel (a) using reported (colored circles) and predicted (white circles) molecules. Algorithms S1 and S2 require five parameters. We cannot optimize these parameters on the real data because we cannot evaluate performance there: even in the few characterized pathways, typically in just one species, it is virtually impossible to determine whether an enzyme or metabolite is truly absent or just has not been reported. Thus, we simulate these processes in silico. (d) We take an initial pathway in the iridoid chemical space and apply the evolutionary processes that we hypothesize explain the emergence of iridoid chemical diversity (outlined in Section 2). Algorithm S3, Supplementary Material online then outputs a computed pathway for each genus (solid box). (e) We take a random sample of the molecules in these calculated pathways. (f) We then apply Algorithms S1 and S2, Supplementary Material online with a wide range of parameters to reconstruct the pathways. We can compare these results to the original pathways in (d) and choose the parameters that offer the best reconstruction (as we do in Section 4). In Section 5, we apply these parameters to the original data set in (b) to get the pathway hypotheses that best estimate the original, unknown pathways in (a). Finally, in Section 6, we use these hypotheses to facilitate enzyme discovery.
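The simulate, sample, reconstruct, and select loop in panels (d) to (f) can be illustrated with a toy Python sketch. Everything here is a hypothetical stand-in: `simulate_pathway` is a simple random walk rather than Algorithm S3, `reconstruct` links observed molecules by a single assumed parameter `max_step` rather than the five parameters of Algorithms S1 and S2, and the Jaccard score is an assumed performance measure. The sketch only shows why a synthetic ground truth makes parameter selection possible.

```python
import itertools
import random


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def simulate_pathway(start, n_steps, rng):
    """Toy ground truth: each step raises one coordinate of the vector by 1."""
    path = [tuple(start)]
    for _ in range(n_steps):
        nxt = list(path[-1])
        nxt[rng.randrange(len(nxt))] += 1
        path.append(tuple(nxt))
    return path


def observed_truth(path, observed):
    """True links: observed molecules that follow each other along the pathway."""
    ordered = sorted(observed, key=path.index)
    return {frozenset(pair) for pair in zip(ordered, ordered[1:])}


def reconstruct(observed, max_step):
    """Toy reconstruction: link observed molecules within max_step of each other."""
    return {
        frozenset((a, b))
        for a, b in itertools.combinations(observed, 2)
        if manhattan(a, b) <= max_step
    }


def jaccard(found, truth):
    union = found | truth
    return len(found & truth) / len(union) if union else 1.0


def select_parameter(grid, n_trials=50, seed=0):
    """Score each candidate parameter on simulated data and return the best one."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in grid}
    for _ in range(n_trials):
        path = simulate_pathway((0, 0, 0, 0), 8, rng)  # panel (d)
        observed = rng.sample(path, 5)                 # panel (e): incomplete reports
        truth = observed_truth(path, observed)
        for p in grid:                                 # panel (f): try each parameter
            totals[p] += jaccard(reconstruct(observed, p), truth)
    mean_scores = {p: total / n_trials for p, total in totals.items()}
    return max(mean_scores, key=mean_scores.get), mean_scores


best, scores = select_parameter(grid=(1, 2, 3, 4))
print(best, scores)
```

In this toy setting a small `max_step` misses links that span unobserved intermediates, while a large one adds spurious links, so the score depends on the parameter in a way that can only be measured because the simulated pathway is fully known.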
Fig. 4.
Iridoid scaffold chemical diversity. (a) Heatmap showing the clustering by the chemical distance of iridoids (left) and their reports in the selected genera (right). In solid colors, reported metabolites for the corresponding genus are shown; in semitransparent colors, predicted metabolites, reported in Lamiaceae, are shown; in pink, theoretical metabolites, not reported in Lamiaceae, are shown. The color code corresponds to the clade each genus belongs to, as specified in Fig. 1. (b) Numbering of the carbons in the scaffold, corresponding to the columns of the heatmap. (c) Example of how catalpol appears as a row in the heatmap; the arrow points at its position. Supplementary figs. S6–S24, Supplementary Material online show the chemical representations and predicted biosynthetic pathways.
Fig. 5.
Iridoid pathway hypothesis for Phlomis spp. (Lamioideae). The metabolites and reactions expected to be present in the ancestral pathway are shown in black; metabolites reported in this genus, but not expected to be ancestral, are shown in Persian green; and completely theoretical metabolites are shown in pink. Metabolites predicted by our model, but not reported in the genus, are shown in brackets.
Fig. 6.
Iridoid pathway hypothesis for Lamium spp. (Lamioideae). The metabolites and reactions expected to be present in the ancestral pathway are shown in black; metabolites reported in this genus, but not expected to be ancestral, are shown in Persian green; and completely theoretical metabolites are shown in pink. Metabolites predicted by our model, but not reported in the genus, are shown in brackets.
Fig. 7.
Iridoid pathway hypothesis for Leonurus spp. (Lamioideae). The metabolites and reactions expected to be present in the ancestral pathway are shown in black; metabolites reported in this genus, but not expected to be ancestral, are shown in Persian green; and completely theoretical metabolites are shown in pink. Metabolites predicted by our model, but not reported in the genus, are shown in brackets.
Fig. 8.
Iridoid pathway hypothesis for Ajuga spp. (Ajugoideae). The metabolites and reactions expected to be present in the ancestral pathway are shown in black; metabolites reported in this genus, but not expected to be ancestral, are shown in tussock yellow; and completely theoretical metabolites are shown in pink. Metabolites predicted by our model, but not reported in the genus, are shown in brackets.
Fig. 9.
Pathway hypotheses. (a) Ancestral pathway predicted in the current work. (b) Damtoft–Jensen hypothesis (route II) is largely contained within this prediction. Mussaenosidic acid (in square brackets) is not predicted by our algorithm, but neither is it definitively indicated by the labeling studies (Damtoft et al. 1993; Damtoft 1994). (1) 8-epi-7-deoxyloganic acid, (2) mussaenosidic acid, (3) 10-deoxygeniposidic acid, (4) geniposidic acid, (5) bartsioside, (6) aucubin, and (7) catalpol.
Fig. 10.
Iridoid pathway hypothesis for Callicarpa spp. (Callicarpoideae). The metabolites and reactions expected to be present in the ancestral pathway are shown in black; metabolites reported in this genus, but not expected to be ancestral, are shown in crail red; and completely theoretical metabolites are shown in pink. Metabolites predicted by our model, but not reported in the genus, are shown in brackets. Highlighted in yellow is the enzyme activity we discovered (CaAS), which oxidizes (5) bartsioside to (6) aucubin; neither metabolite has been reported in Callicarpa, but both were predicted to be in the ancestral pathway.
Fig. 11.
XICs of microsome incubations with bartsioside. Two channels are depicted, corresponding to the most abundant adducts of bartsioside (black, [M + FA-H] = 375.1291 ± 0.05) and aucubin (red, [M + FA-H] = 391.1241 ± 0.05). Intensities are scaled to the highest intensity of the corresponding channel across all incubations. Chromatograms are shown for microsomes containing the enzyme candidates of C. americana (CAAM), V. agnus-castus (VIAG), and P. tomentosa (PATO), as well as the empty vector (EV) control. Because the enzyme candidates oxidize bartsioside to aucubin, they were renamed aucubin synthase for each species (CaAS, VaAS, and PtAS).
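The two channel definitions and the scaling rule in this caption can be checked with a short Python sketch. The channel centers and the ± 0.05 window come from the caption; treating the window as m/z units, the helper names, and the example intensity traces are assumptions made for illustration and are not measured data.

```python
BARTSIOSIDE_MZ = 375.1291  # [M + FA-H] channel center from the caption
AUCUBIN_MZ = 391.1241      # [M + FA-H] channel center from the caption
TOLERANCE = 0.05           # assumed to be in m/z units
OXYGEN_MASS = 15.994915    # monoisotopic mass of one oxygen atom


def in_channel(mz, center, tolerance=TOLERANCE):
    """True if a measured m/z falls inside the extraction window of a channel."""
    return abs(mz - center) <= tolerance


def scale_channel(traces):
    """Scale every trace by the highest intensity of this channel in all samples."""
    peak = max(max(trace) for trace in traces.values())
    return {sample: [value / peak for value in trace] for sample, trace in traces.items()}


# The shift between the two channels matches the gain of a single oxygen atom.
mass_shift = AUCUBIN_MZ - BARTSIOSIDE_MZ

# Made-up aucubin-channel intensities for two incubations (illustration only).
example = {"EV": [0.0, 1.0, 2.0], "CAAM": [0.0, 10.0, 5.0]}
scaled = scale_channel(example)

print(round(mass_shift, 4), scaled)
```

Scaling by a single per-channel maximum keeps the incubations comparable with one another, so a flat empty-vector trace stays visibly flat next to an active enzyme.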

References

    1. Mint Evolutionary Genomics Consortium. 2018. Phylogenomic mining of the mints reveals multiple mechanisms contributing to the evolution of chemical diversity in Lamiaceae. Mol Plant. 11:1084–1096.
    2. UniProt Consortium T. 2018. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46:2699–2699.
    3. R Core Team. 2020. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.
    4. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410.
    5. Andrews S. 2016. FastQC. Version 0.11.5: Babraham Bioinformatics. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/