Bioinformatics. 2020 Dec 30;36(Suppl_2):i770-i778.
doi: 10.1093/bioinformatics/btaa809.

Feasible-metabolic-pathway-exploration technique using chemical latent space

Taiki Fuji et al. Bioinformatics. 2020.

Abstract

Motivation: Exploring metabolic pathways is a key technique for developing highly productive microbes for the bioproduction of chemical compounds. Exploring feasible pathways requires not only examining combinations of well-known enzymatic reactions but also finding potential enzymatic reactions that can catalyze the desired structural changes. Most conventional techniques rely on manually predefined reaction rules; however, these rules cannot comprehensively express the structural changes that occur in enzymatic reactions, so they cannot find enough potential reactions. Evaluating the feasibility of the explored pathways is another challenge, because the rules offer no way to validate whether an unknown enzymatic reaction is possible. Therefore, exploring feasible metabolic pathways still requires a technique that comprehensively captures the structural changes in enzymatic reactions and a technique that evaluates pathway feasibility.

Results: We developed a feasible-pathway-exploration technique that uses a chemical latent space obtained from a deep generative model for compound structures. With this technique, an enzymatic reaction is regarded as the difference vector between the main substrate and the main product in the chemical latent space acquired from the generative model. The features of each enzymatic reaction are thus embedded in a fixed-dimensional vector, which makes it possible to express the structural changes of enzymatic reactions comprehensively. The technique also involves differential-evolution-based reaction selection to design feasible candidate pathways, and pathway scoring using neural-network-based reaction-possibility prediction. When applied to non-registered pathways relevant to the production of 2-butanone, the proposed technique successfully explored feasible pathways that include non-registered reactions.
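The difference-vector idea in the Results can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up toy values, not JT-VAE outputs, and the function names `reaction_feature` and `apply_reaction` are hypothetical. The sign convention (product minus substrate) is an assumption chosen so that adding the reaction-feature vector to the substrate moves it toward the product.

```python
from typing import List

Vector = List[float]


def reaction_feature(substrate_z: Vector, product_z: Vector) -> Vector:
    """Reaction-feature vector as a difference of latent vectors.

    Assumption: product minus substrate, so that substrate + r ~ product.
    """
    if len(substrate_z) != len(product_z):
        raise ValueError("latent vectors must have the same dimension")
    return [p - s for s, p in zip(substrate_z, product_z)]


def apply_reaction(compound_z: Vector, r: Vector) -> Vector:
    """Shift a compound's latent vector by a reaction-feature vector."""
    return [c + d for c, d in zip(compound_z, r)]


# Toy latent vectors (hypothetical values; a real encoder would supply these).
z_substrate = [0.5, -1.0, 2.0]
z_product = [1.5, 0.0, 1.0]

r = reaction_feature(z_substrate, z_product)
z_predicted = apply_reaction(z_substrate, r)
```

In this toy setting the predicted point coincides with the product by construction. With a real generative model, the shifted latent vector would still have to be decoded back into a compound structure.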

Figures

Fig. 1.
Feasible-metabolic-pathway exploration. There is often more than one pathway for producing target compound from start compound in metabolic system. In addition to such known pathways, unknown enzymatic reactions and compounds that are not registered in database (DB) may be included. Namely, there may be several feasible pathways from start compound to target compound that include both registered (gray solid lines) and non-registered (orange dotted lines) reactions
Fig. 2.
Overview of proposed technique. It involves reaction-feature-computation and pathway-exploration steps. In reaction-feature computation, variational autoencoder (VAE) models are trained with public compound DB. By using latent vectors of compounds, reaction-feature vectors are then calculated. Pathway exploration consists of pathway design and pathway scoring. Namely, several candidate pathways are designed and scored
Fig. 3.
Architecture of junction tree VAE (JT-VAE) (Jin et al., 2018). JT-VAE has two encoders: graph encoder and tree encoder. Input of tree encoder is junction tree decomposed using feature-tree technique (Rarey and Dixon, 1998). Each colored node in feature tree represents substructure of compound
Fig. 4.
Explanation of reaction-feature vector. First, latent vectors of compounds registered in metabolic-pathway DBs are acquired from JT-VAE encoders. Then, by using latent vectors of main substrate and product on basis of metabolic-pathway DBs, reaction-feature vector, which is defined as difference vector of these latent vectors, is obtained. Reaction-feature vector of EC2.7.1.1 subtracts hydroxy group and adds phosphate group to α-D-glucose
Fig. 5.
Procedure of pathway design of candidate pathways. Reaction-feature-vector sets are selected using optimization method to minimize squared error between pathway-feature vector and sum of selected reaction-feature vectors. This figure illustrates example in which three reaction-feature vectors (ra, rb and rc) are selected. There are a total of six possible orders. Intermediate compounds are reconstructed using JT-VAE decoder. Finally, unrealistic pathways are removed based on molecular weight changes, and remaining candidate pathways are added to candidate-pathway list
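The pathway-design step in this caption can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: an exhaustive search over combinations stands in for the differential-evolution optimizer named in the abstract, the pathway-feature vector is assumed to be the target latent vector minus the start latent vector, the two-dimensional vectors are made up, and decoding of intermediates and the molecular-weight filter are omitted. All function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def squared_error(target: Vector, vectors: Sequence[Vector]) -> float:
    """Squared error between target and the component-wise sum of vectors."""
    total = [sum(components) for components in zip(*vectors)]
    return sum((t - s) ** 2 for t, s in zip(target, total))


def select_reactions(pathway_vec: Vector,
                     reactions: Dict[str, Vector],
                     k: int) -> Tuple[str, ...]:
    """Pick the k reactions whose summed vectors best match pathway_vec.

    Exhaustive search is used here only because the toy set is tiny.
    """
    return min(
        combinations(sorted(reactions), k),
        key=lambda names: squared_error(pathway_vec, [reactions[n] for n in names]),
    )


def candidate_orders(start_z: Vector,
                     names: Sequence[str],
                     reactions: Dict[str, Vector]) -> List[Tuple[Tuple[str, ...], List[Vector]]]:
    """List every ordering of the selected reactions with the latent points it visits."""
    results = []
    for order in permutations(names):
        point, points = list(start_z), []
        for name in order:
            point = [p + d for p, d in zip(point, reactions[name])]
            points.append(point)
        results.append((order, points))
    return results


# Toy reaction-feature vectors (hypothetical values).
reactions = {"ra": [1.0, 0.0], "rb": [0.0, 1.0], "rc": [1.0, 1.0], "rd": [-1.0, 2.0]}
start_z, target_z = [0.0, 0.0], [2.0, 2.0]
pathway_vec = [t - s for s, t in zip(start_z, target_z)]

best = select_reactions(pathway_vec, reactions, k=3)
orders = candidate_orders(start_z, best, reactions)
```

Three selected reactions give 3! = 6 orderings, matching the six orders mentioned in the caption. Every ordering ends at the same latent point but passes through different intermediates, which is why each one would be decoded and filtered separately.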
Fig. 6.
Pathway-scoring method of candidate pathways. (a) Ensemble of neural networks (NNs) is used for predicting reaction-possibility value. Multiple NN model weights are obtained from training using each dataset. Each NN outputs 0 or 1. Reaction-possibility value vr from 0.0 to 1.0 is finally obtained using voting scheme. (b) Example of pathway-feasibility value vp obtained by multiplying three reaction-possibility values of reaction-feature vectors (vr1, vr2 and vr3)
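The scoring scheme in this caption can be sketched in a few lines. The caption states that each network outputs 0 or 1 and that the pathway-feasibility value is a product of reaction-possibility values; treating the voting scheme as the fraction of networks that output 1 is an assumption made here for illustration. The vote lists are made-up values and the function names are hypothetical.

```python
from math import prod
from typing import List, Sequence


def reaction_possibility(votes: Sequence[int]) -> float:
    """Fraction of ensemble members voting 1 (assumed voting scheme)."""
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v not in (0, 1) for v in votes):
        raise ValueError("each vote must be 0 or 1")
    return sum(votes) / len(votes)


def pathway_feasibility(vrs: Sequence[float]) -> float:
    """Pathway-feasibility value as the product of reaction-possibility values."""
    return prod(vrs)


# Toy binary outputs of a four-member ensemble for three reactions (hypothetical).
votes_per_reaction: List[List[int]] = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
]

vrs = [reaction_possibility(v) for v in votes_per_reaction]
vp = pathway_feasibility(vrs)
```

Because the values are multiplied, a single reaction with a low possibility value pulls down the score of the whole pathway, and a reaction with value 0.0 makes the pathway score 0.0.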
Fig. 7.
Confusion matrices for classification accuracies of each EC number class (digit: 2, classifier: LDA). (a) Tree & graph means that combination of tree- and graph-latent vectors of JT-VAE was used, (b) tree-latent vector was used and (c) graph-latent vector was used
Fig. 8.
Enzymatic reactions EC1.2.1.3 in the latent space
Fig. 9.
Transition in number of candidate pathways when number of repetitions was set to 2000 and subset size was changed from 100 to 1000 in steps of 100. The transitions are for (A) pathway from C00631 to C00022 including two registered reactions, (B) pathway from C02233 to C02845 including one non-registered reaction, (C) pathway from C03044 to C02845 including one non-registered reaction and (D) pathway from C00810 to C02845 including one registered reaction and one non-registered reaction
Fig. 10.
Results of candidate-pathway scoring. Pathway from α-D-glucose 6-phosphate (KEGG Compound ID: C00668) to glyceraldehyde 3-phosphate (KEGG Compound ID: C00118) was used. Selected enzymatic reactions were EC5.3.1.9, EC2.7.1.1 and EC4.1.2.13. There were six combinations. Line for each reaction-feature vector is drawn in different color, and thickness of line corresponds to its reaction-possibility value
Fig. 11.
Results of exploring feasible pathways. Pathway from pyruvate (KEGG Compound ID: C00022) to 2-butanone (KEGG Compound ID: C02845) and pathway from acetyl-CoA (KEGG Compound ID: C00024) to 2-butanone are reported in Srirangan et al. (2016) and Chen et al. (2015), respectively, but neither reaction from precursor to 2-butanone is registered in KEGG. Non-registered reactions are represented as red dotted lines, and vr values are reaction-possibility values. Both reactions were explored using proposed technique

References

    1. Araki M. et al. (2014) M-path: a compass for navigating potential metabolic pathways. Bioinformatics, 31, 905–911.
    2. Battiti R., Colla A.M. (1994) Democracy in neural nets: voting schemes for classification. Neural Networks, 7, 691–707.
    3. Caspi R. et al. (2018) The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res., 46, D633–D639.
    4. Chen Z. et al. (2015) Metabolic engineering of Klebsiella pneumoniae for the production of 2-butanone from glucose. PLoS One, 10, e0140508.
    5. Choi K.R. et al. (2019) Systems metabolic engineering strategies: integrating systems and synthetic biology with metabolic engineering. Trends Biotechnol., 37, 817–837.