Bioinformatics. 2015 Jun 15;31(12):i161-70.
doi: 10.1093/bioinformatics/btv224.

Metabolome-scale de novo pathway reconstruction using regioisomer-sensitive graph alignments

Yoshihiro Yamanishi et al. Bioinformatics. 2015.

Abstract

Motivation: Recent advances in mass spectrometry and related metabolomics technologies have enabled the rapid and comprehensive analysis of numerous metabolites. However, biosynthetic and biodegradation pathways are only known for a small portion of metabolites, with most metabolic pathways remaining uncharacterized.

Results: In this study, we developed a novel method for supervised de novo metabolic pathway reconstruction with an improved graph alignment-based approach in the reaction-filling framework. We proposed a novel chemical graph alignment algorithm, which we called PACHA (Pairwise Chemical Aligner), to detect the regioisomer-sensitive connectivities between the aligned substructures of two compounds. Unlike other existing graph alignment methods, PACHA can efficiently detect only one common subgraph between two compounds. Our results show that the proposed method outperforms both previous descriptor-based methods and existing graph alignment-based methods in the enzymatic reaction-likeness prediction for isomer-enriched reactions. It is also useful for reaction annotation, which assigns potential reaction characteristics such as EC (Enzyme Commission) numbers and PIERO (Enzymatic Reaction Ontology for Partial Information) terms to substrate-product pairs. Finally, we conducted a comprehensive enzymatic reaction-likeness prediction for all possible uncharacterized compound pairs, suggesting potential metabolic pathways for newly predicted substrate-product pairs.

Figures

Fig. 1.
(a) Graph alignment-based vector proposed in this study. Graph alignment yields atom–atom mapping (represented by dashed lines). Subsequently, the number of atom–atom pairs in the alignment (e.g. the column labeled ‘a:C1a = C1a’ in the white boxes on the left), the number of eliminated bonds (e.g. the column labeled ‘e:C1a-C1b’ in the gray boxes in the middle) and the number of generated bonds (e.g. the column labeled ‘g:C1a-N1b’ in the gray boxes on the right) were represented as a vector. The symbols ‘=’ and ‘−’ represent the atom–atom mapping and the chemical bond, respectively. (b) Descriptor-based vectors in previous studies (e.g. KCF-S). Each compound vector represents chemical characteristics (e.g. number of substructures). The feature vector for the compound pair consists of three parts: common features between the two compounds (in the white boxes on the left), the excess number of features in the left compound (in the gray boxes in the middle) and the excess number in the right compound (in the gray boxes on the right).
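The two feature vectors described in the Fig. 1 caption can be illustrated with a minimal Python sketch. This is not the PACHA implementation: the function names, the input layout and the toy compounds are hypothetical. It also assumes that a bond counts as "eliminated" when it exists in the left compound but does not map onto a bond in the right compound, that "generated" is the mirror case, and that "common" and "excess" descriptor counts are the minimum and the positive difference of the two compound counts.

```python
from collections import Counter


def bond_key(labels, u, v):
    """Order-independent bond name such as 'C1a-N1b'."""
    return "-".join(sorted((labels[u], labels[v])))


def alignment_vector(labels_a, bonds_a, labels_b, bonds_b, mapping):
    """Count aligned atom pairs (a:), eliminated bonds (e:), generated bonds (g:).

    labels_*: dict atom_id -> atom label; bonds_*: list of (atom_id, atom_id);
    mapping: dict atom_id in the left compound -> atom_id in the right compound.
    """
    inverse = {b: a for a, b in mapping.items()}
    set_a = {frozenset(bond) for bond in bonds_a}
    set_b = {frozenset(bond) for bond in bonds_b}
    vec = Counter()
    for a, b in mapping.items():
        vec[f"a:{labels_a[a]}={labels_b[b]}"] += 1
    # Assumption: a left-compound bond with no counterpart on the right is eliminated.
    for u, v in bonds_a:
        mapped = u in mapping and v in mapping
        if not mapped or frozenset((mapping[u], mapping[v])) not in set_b:
            vec["e:" + bond_key(labels_a, u, v)] += 1
    # Assumption: a right-compound bond with no counterpart on the left is generated.
    for u, v in bonds_b:
        mapped = u in inverse and v in inverse
        if not mapped or frozenset((inverse[u], inverse[v])) not in set_a:
            vec["g:" + bond_key(labels_b, u, v)] += 1
    return vec


def descriptor_pair_vector(left, right):
    """Common features, excess in the left compound, excess in the right compound."""
    vec = Counter()
    for key in set(left) | set(right):
        n_left, n_right = left.get(key, 0), right.get(key, 0)
        vec["common:" + key] = min(n_left, n_right)
        vec["left:" + key] = max(n_left - n_right, 0)
        vec["right:" + key] = max(n_right - n_left, 0)
    return +vec  # unary plus drops zero counts


# Toy compounds (hypothetical): a C-C bond is lost and a C-N bond is formed.
labels_a, bonds_a = {0: "C1a", 1: "C1b"}, [(0, 1)]
labels_b, bonds_b = {0: "C1a", 1: "C1b", 2: "N1b"}, [(0, 2)]
vec = alignment_vector(labels_a, bonds_a, labels_b, bonds_b, {0: 0, 1: 1})
pair = descriptor_pair_vector({"C1a": 2, "O1a": 1}, {"C1a": 1, "N1b": 1})
```

In the toy case the alignment vector records which bond disappears and which one appears, whereas the descriptor vector only records how substructure counts differ between the two compounds.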
Fig. 2.
Updating the label of a node (surrounded by a double circle) by aggregating it with the labels of neighboring nodes in the WL procedure. The WL procedure is applied to each label class: primary, secondary and tertiary labels.
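The label update in the Fig. 2 caption can be sketched in Python as follows, assuming that "WL" refers to Weisfeiler-Lehman-style relabeling. The function name, the graph encoding and the compressed label names (`L0`, `L1`, ...) are hypothetical. The sketch shows a single iteration on one label class; under the caption's description, the same step would be run separately for the primary, secondary and tertiary labels.

```python
def wl_iteration(labels, adjacency):
    """One relabeling round: combine each node label with its neighbors' labels.

    labels: dict node -> current label; adjacency: dict node -> list of neighbors.
    """
    # Signature = own label plus the sorted multiset of neighboring labels.
    signatures = {
        node: (labels[node], tuple(sorted(labels[n] for n in adjacency[node])))
        for node in labels
    }
    # Compress each distinct signature into a short new label, deterministically.
    table = {sig: f"L{i}" for i, sig in enumerate(sorted(set(signatures.values())))}
    return {node: table[sig] for node, sig in signatures.items()}


# Heavy-atom chains (hypothetical toy graphs): C-C-O and C-C-C.
chain_cco = wl_iteration({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
chain_ccc = wl_iteration({0: "C", 1: "C", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
```

After one round, the two carbons in C-C-O receive different labels because their neighborhoods differ, while the two terminal carbons in C-C-C still share a label.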
Fig. 3.
Examples of predicted chemical transformations grouped by isomeric compounds, with compositional formulas (a) C10H16O - C10H14O, (b) C15H10O5 - C21H20O10 and (c) C20H12 - C20H12O. Vertically aligned compounds, e.g. C11938, C11415 and C11491 in (a), are regioisomers. Pairs a1, b1 and c1 are known substrate–product pairs for which the predictions were correct for KCF-S and PACHA. Pairs a2, b2 and c2 are negative examples and were predicted negative by KCF-S and positive by PACHA. Pairs a3, b3 and c3 are also negative examples and were predicted negative by PACHA and positive by KCF-S.
Fig. 4.
AUC and AUPR scores for EC sub-subclasses using the previous direct approach and our proposed filtering approach.
Fig. 5.
AUC and AUPR scores for PIERO terms using the previous direct approach and our proposed filtering approach.
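For readers unfamiliar with the two scores reported in Figs. 4 and 5, the sketch below shows one common way to compute them from prediction scores and binary labels. It is not taken from the paper: the function names and the toy data are hypothetical, AUC is computed as the fraction of positive-negative pairs ranked correctly, and AUPR is approximated by average precision, which is only one of several estimators in use.

```python
def auc_roc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example.

    Assumes at least one positive; tied scores keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy predictions (hypothetical): two positives, two negatives.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
auc = auc_roc(scores, labels)              # 3 of 4 pairs ordered correctly: 0.75
aupr = average_precision(scores, labels)   # (1/1 + 2/3) / 2, about 0.833
```

AUPR is usually the more informative of the two when positives are rare, since it ignores the many correctly ranked negatives that inflate AUC.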
Fig. 6.
One of the newly predicted pathways supported by both PACHA and PACHA + KCF-S, as well as by the recent KEGG release.
