Bioinformatics. 2011 Jun 1;27(11):1537-45.
doi: 10.1093/bioinformatics/btr177. Epub 2011 Apr 8.

Prediction of metabolic reactions based on atomic and molecular properties of small-molecule compounds

Fangping Mu et al. Bioinformatics. 2011.

Abstract

Motivation: Our knowledge of the metabolites in cells and their reactions is far from complete as revealed by metabolomic measurements that detect many more small molecules than are documented in metabolic databases. Here, we develop an approach for predicting the reactivity of small-molecule metabolites in enzyme-catalyzed reactions that combines expert knowledge, computational chemistry and machine learning.

Results: We classified 4843 reactions documented in the KEGG database, from all six Enzyme Commission classes (EC 1-6), into 80 reaction classes, each of which is marked by a characteristic functional group transformation. Reaction centers and surrounding local structures in substrates and products of these reactions were represented using SMARTS. We found that each of the SMARTS-defined chemical substructures is widely distributed among metabolites, but only a fraction of the functional groups in these substructures are reactive. Using atomic properties of atoms in a putative reaction center and molecular properties as features, we trained support vector machine (SVM) classifiers to discriminate between functional groups that are reactive and non-reactive. Classifier accuracy was assessed by cross-validation analysis. A typical sensitivity [TP/(TP+FN)] or specificity [TN/(TN+FP)] is ≈0.8. Our results suggest that metabolic reactivity of small-molecule compounds can be predicted with reasonable accuracy based on the presence of a potentially reactive functional group and the chemical features of its local environment.
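The two accuracy measures quoted above follow directly from confusion-matrix counts. The snippet below is a minimal sketch of those formulas; the counts are hypothetical and chosen only so that both measures come out near the typical value of 0.8, and the function names are illustrative rather than taken from the authors' software.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly reactive groups predicted reactive: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly non-reactive groups predicted non-reactive: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts for a single classifier (assumed, not from the paper).
tp, fn, tn, fp = 40, 10, 160, 40

print(sensitivity(tp, fn))
print(specificity(tn, fp))
```

Reporting the two measures separately matters here because negative examples usually far outnumber positive ones, so a single overall accuracy figure would be dominated by the non-reactive groups.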

Availability: The classifiers presented here can be used to predict reactions via a web site (http://cellsignaling.lanl.gov/Reactivity/). The web site is freely available.

Figures

Fig. 1.
Examples of reaction classes and reaction center patterns (in bold). (A) Dehydrogenation of a secondary alcohol, which is an EC 1 reaction (listed as Reaction class 2 in Supplementary Material S1). Reactions in this class have the form A → B; the majority of reaction classes have this form. The reaction center in the substrate (i.e. the reactant on the left-hand side of the reaction), a secondary alcohol, is a hydroxyl group, and the reaction center in the product (i.e. the reactant on the right-hand side of the reaction) is a ketone group (as shown) or acetal group. Reaction centers in substrates are matched by the following SMARTS pattern: [CX4H;!$(C([OX2H])[O,S,#15])][OX2H]. Reaction centers in products are matched by either [CX3](=[OX1])([#6,#7])[#6] or [CX4;R](C)(C)([Oh])OC. We define a total of 80 reaction classes and 170 reaction center patterns, 82 for ‘substrate’ reaction centers (e.g. CH-OH) and 88 for ‘product’ reaction centers (e.g. C=O). (B) Linear carboxylic ester hydrolysis, which is an EC 3 reaction (listed as Reaction class 48 in Supplementary Material S1). Reactions in this class have the form A → B + C. A total of nine reaction classes have this form (Reaction classes 15, 33, 48, 52, 58, 64, 65, 68 and 69). Two reaction classes have the form A + B → C (Reaction classes 77 and 78).
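The reaction forms listed in this caption can be captured in a small lookup. This is a sketch only: the function name is hypothetical, and treating every class not explicitly listed as A → B is an assumption that the caption implies ("the majority of reaction classes have this form") but does not state outright.

```python
# Reaction classes the caption lists for each non-default form.
ONE_TO_TWO = {15, 33, 48, 52, 58, 64, 65, 68, 69}  # A -> B + C
TWO_TO_ONE = {77, 78}                               # A + B -> C

N_REACTION_CLASSES = 80
N_SUBSTRATE_PATTERNS = 82
N_PRODUCT_PATTERNS = 88


def reaction_form(reaction_class: int) -> str:
    """Return the overall form of a reaction class (1..80)."""
    if not 1 <= reaction_class <= N_REACTION_CLASSES:
        raise ValueError("reaction class must be between 1 and 80")
    if reaction_class in ONE_TO_TWO:
        return "A -> B + C"
    if reaction_class in TWO_TO_ONE:
        return "A + B -> C"
    # Assumption: classes not listed in the caption have the majority form.
    return "A -> B"


print(reaction_form(2))   # secondary alcohol dehydrogenation (panel A)
print(reaction_form(48))  # linear carboxylic ester hydrolysis (panel B)
print(N_SUBSTRATE_PATTERNS + N_PRODUCT_PATTERNS)  # total pattern count
```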
Fig. 2.
Illustration of the classifier corresponding to the reaction center pattern that identifies potentially reactive hydroxyl groups in substrates of reactions of Reaction class 2 (Fig. 1A). Features (atomic and molecular properties) are calculated for compounds containing a substructure matched by the reaction center pattern, and the matches are labeled as either negative examples (filled circles) or positive examples (open circles) as explained in Section 2. The classifier is simply a surface in feature space (labeled ‘separating plane’ in this figure) that divides negative and positive examples. This surface is found using standard SVM methods (Vapnik, 1998; Chang and Lin, 2001) as explained in Section 2. There is a classifier for each of the 170 reaction center patterns.
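The idea of a separating surface can be sketched with a linear decision function, where a match is called reactive if it falls on the positive side of the plane. This is a simplification under stated assumptions: the weights, bias and two-dimensional feature vectors are hypothetical, and the trained classifiers may use a non-linear kernel rather than the plain hyperplane shown here.

```python
from typing import Sequence


def decision_value(weights: Sequence[float],
                   features: Sequence[float],
                   bias: float) -> float:
    """Signed score w . x + b of a feature vector relative to the plane."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_reactive(weights: Sequence[float],
                features: Sequence[float],
                bias: float) -> bool:
    """A positive score places the match on the 'reactive' side."""
    return decision_value(weights, features, bias) > 0


# Hypothetical plane and two hypothetical matches of one pattern.
w, b = [0.5, -1.0], 0.25
print(decision_value(w, [2.0, 0.5], b), is_reactive(w, [2.0, 0.5], b))
print(decision_value(w, [0.0, 1.0], b), is_reactive(w, [0.0, 1.0], b))
```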
Fig. 3.
Summary of reaction classification. (A) Number of reactions included in each of the 80 reaction classes. Reaction classes 1–21 are typically subclasses of oxidoreductase-catalyzed reactions (EC 1). We defined more reaction classes than in our earlier study of oxidoreductase-catalyzed reactions (Mu et al., 2006). For example, alcohol dehydrogenation reactions considered in our earlier work were divided into dehydrogenation reactions of primary and secondary alcohols. Classes 22–47 are typically subclasses of transferase-catalyzed reactions (EC 2). Classes 48–66 are typically subclasses of hydrolase-catalyzed reactions (EC 3). Classes 67–73 are typically subclasses of lyase-catalyzed reactions (EC 4). These subclasses were defined based on the type of chemical bonds cleaved and the type of new bonds formed. Classes 74–76 are typically subclasses of isomerase-catalyzed reactions (EC 5). Classes 77–80 are typically subclasses of ligase-catalyzed reactions (EC 6). These subclasses were defined based on the type of chemical bond formed. (B) Number of reactions included among the 80 reaction classes for each major EC class. EX is the number of reactions documented in KEGG but not included in our analysis because they involve exotic transformations. NS is the number of reactions not included because structural information is missing in KEGG for substrates and/or products.
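The numbering scheme described in this caption amounts to a range lookup from reaction class to its typical EC class. A minimal sketch follows; the function name is hypothetical, and because the caption says classes are only "typically" subclasses of the given EC class, the result is a default rather than a guarantee for every reaction.

```python
# (range of reaction classes, typical EC class), as given in the caption.
EC_RANGES = [
    (range(1, 22), 1),   # oxidoreductases
    (range(22, 48), 2),  # transferases
    (range(48, 67), 3),  # hydrolases
    (range(67, 74), 4),  # lyases
    (range(74, 77), 5),  # isomerases
    (range(77, 81), 6),  # ligases
]


def typical_ec_class(reaction_class: int) -> int:
    """Return the EC class (1..6) typically associated with a reaction class."""
    for classes, ec in EC_RANGES:
        if reaction_class in classes:
            return ec
    raise ValueError("reaction class must be between 1 and 80")


print(typical_ec_class(2))   # Reaction class 2 from Fig. 1A
print(typical_ec_class(48))  # Reaction class 48 from Fig. 1B
```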
Fig. 4.
Summary of training data for (A) the 82 ‘substrate’ classifiers corresponding to reaction center patterns that match substructures in substrates of reaction rules and (B) the 88 ‘product’ classifiers corresponding to reaction center patterns that match substructures of products in reaction rules. The number of negative examples is usually much larger than the number of positive examples.
Fig. 5.
Sensitivity (Qp) and specificity (Qn) of (A) the 82 ‘substrate’ classifiers and (B) the 88 ‘product’ classifiers. The average sensitivity is 0.74, with a SD of 0.11. The average specificity is 0.87, with a SD of 0.08.
Fig. 6.
Illustration of how classifiers can be used to rank the reactivity of functional groups within a compound. We note that the structure of compound C16651 (KEGG ID) was not used in classifier training, i.e. we downloaded information about this compound in 2008. Using our 82 ‘substrate’ reaction center patterns, we identified 81 potential reaction centers in C16651. These reaction centers correspond to 81 possible reactions that consume C16651, 5 of which are illustrated in the figure. These reactions are instances of Reaction classes 2, 70, 9, 23 and 67 (Supplementary Material S1). For each of the 81 reactions, we calculated a raw SVM score, as described in Section 2. Functional groups that receive a positive score are classified as reactive. Only 7 of the 81 functional groups are classified as reactive. The raw SVM scores are used to rank the 81 possible reactions, from most likely to least likely. We associate the most likely reaction with the highest score (1.65) and the least likely reaction with the lowest score (-5.10). Among the 81 possible reactions, there are two reactions documented in KEGG. These reactions have KEGG IDs R08335 and R08338 and they are ranked 1 and 2. The enrichment factors for these bona fide reactions are 81/1 (for R08335) and 81/2 (for R08338). If desired, the 88 ‘product’ reaction center patterns and corresponding classifiers can be used in a similar manner to evaluate possible reactions that produce, rather than consume, compound C16651.
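The ranking and enrichment calculation described in this caption can be sketched as follows. The candidate labels and the two intermediate scores are hypothetical; only the extreme scores (1.65 and -5.10), the candidate count (81) and the ranks of the two documented reactions come from the caption. The enrichment factor is taken to be the number of candidates divided by the rank, which matches the 81/1 and 81/2 values quoted.

```python
from typing import Dict, List, Tuple


def rank_by_score(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate reactions from most likely (highest score) to least."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def enrichment_factor(n_candidates: int, rank: int) -> float:
    """Number of candidates divided by the 1-based rank of a known reaction."""
    if not 1 <= rank <= n_candidates:
        raise ValueError("rank must be between 1 and n_candidates")
    return n_candidates / rank


# Hypothetical subset of candidates; labels and middle scores are made up.
scores = {"cand_a": -0.30, "cand_b": 1.65, "cand_c": -5.10, "cand_d": 0.90}
ranked = rank_by_score(scores)
reactive = [name for name, score in ranked if score > 0]

print(ranked)
print(reactive)
print(enrichment_factor(81, 1), enrichment_factor(81, 2))
```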
Fig. 7.
Average feature importance rank across 170 classifiers. Feature indices, which are defined in Supplementary Material S5, are given along the y-axis. Average rank, which is between 1 (best possible) and 135 (worst possible), is given along the x-axis. The most important feature on average is the feature with index 24. Only the 30 highest ranked features are included in this figure. Black bars correspond to atomic properties; gray bars correspond to molecular properties. All bars are labeled to identify empirical (e) and theoretical/semiempirical (t) properties. Black bars are labeled to identify the class of atomic property (see Section 2): electrostatic (el), inductive (in), topological (to), steric (st) or distance (ge). No energetic properties appear in the top 30.
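Averaging a feature's importance rank over classifiers can be sketched as below. The per-classifier ranks are invented for illustration, as are the feature indices other than 24, and how each classifier's ranking was obtained is not covered here; the sketch only shows the averaging and sorting step.

```python
from statistics import mean
from typing import Dict, List, Tuple


def average_ranks(
    ranks_per_classifier: List[Dict[int, int]]
) -> List[Tuple[int, float]]:
    """Average each feature's rank (1 = most important) over all classifiers.

    Returns (feature_index, average_rank) pairs, best average rank first.
    """
    features = ranks_per_classifier[0].keys()
    averaged = {
        f: mean(ranks[f] for ranks in ranks_per_classifier) for f in features
    }
    return sorted(averaged.items(), key=lambda item: item[1])


# Three hypothetical classifiers ranking three hypothetical features.
example = [
    {24: 1, 7: 2, 101: 3},
    {24: 2, 7: 1, 101: 3},
    {24: 1, 7: 3, 101: 2},
]

for feature, avg_rank in average_ranks(example):
    print(feature, round(avg_rank, 2))
```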

References

    1. Anari M.R., Baillie T.A. Bridging chemoinformatic metabolite prediction and tandem mass spectrometry. Drug Discov. Today. 2005;10:711–717.
    2. Baran R., et al. Mass spectrometry based metabolomics and enzymatic assays for functional genomics. Curr. Opin. Microbiol. 2009;12:547–552.
    3. Bhalla R., et al. Metabolomics and its role in understanding cellular response in plants. Plant Cell Rep. 2005;24:562–571.
    4. Boernke W.E., et al. Stringency of substrate specificity of Escherichia coli malate dehydrogenase. Arch. Biochem. Biophys. 1995;322:43–52.
    5. Boobis A., et al. In silico prediction of ADME and pharmacokinetics report of an expert meeting organised by COST B15. Eur. J. Pharm. Sci. 2002;17:183–193.
