Mol Syst Biol. 2014 Jul 1;10(7):737.
doi: 10.15252/msb.20145243.

Minimal metabolic pathway structure is consistent with associated biomolecular interactions

Aarash Bordbar et al. Mol Syst Biol. 2014.

Abstract

Pathways are a universal paradigm for functionally describing cellular processes. Even though advances in high-throughput data generation have transformed biology, the core of our biological understanding, and hence data interpretation, is still predicated on human-defined pathways. Here, we introduce an unbiased pathway structure for genome-scale metabolic networks, defined on principles of parsimony, that does not mimic canonical human-defined textbook pathways. Instead, these minimal pathways better describe multiple independent pathway-associated biomolecular interaction datasets, suggesting a functional organization for metabolism based on parsimonious use of cellular components. We use the inherent predictive capability of these pathways to experimentally discover novel transcriptional regulatory interactions in Escherichia coli metabolism for three transcription factors, effectively doubling the known regulatory roles for Nac and MntR. This study suggests an underlying and fundamental principle in the evolutionary selection of pathway structures: namely, that pathways may be minimal, independent, and segregated.

Keywords: constraint‐based modeling; genetic interactions; pathway analysis; protein‐protein interactions; transcriptional regulatory networks.

Figures

Figure 1. Overview of the MinSpan algorithm
  1. A metabolic network is mathematically represented as a stoichiometric matrix (S). Reaction fluxes (v) are determined assuming steady state (S·v = 0), so all potential flux states lie in the null space (N) of S.

  2. The MinSpan algorithm determines the shortest, independent pathways of the metabolic network by decomposing the null space of the stoichiometric matrix to form the sparsest basis.

  3. A simplified model for glycolysis and TCA cycle is presented with 14 metabolites, 18 reactions, and a 4-dimensional null space. Reversible reactions are shown.

  4. The four pathways calculated by MinSpan for the simplified model are presented, two of which recapitulate glycolysis and the TCA cycle, while the other two represent other possible metabolic pathways. The flux directions of a pathway through reversible reactions are shown as irreversible reactions.
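
As a minimal sketch of the steady-state setup described in this caption, the Python below computes a null-space basis for a toy stoichiometric matrix by exact Gaussian elimination. The four-reaction network and the function name `null_space` are assumptions made for illustration and do not come from the paper. The sketch does not implement MinSpan itself: finding the sparsest basis is a separate optimization problem that this code does not attempt.

```python
from fractions import Fraction


def null_space(matrix):
    """Return a basis of the null space of `matrix` (a list of rows),
    using exact reduced row echelon form over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c has no pivot, so it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    free_cols = [c for c in range(n_cols) if c not in pivot_cols]
    basis = []
    for f in free_cols:
        # Set one free variable to 1 and solve for the pivot variables.
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivot_cols):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis


# Hypothetical toy network (not the 14-metabolite model in the figure).
# Reactions: R1: -> A, R2: A -> B, R3: B ->, R4: A ->
S = [
    [1, -1, 0, -1],  # metabolite A
    [0, 1, -1, 0],   # metabolite B
]

basis = null_space(S)  # two vectors: 4 reactions minus rank 2
for v in basis:
    # Every basis vector satisfies the steady-state condition S·v = 0.
    assert all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)
```

For this toy network the two basis vectors are [1, 1, 1, 0] (uptake, conversion, export of B) and [1, 0, 0, 1] (uptake and direct export of A). Exact rational arithmetic is used so that the steady-state check does not depend on floating-point tolerances.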

Figure 2. Correlation analysis shows MinSpan pathways are biologically relevant
  1. (A) “Gene-protein-reaction” (GPR) associations describe the genes and proteins required to catalyze a metabolic reaction. Pyruvate dehydrogenase in Escherichia coli is shown as an example.

  2. (B) We grouped genes and proteins in the GPRs for each MinSpan pathway to check consistency with datasets on pathway-associated biomolecular interactions.

  3. (C–F) Correlation analysis (C) of the gene and protein sets shows MinSpan pathways are biologically consistent with three different biomolecular interaction networks: (D) protein-protein interactions in S. cerevisiae (yeast two-hybrid data), (E) positive genetic interactions in S. cerevisiae (P < 0.05 and ε > 0.16 as defined by Costanzo et al, 2010), and (F) transcriptional regulation in E. coli. MinSpan pathways are more consistent with data-driven protein interaction, genetic interaction, and transcriptional regulatory networks than human-defined pathways (KEGG, BioCyc, and GO), a least sparse null space (MaxSpan), and randomly generated null spaces (RandSpan). Accuracy (y-axis) is determined by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Coverage (x-axis) is determined by the number of interactions for which the method made a prediction. The dotted circle for RandSpan represents the mean plus one standard deviation of the 100 random null spaces. x- and y-axis values are in the Supplementary Information.
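
The accuracy measure named in this caption, the AUC of the ROC curve, can be illustrated with a short Python sketch. It uses the standard rank formulation: the AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The scores and labels are hypothetical, and the caption does not state how the authors computed the AUC, so this is one common way to do it rather than their procedure.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives
    and negatives (the Mann-Whitney formulation); ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical gene pairs. Each score stands for how strongly a pathway
# definition groups the pair; each label says whether the pair has a
# measured interaction. Neither comes from the paper's data.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [True, True, False, True, False]

auc = roc_auc(scores, labels)  # 5 of the 6 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is why it serves as an accuracy axis that is independent of any single score threshold.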

Figure 3. Global comparison of MinSpan pathways with databases of human-defined pathways
  1. The pairwise connection specificity index (CSI) was calculated for all pathway definitions (from four sources: MinSpan, KEGG, BioCyc, and Gene Ontology) for Escherichia coli and Saccharomyces cerevisiae as a measure of pathway similarity. The CSI matrix was hierarchically clustered and the database that the pathway originates from is color-coded to the left and above the heatmap to illustrate the clustering.

  2. The percentage of pathways that share a high CSI value (top 15% of interactions) between pathway databases is presented. For example, MinSpan pathways are similar to and capture roughly 88% of all pathways in KEGG for E. coli. Conversely, KEGG is much smaller and only captures 26% of MinSpan pathways.

  3. A K-nearest neighbor search was done to see how pathways classify into other databases. KEGG and BioCyc pathways have the closest resemblance, with Gene Ontology being the next most similar. MinSpan is significantly different from human-defined pathways. (*Significant enrichment, Significant depletion).

  4. 533 MinSpan pathways are similar to 582 traditional pathways. There is not a one-to-one mapping, as similar pathways may exist in multiple human-defined databases. For E. coli, there are 204 unique MinSpan pathways. In S. cerevisiae, there are none, as Gene Ontology captures all the pathways from the metabolic model.

Figure 4. The three differences between MinSpan and human-defined pathways
  1. MinSpan automates the enumeration of biologically relevant pathways.

  2. MinSpan includes all the components a pathway requires to be independent. The additional pathway components not found in human-defined pathways, such as THF recycling, are often co-regulated and thus part of a coherent pathway functioning as a “module” in a network.

  3. MinSpan decomposes complex topology into the simplest representation. For example, there is a shorter route to l-methionine production through l-threonine than from l-aspartate. Note: For MinSpan pathways, only the representative genes of the pathway are shown. 10fthf: 10-formyltetrahydrofolate; 2obut: 2-oxobutanoate; gar: glycinamide ribonucleotide; l-cysta: l-cystathionine; methf: 5,10-methenyltetrahydrofolate; mlthf: 5,10-methylenetetrahydrofolate; prpp: phosphoribosyl pyrophosphate; skm: shikimate; thf: tetrahydrofolate.

Figure 5. MinSpan pathways help predict transcription factor activity
  1. Constraint-based models can determine reaction activity, or flux states (v), using Monte Carlo sampling. Decomposing sampled flux states into linear weightings of MinSpan pathways (α) allows prediction of TF activity. For example, metabolic reaction fluxes are sampled under glucose minimal media and glucose minimal media + l-arginine supplementation. A typical analysis would yield a list of reactions (including v_i) that are significantly changed. With MinSpan pathways, the flux distributions can be converted into significant changes in pathway activity (including α_j). TFs are associated with pathways based on enrichment of regulated genes. Predicting TF activity is based on which TFs are associated with the significantly changed pathways; in this case, α_j is associated with ArgR.

  2. The TF activity of 51 nutrient shifts was predicted and can be hierarchically clustered by nutrient shift type. TF activity for the heatmap is defined as the percentage of differential MinSpan pathways that are associated with that TF. 36% of the TF–environment associations predicted are not known, providing numerous predictions for experimentation. Experimentally tested TF–environment associations are highlighted.
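
The two steps in this caption, decomposing a flux state into pathway weightings and scoring TF activity as a percentage of differential pathways, can be sketched in Python as follows. Everything concrete here is an assumption for illustration: the two toy pathways, the flux vectors, the names `TF_A` and `TF_B`, and the rule that a pathway counts as differential when its weight changes at all. The paper works with sampled flux distributions and statistical significance, which this sketch does not reproduce.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve the square system a·x = b exactly by Gauss-Jordan elimination.
    Assumes `a` is nonsingular (true when the pathways are independent)."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def pathway_weights(pathways, flux):
    """Least-squares weights alpha with sum_j alpha[j] * pathways[j] ≈ flux,
    obtained from the normal equations."""
    gram = [[sum(x * y for x, y in zip(p, q)) for q in pathways] for p in pathways]
    rhs = [sum(x * y for x, y in zip(p, flux)) for p in pathways]
    return solve_linear(gram, rhs)


def tf_activity(differential, tf_to_pathways):
    """Percentage of differential pathways associated with each TF."""
    if not differential:
        return {tf: 0.0 for tf in tf_to_pathways}
    return {
        tf: 100.0 * len(differential & members) / len(differential)
        for tf, members in tf_to_pathways.items()
    }


# Hypothetical pathways over four reactions, and two hypothetical flux states.
pathways = [[1, 1, 1, 0], [1, 0, 0, 1]]
flux_base = [3, 2, 2, 1]   # equals 2*pathway0 + 1*pathway1
flux_shift = [5, 2, 2, 3]  # equals 2*pathway0 + 3*pathway1

alpha_base = pathway_weights(pathways, flux_base)
alpha_shift = pathway_weights(pathways, flux_shift)

# Simplified rule: a pathway is differential if its weight changed.
differential = {j for j, (a, b) in enumerate(zip(alpha_base, alpha_shift)) if a != b}

# Hypothetical TF-to-pathway associations.
activity = tf_activity(differential, {"TF_A": {1}, "TF_B": {0}})
```

Here only the second pathway changes weight between the two states, so the hypothetical `TF_A` receives all of the activity and `TF_B` none.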

Figure 6. MinSpan TRN predictions suggest informative dual perturbation experiments that led to discovery of novel Escherichia coli transcriptional regulation
  1. Nac plays a role in purine metabolism and a larger role in nitrogen metabolism than previously known. Nac also regulates Lrp through gcvB.

  2. Cra regulates tnaCAB, which is also subject to Crp regulation.

  3. MntR plays a regulatory role for four genes that are heavily regulated by ArcA/Fnr/Fur/PdhR and are utilized during anaerobic conditions.

References

    1. Amador-Noguez D, Feng XJ, Fan J, Roquet N, Rabitz H, Rabinowitz JD. Systems-level metabolic flux profiling elucidates a complete, bifurcated tricarboxylic acid cycle in Clostridium acetobutylicum. J Bacteriol. 2010;192:4452–4461.
    2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29.
    3. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006;2:2006.0008.
    4. Bass JIF, Diallo A, Nelson J, Soto JM, Myers CL, Walhout AJM. Using networks to measure similarity between genes: association index selection. Nat Methods. 2013;10:1169–1176.
    5. Bordbar A, Mo ML, Nakayasu ES, Schrimpe-Rutledge AC, Kim YM, Metz TO, Jones MB, Frank BC, Smith RD, Peterson SN, Hyduke DR, Adkins JN, Palsson BO. Model-driven multi-omic data analysis elucidates metabolic immunomodulators of macrophage activation. Mol Syst Biol. 2012;8:558.
