PLoS Comput Biol. 2022 Apr 11;18(4):e1009999.
doi: 10.1371/journal.pcbi.1009999. eCollection 2022 Apr.

Validation-based model selection for 13C metabolic flux analysis with uncertain measurement errors

Nicolas Sundqvist et al. PLoS Comput Biol.

Abstract

Accurate measurements of metabolic fluxes in living cells are central to metabolism research and metabolic engineering. The gold standard method is model-based metabolic flux analysis (MFA), where fluxes are estimated indirectly from mass isotopomer data using a mathematical model of the metabolic network. A critical step in MFA is model selection: choosing which compartments, metabolites, and reactions to include in the metabolic network model. Model selection is often done informally during the modelling process, based on the same data that are used for model fitting (estimation data). This can lead to either overly complex models (overfitting) or overly simple ones (underfitting), in both cases resulting in poor flux estimates. Here, we propose a method for model selection based on independent validation data. We demonstrate in simulation studies that this method consistently chooses the correct model in a way that is independent of errors in measurement uncertainty. This independence is beneficial, since estimating the true magnitude of these errors can be difficult. In contrast, commonly used model selection methods based on the χ²-test choose different model structures depending on the believed measurement uncertainty; this can lead to errors in flux estimates, especially when the believed magnitude of the error is substantially off. We present a new approach for quantifying the prediction uncertainty of mass isotopomer distributions in other labelling experiments, to check for problems with too much or too little novelty in the validation data. Finally, in an isotope tracing study on human mammary epithelial cells, the validation-based model selection method identified pyruvate carboxylase as a key model component. Our results argue that validation-based model selection should be an integral part of MFA model development.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The basic steps in 13C MFA and the model selection problem.
(A) New substrates containing 13C (dark circles) are fed to the cells. (B) These substrates are consumed and converted to end products in the cells, according to the cells' biochemical reactions. (C) The 13C label appears in various proportions in each of the mass isotopomers, and these proportions are summarized in distribution bar charts for each detected metabolite. (D) The iterative modelling cycle, in which a hypothesized model structure is fitted to MID data. The model fit is evaluated, usually with a χ²-test, and either rejected or not. If the model structure is rejected, it is revised and evaluated again. If the model structure is not rejected, it is used for flux determination. (E) The iterative model development in (D) results in a model selection problem. Different approaches for solving this model selection problem might result in different model structures being selected. This paper evaluates how the uncertainty in measurement data affects uncertainty in model selection.
Fig 2. Example of MID sample standard deviation.
(A) Example of estimated mass isotopomer distribution (MID) of citrate from epithelial cells, as described in section 2.5. M+i indicates the fractional abundance of the i-th mass isotopomer. (B) Difference between the assumed magnitude of the standard deviations and the measured magnitudes.
Fig 3. Example of how model selection is affected by σb, for the polynomial model.
Error bars indicate data sampled from a 7th order polynomial y = h7(x, u0) + ϵ, where ϵ is N(0, σr), σr = 0.2. Colours indicate estimation data Dest (blue) and validation data Dval (red) used by the “Validation” method. Solid curves in (A–B) indicate polynomials chosen by an estimation-based method with different “believed” standard deviation σb. (A) σb = 2, chosen model h1. (B) σb = 0.2 (the true value), chosen model h7 (the correct model). (C) σb = 0.02, chosen model h14.
Fig 4. Model selection results for the polynomial model example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the polynomial models h1,…,h14. For each row, color indicates the fraction of times a model is selected for the given σb, out of 10,000 samples, as indicated by the color scale (right).
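
The polynomial example in Figs 3 and 4 can be illustrated with a small numerical sketch. The code below is not the authors' implementation: the true coefficients, the sample sizes, the sampling interval, and the function names are assumptions made for illustration. The "first model that passes a χ²-test" rule is one plausible estimation-based selection method, and the χ² quantile is computed with the Wilson-Hilferty approximation to keep the example NumPy-only. It contrasts a selection rule that depends on the believed standard deviation σb with a validation-based rule that never uses σb.

```python
import numpy as np
from numpy.polynomial import Polynomial

MAX_DEGREE = 14  # candidate models h1..h14, as in the captions


def chi2_quantile_95(dof):
    """Wilson-Hilferty approximation of the 95% chi-square quantile."""
    z, a = 1.6448536269514722, 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * np.sqrt(a)) ** 3


def ssr(degree, x_fit, y_fit, x_eval, y_eval):
    """Fit on (x_fit, y_fit); return the sum of squared residuals on the eval set."""
    model = Polynomial.fit(x_fit, y_fit, degree)
    return float(np.sum((y_eval - model(x_eval)) ** 2))


def select_first_chi2(x_est, y_est, sigma_b):
    """Simplest model that passes a chi-square test, given the believed sigma_b."""
    for degree in range(1, MAX_DEGREE + 1):
        dof = len(x_est) - (degree + 1)
        weighted_ssr = ssr(degree, x_est, y_est, x_est, y_est) / sigma_b**2
        if weighted_ssr <= chi2_quantile_95(dof):
            return degree
    return MAX_DEGREE  # nothing passed: fall back to the most complex model


def select_by_validation(x_est, y_est, x_val, y_val):
    """Model with the smallest SSR on independent validation data (sigma_b unused)."""
    scores = [ssr(d, x_est, y_est, x_val, y_val) for d in range(1, MAX_DEGREE + 1)]
    return int(np.argmin(scores)) + 1


# Simulated data; the coefficients (u0) and sample sizes are hypothetical.
rng = np.random.default_rng(0)
true_model = Polynomial(rng.normal(size=8))  # 7th order polynomial
sigma_r = 0.2                                # true noise level, as in Fig 3
x_est, x_val = rng.uniform(-1, 1, 40), rng.uniform(-1, 1, 20)
y_est = true_model(x_est) + rng.normal(0, sigma_r, x_est.size)
y_val = true_model(x_val) + rng.normal(0, sigma_r, x_val.size)

for sigma_b in (2.0, 0.2, 0.02):
    print(sigma_b,
          select_first_chi2(x_est, y_est, sigma_b),
          select_by_validation(x_est, y_est, x_val, y_val))
```

By construction, the χ²-based choice can only stay the same or grow more complex as σb shrinks, while the validation-based choice is identical for every σb. On a single small sample the validation choice need not be exactly h7; the fractions in Fig 4 are aggregated over many samples.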
Fig 5. Six different model structures for the linear model.
This example is chosen as a simple representation of a mass flow model. The top row shows the model names A1,…,A6. The second row shows the matrices that constitute the model structures. The third row shows visual illustrations of how the corresponding matrices connect the inputs xi and the outputs yi via the parameters a1,…,a6.
Fig 6. Model selection results for the linear model example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the linear models A1,…,A6. For each row, color indicates the fraction of times a model is selected for the given σb, out of 1000 samples, as indicated by the color scale (right).
Fig 7. Seven different model structures included in the simulated EMU 13C MFA example.
The component added to each model structure, compared with the previous, slightly less complex model, is found inside the red circle. The true model used to simulate the data is model number 4. Detailed descriptions of each model can be found in the supplementary material (S1 Table).
Fig 8. Model selection results for the simulated 13C MFA model example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the MFA models M1,…,M7. For each row, color indicates the fraction of times a model is selected for the given σb, out of 100 samples, as indicated by the color scale (right).
Fig 9. Comparison of estimated flux solutions for the simulated 13C MFA example.
The resulting flux values with 95% confidence intervals for seven of the fluxes that are shared by all model structures in the simulated 13C MFA example. The confidence intervals correspond to the estimated fluxes for model M2 (Blue), model M4 with all data available (Green) and model M4 with the data split into Dest and Dval (Red). The figure illustrates that selecting the wrong model structure may result in incorrect flux estimates.
Fig 10. How prediction uncertainty can be used to assess the novelty in the validation data.
(A) If there is too little novelty in the validation data, differences between estimation data and validation data will typically be smaller than the prediction and measurement uncertainty. (B) If there is too much novelty in the validation data, there is no information about the corresponding MIDs, and the prediction uncertainty will be large, approaching [0,1]. (C) An ideal design of validation data is thus to have well-determined predictions that are different compared to the estimation data. To be sure that there really is new information, one should also check that the new fluxes generate linearly independent EMU basis vectors (Section 2.4).
Fig 11. Usage of prediction uncertainty to demonstrate that the validation data has neither too little, nor too much, novelty, compared to the estimation data.
This analysis shows the result from the simulated 13C MFA example (Figs 7–9). The model was trained on estimation data corresponding to three tracers: Tracer 1 = 1,2-13C-glutamine (dark red), Tracer 2 = 3-13C-pyruvate (red), and Tracer 3 = U-13C-glutamine (light red). The validation data (dark blue) came from the tracer U-13C-pyruvate. For the experimental data, the error bars represent standard deviation; for the model predictions (light blue), the error bars represent model uncertainty (Section 4.4).
Fig 12. Model selection results for the cultured epithelial cell example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the MFA models M1,…,M7. For each row, color indicates the fraction of times a model is selected for the given σb, out of 1000 samples, as indicated by the color scale (right).
Fig 13. Validation of lipid synthesis in HMEC cultures.
(A) Schematic of the model for lysophosphatidylcholine (LPC) 16:0 synthesis from acetate (ac). (B) Predicted MID of ac from the model selected by the “Validation” method. (C) Measured MID of glycerol-3-phosphocholine (g3pc). (D) Fitted (gray) and measured (black) MID of LPC 16:0. Mean values of biological triplicates are shown in (C, D). Error bars indicate standard deviation.
