PLoS Comput Biol. 2022 Apr 11;18(4):e1009999.

doi: 10.1371/journal.pcbi.1009999. eCollection 2022 Apr.

Validation-based model selection for 13C metabolic flux analysis with uncertain measurement errors

Nicolas Sundqvist¹, Nina Grankvist²,³,⁴, Jeramie Watrous⁵, Jain Mohit⁵, Roland Nilsson²,³,⁴, Gunnar Cedersund¹,⁶

Affiliations

¹ Linköping University, Department of Biomedical Engineering, Linköping, Sweden.
² Cardiovascular Medicine Unit, Department of Medicine (Solna), Karolinska Institutet, Stockholm, Sweden.
³ Division of Cardiovascular Medicine, Karolinska University Hospital, Stockholm, Sweden.
⁴ Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden.
⁵ Department of Medicine & Pharmacology, University of California, San Diego, California, United States of America.
⁶ Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden.

PMID: 35404953
PMCID: PMC9022838
DOI: 10.1371/journal.pcbi.1009999

Nicolas Sundqvist et al. PLoS Comput Biol. 2022.

Abstract

Accurate measurements of metabolic fluxes in living cells are central to metabolism research and metabolic engineering. The gold standard method is model-based metabolic flux analysis (MFA), where fluxes are estimated indirectly from mass isotopomer data using a mathematical model of the metabolic network. A critical step in MFA is model selection: choosing which compartments, metabolites, and reactions to include in the metabolic network model. Model selection is often done informally during the modelling process, based on the same data that are used for model fitting (estimation data). This can lead to either overly complex models (overfitting) or overly simple ones (underfitting), in both cases resulting in poor flux estimates. Here, we propose a method for model selection based on independent validation data. We demonstrate in simulation studies that this method consistently chooses the correct model in a way that is independent of errors in measurement uncertainty. This independence is beneficial, since estimating the true magnitude of these errors can be difficult. In contrast, commonly used model selection methods based on the χ²-test choose different model structures depending on the believed measurement uncertainty; this can lead to errors in flux estimates, especially when the believed magnitude of the error is substantially off. We also present a new approach for quantifying the prediction uncertainty of mass isotopomer distributions in other labelling experiments, which can be used to check for problems with too much or too little novelty in the validation data. Finally, in an isotope tracing study on human mammary epithelial cells, the validation-based model selection method identified pyruvate carboxylase as a key model component. Our results argue that validation-based model selection should be an integral part of MFA model development.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. The basic steps in ¹³C MFA and the model selection problem.**
(A) New substrates containing ¹³C (dark circles) are fed to the cells. (B) These substrates are consumed and converted to end products in the cells, according to the cells' biochemical reactions. (C) The ¹³C label appears in various proportions in each of the mass isotopomers, and these proportions are summarized in the mass isotopomer distribution (MID) bar charts for each detected metabolite. (D) The iterative modelling cycle, in which a hypothesized model structure is fitted to MID data. The model fit is evaluated, usually with a χ²-test, and either rejected or not. If the model structure is rejected, it is revised and evaluated again. If the model structure is not rejected, it is used for flux determination. (E) The iterative model development in (D) results in a model selection problem. Different approaches for solving this model selection problem might result in different model structures being selected. This paper evaluates how the uncertainty in measurement data affects uncertainty in model selection.

**Fig 2. Example of MID sample standard deviation.**
(A) Example of an estimated mass isotopomer distribution (MID) of citrate from epithelial cells, as described in section 2.5. M+i indicates the fractional abundance of the i-th mass isotopomer. (B) Difference between the assumed magnitude of the standard deviations and the measured magnitudes.

**Fig 3. Example of how model selection is affected by σ_b, for the polynomial model.**
Error bars indicate data sampled from a 7th-order polynomial y = h₇(x, u₀) + ϵ, where ϵ ~ N(0, σ_r) and σ_r = 0.2. Colours indicate estimation data D^est (blue) and validation data D^val (red) used by the “Validation” method. Solid curves in (A–B) indicate polynomials chosen by an estimation-based method with different “believed” standard deviation σ_b. (A) σ_b = 2, chosen model h₁. (B) σ_b = 0.2 (the true value), chosen model h₇ (the correct model). (C) σ_b = 0.02, chosen model h₁₄.
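The dependence on σ_b described in this caption can be illustrated with a minimal sketch. It assumes that the estimation-based method accepts or rejects a fitted model with a χ² goodness-of-fit test at the 5% level; that rule, the function names `chi2_critical` and `passes_chi2`, and the residual values are hypothetical and are not taken from the paper. The critical value uses the Wilson-Hilferty approximation rather than an exact quantile.

```python
from statistics import NormalDist


def chi2_critical(dof, alpha=0.05):
    """Approximate upper-alpha quantile of the chi-square distribution
    (Wilson-Hilferty approximation; adequate for a sketch)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def passes_chi2(residuals, sigma_b, n_params, alpha=0.05):
    """Return True if the fit is NOT rejected, given the believed sigma_b."""
    dof = len(residuals) - n_params
    stat = sum((r / sigma_b) ** 2 for r in residuals)
    return stat <= chi2_critical(dof, alpha)


# Hypothetical residuals (data minus fitted curve) on 20 estimation points.
# An underfitted line h1 (2 parameters) leaves residuals of about 1.0;
# the correct polynomial h7 (8 parameters) leaves residuals of about 0.2.
res_h1 = [1.0, -1.0] * 10
res_h7 = [0.2, -0.2] * 10

for sigma_b in (2.0, 0.2, 0.02):
    print(
        f"sigma_b={sigma_b}: "
        f"h1 passes={passes_chi2(res_h1, sigma_b, n_params=2)}, "
        f"h7 passes={passes_chi2(res_h7, sigma_b, n_params=8)}"
    )
```

With these illustrative numbers, an overestimated σ_b lets the underfitted line pass the test, while an underestimated σ_b rejects even the correct model, which would push an estimation-based procedure towards more complex structures.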

**Fig 4. Model selection results for the polynomial model example.**
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the polynomial models h₁,…,h₁₄. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 10,000 samples, as indicated by the color scale (right).
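One way to see why a validation-based choice can be insensitive to σ_b is sketched below. It assumes the “Validation” method picks the model with the smallest weighted sum of squared residuals on D^val; this selection rule, the function name `select_by_validation`, and the residual values are assumptions for illustration and are not spelled out in the caption. Because every model's score is divided by the same σ_b², the ranking of the models does not change with σ_b.

```python
def select_by_validation(val_residuals_by_model, sigma_b):
    """Pick the model with the smallest weighted SSR on the validation data."""

    def weighted_ssr(residuals):
        return sum((r / sigma_b) ** 2 for r in residuals)

    return min(
        val_residuals_by_model,
        key=lambda name: weighted_ssr(val_residuals_by_model[name]),
    )


# Hypothetical validation residuals for an underfitted model (h1),
# the correct model (h7), and an overfitted model (h14).
val_residuals = {
    "h1": [1.0, -0.9, 1.1, -1.0],
    "h7": [0.2, -0.1, 0.2, -0.3],
    "h14": [0.6, -0.5, 0.7, -0.4],
}

# The same model is selected regardless of the believed sigma_b.
choices = [select_by_validation(val_residuals, s) for s in (2.0, 0.2, 0.02)]
print(choices)
```

The sketch only covers the selection step; fitting each candidate model to D^est is assumed to have been done beforehand.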

**Fig 5. Six different model structures for the linear model.**
This example is chosen as a simple representation of a mass flow model. The top row shows the model names A₁,…,A₆. The second row shows the matrices that constitute the model structures. The third row shows visual illustrations of how the corresponding matrices connect the inputs x_i and the outputs y_i via the parameters a₁,…,a₆.

**Fig 6. Model selection results for the linear model example.**
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the linear models A₁,…,A₆. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 1000 samples, as indicated by the color scale (right).

**Fig 7. Seven different model structures included in the simulated EMU ¹³C MFA example.**
The component added to each model structure, compared with the previous, slightly less complex model, is shown inside the red circle. The true model used to simulate the data is model number 4. Detailed descriptions of each model can be found in the supplementary material (S1 Table).

**Fig 8. Model selection results for the simulated ¹³C MFA model example.**
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the MFA models M₁,…,M₇. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 100 samples, as indicated by the color scale (right).

**Fig 9. Comparison of estimated flux solutions for the simulated ¹³C MFA example.**
The resulting flux values with 95% confidence intervals for seven of the fluxes that are shared by all model structures in the simulated ¹³C MFA example. The confidence intervals correspond to the estimated fluxes for model M₂ (blue), model M₄ with all data available (green), and model M₄ with the data split into D^est and D^val (red). The figure illustrates that selecting the wrong model structure may result in incorrect flux estimates.

**Fig 10. How prediction uncertainty can be used to assess the novelty in the validation data.**
(A) If there is too little novelty in the validation data, differences between estimation data and validation data will typically be smaller than the prediction and measurement uncertainty. (B) If there is too much novelty in the validation data, there is no information about the corresponding MIDs, and the prediction uncertainty will be large, approaching [0,1]. (C) An ideal design of validation data is thus to have well-determined predictions that are different compared to the estimation data. To be sure that there really is new information, one should also check that the new fluxes generate linearly independent EMU basis vectors (Section 2.4).

**Fig 11. Usage of prediction uncertainty to demonstrate that the validation data has neither too little, nor too much, novelty, compared to the estimation data.**
This analysis shows the result from the simulated ¹³C MFA example (Figs 7–9). The model was trained on estimation data corresponding to three tracers: Tracer 1 = 1,2-¹³C-glutamine (dark red), Tracer 2 = 3-¹³C-pyruvate (red), and Tracer 3 = U-¹³C-glutamine (light red). The validation data (dark blue) came from the tracer U-¹³C-pyruvate. For the experimental data, the error bars represent standard deviation, and for the model predictions (light blue), the error bars represent model uncertainty (Section 4.4).

**Fig 12. Model selection results for the cultured epithelial cell example.**
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the MFA models M₁,…,M₇. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 1000 samples, as indicated by the color scale (right).

**Fig 13. Validation of lipid synthesis in HMEC cultures.**
(A) Schematic of the model for lysophosphatidylcholine (LPC) 16:0 synthesis from acetate (ac). (B) Predicted MID of ac from the model selected by the “Validation” method. (C) Measured MID of glycerol-3-phosphocholine (g3pc). (D) Fitted (gray) and measured (black) MID of LPC 16:0. Mean values of biological triplicates are shown in (C, D). Error bars indicate standard deviation.
