Review

FEMS Yeast Res. 2024 Jan 9:24:foae011.

doi: 10.1093/femsyr/foae011.

Data integration strategies for whole-cell modeling

Katja Tummler¹, Edda Klipp¹

PMID: 38544322
PMCID: PMC11042497
DOI: 10.1093/femsyr/foae011

Katja Tummler et al. FEMS Yeast Res. 2024.

Affiliation

¹ Humboldt-Universität zu Berlin, Faculty of Life Sciences, Institute of Biology, Theoretical Biophysics, Invalidenstr. 42, 10115 Berlin, Germany.

Abstract

Data makes the world go round, and high-quality data is a prerequisite for precise models, especially for whole-cell models (WCM). Data for WCM must be reusable, contain information about the exact experimental background, and should, in its entirety, cover all relevant processes in the cell. Here, we review basic requirements for WCM data and strategies for combining them. As a species-specific resource, we introduce the Yeast Cell Model Data Base (YCMDB) to illustrate requirements and solutions. We discuss recent standards for data as well as for computational models, including the modeling process as data to be reported. We outline strategies for the construction of WCM despite their inherent complexity.

Keywords: complex networks; emerging standards; single cell; yeast cell model.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
A WCM captures cellular maintenance and the cell division cycle, as well as cellular responses to external stresses. Icons within the sketched yeast cell symbolize the major processes: MET stands for metabolism, SIG for signaling, GEX for gene expression, TRP for transport, CDC for cell division cycle, and VOL for volume changes and growth. Around the cell, G1, S, G2, and M indicate the cell cycle phases. The cell starts as a small cell with only one copy of DNA and then grows in G1 phase. In S phase, DNA is duplicated and yeast cells start to form a bud. After another phase of growth in G2, cells organize division into mother and daughter cell during M phase.

**Figure 2.**
Overview of typical steps in mechanistic modeling, illustrating both ODE modeling (A)–(J) and Boolean modeling (K)–(M). (A) The information about the processes to be covered by the model can be given in graphical representation. The example used here can represent either metabolic or signaling processes: compound S₁ is produced and degraded by reactions 1 and 2 (with velocities v₁ and v₂), compounds S₂ and S₄ are converted into each other by reactions 3 and 4, and compound S₃ is produced and degraded by reactions 5 and 6. Compounds S₁ and S₃ modify (activate or inhibit) the velocities of reactions 3 and 5, respectively, without being consumed or produced themselves by these reactions. (B) The systems equations, in general, represent the temporal changes of the compounds *S_i* (denoted by the time derivative d/dt), which are given by the rates (or velocities) *v_j* combined with the stoichiometric coefficients. The steps needed before the system can be simulated are sketched in panels (C)–(F): (C) represents the set of systems equations for the example in (A). (D) illustrates choices for rate expressions. v₃, v₄, and v₆ follow mass action, where parameters k stand for rate constants, v₂ is an example of Michaelis–Menten kinetics (with V_max the maximal velocity and *K_M* the Michaelis constant), and v₅ is an example of Hill kinetics (K_0.5 is the concentration giving half-maximal velocity, n is the Hill coefficient). (E) Parameter values can be obtained from databases, estimated from experimental data (genomics, proteomics, metabolomics, and biophysical measurements), or simply guessed (as done here). Briefly, parameter estimation requires systematic repeated simulation with different parameter values and comparison with experimental data, with the aim of minimizing the difference between data and simulation. (F) For a simulation to start, one has to determine the initial conditions.
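The steps in panels (A)–(F) can be sketched in code. The following is a minimal illustration, not the authors' model: the direction of reactions 3 and 4, the constant influx v₁, the multiplicative activation of v₃ by S₁, the inhibitory Hill form of v₅, all parameter values, the initial conditions, and the fixed-step Runge–Kutta integrator are assumptions made here for the example.

```python
# Toy ODE model for the network described in panel (A).
# All rate forms, parameter values, and initial conditions are assumed.

def rates(s, p):
    """Return the six reaction velocities for state s = [S1, S2, S3, S4]."""
    s1, s2, s3, s4 = s
    v1 = p["k1"]                                   # constant production of S1 (assumed)
    v2 = p["Vmax2"] * s1 / (p["KM2"] + s1)         # Michaelis-Menten degradation of S1
    v3 = p["k3"] * s1 * s2                         # S2 -> S4, mass action, activated by S1 (assumed)
    v4 = p["k4"] * s4                              # S4 -> S2, mass action
    kn = p["K05"] ** p["n"]
    v5 = p["Vmax5"] * kn / (kn + s3 ** p["n"])     # Hill-type production of S3, inhibited by S3 (assumed)
    v6 = p["k6"] * s3                              # mass-action degradation of S3
    return v1, v2, v3, v4, v5, v6

def rhs(s, p):
    """Systems equations: dS/dt from velocities and stoichiometry."""
    v1, v2, v3, v4, v5, v6 = rates(s, p)
    return [v1 - v2, v4 - v3, v5 - v6, v3 - v4]

def rk4_step(s, p, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return [x + h * ki for x, ki in zip(s, k)]
    k1 = rhs(s, p)
    k2 = rhs(shifted(k1, dt / 2), p)
    k3 = rhs(shifted(k2, dt / 2), p)
    k4 = rhs(shifted(k3, dt), p)
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(s, k1, k2, k3, k4)]

def simulate(s0, p, t_end, dt=0.01):
    """Time course simulation from initial conditions s0."""
    s = list(s0)
    trajectory = [s]
    for _ in range(int(round(t_end / dt))):
        s = rk4_step(s, p, dt)
        trajectory.append(s)
    return trajectory

# Guessed parameter values and initial conditions (assumptions).
PARAMS = {"k1": 1.0, "Vmax2": 2.0, "KM2": 1.0, "k3": 1.0, "k4": 1.0,
          "Vmax5": 2.0, "K05": 1.0, "n": 2, "k6": 1.0}
S0 = [0.0, 1.0, 0.0, 0.0]

final_state = simulate(S0, PARAMS, t_end=50.0)[-1]
```

With these assumed values the system relaxes to a steady state, and the sum S₂ + S₄ stays constant because reactions 3 and 4 only interconvert the two compounds. Re-running `simulate` with a varied `Vmax2` gives a simple form of the sensitivity analysis mentioned in the caption.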
(G)–(J) are examples of simulation experiments based on the ODE system in panels (C)–(F). (G) shows a time course simulation. (H) presents the state space for S₁ and S₂, where vectors indicate the direction of motion from different starting points. (G) and (H) show that the system moves toward a steady state. (I) A typical way to analyze the ODE system is sensitivity analysis, i.e. testing the effect of small parameter variations on the dynamics. Here, parameter V_max2 has been varied (10 simulations with different values). (J) shows the result of a stochastic simulation of the same system with the Langevin approach, where a noise term is added to each equation, resulting in slightly different dynamics for each of the 10 simulations shown. (K)–(M) Boolean model of a comparable system: component S₁ activates S₂; S₂ and S₄ can be converted into each other (thereby annihilating the other component); and S₂ activates S₃. Here, all compounds can have only two states, ON or OFF, and time proceeds in discrete steps. (K) Graph of the model. (L) Systems equations denote the state of the compound at the right side at time t + 1 as a function of the state of components at the left side at time t. These changes are expressed with Boolean rules. (M) shows two simulation experiments with different initial conditions, where S₄ starts ON at t₀ in both cases and S₁ is either OFF or ON. If S₁ is OFF, the system is already at a fixed point and shows no changes in the following time steps. If S₁ is ON, the system oscillates, i.e. it has a cyclic attractor. For both ODE and Boolean modeling, it is often necessary to revise the model and repeat the modeling steps, i.e. network creation (components and reactions), assignment of rate expressions or rules, and the parameter values, until the model behavior correctly reflects the experimentally observed behavior of the system.
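The Boolean part of the caption can be illustrated with a short synchronous-update sketch. The exact rules of panel (L) are not given in the text, so the rules below are hypothetical: they were chosen only so that the behavior matches the description (a fixed point when S₁ is OFF, a cyclic attractor when S₁ is ON), and the function names are made up for this example.

```python
# Synchronous Boolean model with state (S1, S2, S3, S4).
# The update rules are assumptions, not the rules from the figure.

def step(state):
    """Compute the state at time t + 1 from the state at time t."""
    s1, s2, s3, s4 = state
    return (
        s1,                      # S1 is treated as a constant input
        s1 and s4,               # S4 is converted to S2 only if S1 is ON
        s2,                      # S2 activates S3
        s2 or (s4 and not s1),   # S2 is converted to S4; S4 persists without S1
    )

def simulate(state, n_steps):
    """Return the list of states visited over n_steps updates."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory

def find_attractor(state):
    """Iterate until a state repeats and return the repeating cycle."""
    seen = {}
    trajectory = []
    # Four Boolean components give 16 states, so a repeat occurs within 17 steps.
    for t in range(2 ** 4 + 1):
        if state in seen:
            return trajectory[seen[state]:]
        seen[state] = t
        trajectory.append(state)
        state = step(state)
    raise RuntimeError("no repeated state found")

# Both experiments start with S4 ON; S1 is OFF in the first and ON in the second.
attractor_off = find_attractor((False, False, False, True))
attractor_on = find_attractor((True, False, False, True))
```

An attractor of length one is a fixed point, while a longer one is a cyclic attractor. Under these assumed rules, `attractor_off` has length one and `attractor_on` has length two.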

**Figure 3.**
Overview of the contents and search functionalities of the YCMDB.

**Figure 4.**
Modular approach to WCM. (A) Modules of the Yeast Cell Model: cell division cycle (CDC), metabolism [MET, containing central carbon metabolism (CCM), cell wall synthesis (CWS), DNA synthesis (DNA), lipid metabolism (LIP), amino acid metabolism (AAM), and storage (STO)], transport [TRP, comprising ion (ION) and nutrient (NUT) transport], gene expression [GEX, with transcription (TRX), translation (TRL), assembly of protein complexes (APC), and histone activity (HIS)], volume changes [VOL, for mother (MOT) and bud (BUD)], and signaling [SIG, including the high osmolarity glycerol (HOG), pheromone (MAT), Ca-calcineurin (CAL), and TOR (TOR) pathways]. (B) Schematic of the merging approach: parameters are estimated for each module separately, with the other modules reduced to critical information, e.g. just the relevant volume. Then the modules are merged into one large ODE model, and the parameters are readjusted, now taking the dynamics in the other modules into account. Eventually, the whole WCM is simulated as one ODE system. (C) Schematic of the consolidation approach: again, parameters are estimated for each module. In the simulation process, each module is simulated separately with its own algorithm for a given time step Δt. It takes given values of link variables as input, changes them during the simulation, and provides them after Δt as output. All link variables are updated based on the outputs of all modules. The process is repeated iteratively until the predefined end time.
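The consolidation loop in panel (C) can be sketched as follows. This is a toy illustration under stated assumptions: the two modules, their internal algorithms, the link variable names (`nutrient`, `biomass`, `volume`), and the rule for combining outputs (summing each module's change to a link variable) are all hypothetical and are not taken from the Yeast Cell Model.

```python
# Toy consolidation (co-simulation) loop with two hypothetical modules.

def metabolism(link, dt):
    """Hypothetical MET module: explicit Euler with 10 internal substeps."""
    nutrient, biomass = link["nutrient"], link["biomass"]
    h = dt / 10
    for _ in range(10):
        uptake = 0.5 * nutrient * h   # first-order nutrient uptake (assumed)
        nutrient -= uptake
        biomass += uptake
    return {"nutrient": nutrient, "biomass": biomass}

def volume(link, dt):
    """Hypothetical VOL module: a single relaxation step toward biomass."""
    vol = link["volume"]
    return {"volume": vol + dt * (link["biomass"] - vol)}

def consolidate(link, modules, dt, t_end):
    """Run all modules for dt on the same inputs, merge outputs, repeat."""
    link = dict(link)
    for _ in range(int(round(t_end / dt))):
        # Each module receives its own copy of the current link variables.
        outputs = [module(dict(link), dt) for module in modules]
        updated = dict(link)
        for out in outputs:
            for name, value in out.items():
                # Assumed merge rule: add up the changes reported by all modules.
                updated[name] += value - link[name]
        link = updated
    return link

initial = {"nutrient": 1.0, "biomass": 0.0, "volume": 0.0}
final = consolidate(initial, [metabolism, volume], dt=0.1, t_end=10.0)
```

Each module here uses a different internal scheme, which is the point of the consolidation approach: the coordinator only needs the link variables before and after Δt. The choice of Δt matters, since modules see each other's changes only at the update points.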

**Figure 5.**
Information flow for the creation of a YCM. (A) Literature and experimental results hold information on a multitude of biological processes and interactions. However, these data are highly condition dependent and stored in nonstandardized ways. (B) To use the data reproducibly, understand their connections, and judge their consistency and information content, several nontrivial digitalization steps are required. (C) The curated data can then be consistently and formally analyzed, e.g. in mathematical models, to foster the understanding of underlying biological processes, but also to reveal knowledge gaps. Eventually, models of whole cells could be simulated to understand the complex interwiring of cellular processes.
