Review
FEMS Yeast Res. 2024 Jan 9:24:foae011.
doi: 10.1093/femsyr/foae011.

Data integration strategies for whole-cell modeling

Katja Tummler et al. FEMS Yeast Res.

Abstract

Data makes the world go round, and high-quality data is a prerequisite for precise models, especially for whole-cell models (WCM). Data for WCM must be reusable, contain information about the exact experimental background, and should, in its entirety, cover all relevant processes in the cell. Here, we review basic requirements for WCM data and strategies for combining such data. As a species-specific resource, we introduce the Yeast Cell Model Data Base (YCMDB) to illustrate requirements and solutions. We discuss recent standards for data as well as for computational models, including the modeling process itself as data to be reported. We outline strategies for constructing WCM despite their inherent complexity.

Keywords: complex networks; emerging standards; single cell; yeast cell model.

Conflict of interest statement

None declared.

Figures

Figure 1.
A WCM captures cellular maintenance and the cell division cycle, as well as cellular responses to external stresses. Icons within the sketched yeast cell symbolize the major processes: MET stands for metabolism, SIG for signaling, GEX for gene expression, TRP for transport, CDC for cell division cycle, and VOL for volume changes and growth. Around the cell, G1, S, M, and G2 indicate the cell cycle phases. The cell starts as a small cell with only one copy of DNA and grows during G1 phase. In S phase, DNA is duplicated and the yeast cell starts to form a bud. After another phase of growth in G2, the cell divides into a mother and a daughter cell during M phase.
Figure 2.
Overview of typical steps in mechanistic modeling, illustrating both ODE modeling (A)–(J) and Boolean modeling (K)–(M). (A) The processes to be covered by the model can be represented graphically. The example used here can represent either metabolic or signaling processes: compound S1 is produced and degraded by reactions 1 and 2 (with velocities v1 and v2), compounds S2 and S4 are converted into each other by reactions 3 and 4, and compound S3 is produced and degraded by reactions 5 and 6. Compounds S1 and S3 modify (activate or inhibit) the velocities of reactions 3 and 5, respectively, without being consumed or produced by these reactions. (B) The system equations, in general, describe the temporal changes of the compounds Si (denoted by the time derivative d/dt), which are given by the rates (or velocities) vj combined with the stoichiometric coefficients. The steps required before the system can be simulated are sketched in panels (C)–(F). (C) The set of system equations for the example in (A). (D) Choices for rate expressions: v3, v4, and v6 follow mass action kinetics, where the parameters k are rate constants; v2 is an example of Michaelis–Menten kinetics (with maximal velocity Vmax and Michaelis constant KM); and v5 is an example of Hill kinetics (K0.5 is the concentration giving half-maximal velocity, n is the Hill coefficient). (E) Parameter values can be obtained from databases, estimated from experimental data (genomics, proteomics, metabolomics, and biophysical measurements), or simply guessed (as done here). Briefly, parameter estimation requires systematically repeated simulation with different parameter values and comparison with experimental data, with the aim of minimizing the difference between data and simulation. (F) Before a simulation can start, the initial conditions have to be set. (G)–(J) are examples of simulation experiments based on the ODE system in panels (C)–(F).
(G) shows a time course simulation. (H) presents the state space for S1 and S2, where vectors indicate the direction of motion from different starting points. (G) and (H) show that the system moves toward a steady state. (I) A typical way to analyze the ODE system is sensitivity analysis, i.e. testing the effect of small parameter variations on the dynamics. Here, parameter Vmax2 has been varied (10 simulations with different values). (J) shows the result of a stochastic simulation of the same system with the Langevin approach, where a noise term is added to each equation, resulting in slightly different dynamics for each of the 10 simulations shown. (K)–(M) Boolean model of a comparable system: component S1 activates S2; S2 and S4 can be converted into each other (thereby annihilating the other component); and S2 activates S3. Here, all compounds can have only two states, ON or OFF, and time proceeds in discrete steps. (K) Graph of the model. (L) The system equations give the state of the component on the right side at time t + 1 as a function of the states of the components on the left side at time t. These changes are expressed with Boolean rules. (M) shows two simulation experiments with different initial conditions, where S4 starts ON at t0 in both cases and S1 is either OFF or ON. If S1 is OFF, the system is already at a fixed point and shows no changes in the following time steps. If S1 is ON, the system oscillates, i.e. it has a cyclic attractor. For both ODE and Boolean modeling, it is often necessary to revise the model and repeat the modeling steps, i.e. network creation (components and reactions), assignment of rate expressions or rules, and choice of parameter values, until the model behavior correctly reflects the experimentally observed behavior of the system.
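The Boolean part of this caption can be made concrete with a minimal sketch. The update rules below are an assumption: the actual rules of panel (L) are not reproduced in the text, so these were chosen only to be consistent with the verbal description (S1 activates S2, S2 and S4 interconvert, S2 activates S3) and with the behavior reported for panel (M). The names `step`, `simulate`, and `attractor_length` are hypothetical helpers, and S1 is treated as a constant input.

```python
from typing import Dict, List

State = Dict[str, bool]


def step(state: State) -> State:
    """One synchronous update: every rule reads the state at time t."""
    s1, s2, s4 = state["S1"], state["S2"], state["S4"]
    return {
        "S1": s1,                     # assumed external input, held constant
        "S2": s1 and s4,              # S4 is converted to S2 only if S1 is ON
        "S3": s2,                     # S2 activates S3
        "S4": s2 or (s4 and not s1),  # S2 converts back to S4; S4 persists if S1 is OFF
    }


def simulate(initial: State, n_steps: int) -> List[State]:
    """Return the trajectory [state(t0), state(t1), ..., state(t_n)]."""
    trajectory = [dict(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


def attractor_length(initial: State, max_steps: int = 32) -> int:
    """Length of the attractor reached from `initial` (1 means fixed point)."""
    seen: Dict[tuple, int] = {}
    state = dict(initial)
    for t in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return t - seen[key]
        seen[key] = t
        state = step(state)
    raise RuntimeError("no attractor found within max_steps")


# Two initial conditions as in panel (M): S4 starts ON, S1 is OFF or ON.
s1_off = {"S1": False, "S2": False, "S3": False, "S4": True}
s1_on = {"S1": True, "S2": False, "S3": False, "S4": True}

if __name__ == "__main__":
    for label, init in (("S1 OFF", s1_off), ("S1 ON", s1_on)):
        print(label, "attractor length:", attractor_length(init))
```

Under these assumed rules, the S1 OFF case stays in its initial state, while the S1 ON case alternates between two states, which matches the fixed point and the cyclic attractor described for panel (M). Other rule sets could reproduce the same qualitative behavior.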
Figure 3.
Overview of the contents and search functionalities of the YCMDB.
Figure 4.
Modular approach to WCM. (A) Modules of the Yeast Cell Model: cell division cycle (CDC); metabolism [MET, containing central carbon metabolism (CCM), cell wall synthesis (CWS), DNA synthesis (DNA), lipid metabolism (LIP), amino acid metabolism (AAM), and storage (STO)]; transport [TRP, comprising ion (ION) and nutrient (NUT) transport]; gene expression [GEX, with transcription (TRX), translation (TRL), assembly of protein complexes (APC), and histone activity (HIS)]; volume changes [VOL, for mother (MOT) and bud (BUD)]; and signaling [SIG, including the high osmolarity glycerol (HOG), pheromone (MAT), Ca-calcineurin (CAL), and TOR (TOR) pathways]. (B) Schematic of the merging approach: parameters are estimated for each module separately, with the other modules reduced to critical information, e.g. just the relevant volume. The modules are then merged into one large ODE model, and the parameters are readjusted to take the dynamics of the other modules into account. Eventually, the whole WCM is simulated as one ODE system. (C) Schematic of the consolidation approach: again, parameters are estimated for each module. During simulation, each module is simulated separately with its own algorithm for a given time step Δt. It takes the given values of the link variables as input, changes them during the simulation, and provides them as output after Δt. All link variables are then updated based on the outputs of all modules. This process is repeated iteratively until the predefined end time.
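The consolidation loop in panel (C) can be sketched as follows. This is a toy illustration under stated assumptions, not the Yeast Cell Model implementation: the two modules, their rate constants, the link variable names (`volume`, `precursor`), and the choice to merge module outputs by summing their changes are all hypothetical. Each module uses its own internal algorithm (here, Euler sub-stepping versus a single step) and only communicates through the link variables.

```python
from typing import Callable, Dict, List

Links = Dict[str, float]
Module = Callable[[Links, float], Links]

# Hypothetical rate constants, chosen only for illustration.
K_SYN, K_DEG, K_GROW = 1.0, 0.5, 0.1


def met_module(links: Links, dt: float, substeps: int = 10) -> Links:
    """Toy metabolism: precursor synthesis scales with volume.

    Integrates internally with explicit Euler sub-steps and returns the
    change of the link variable it modifies over the interval dt.
    """
    p = links["precursor"]
    h = dt / substeps
    for _ in range(substeps):
        p += h * (K_SYN * links["volume"] - K_DEG * p)
    return {"precursor": p - links["precursor"]}


def vol_module(links: Links, dt: float) -> Links:
    """Toy volume growth: consumes precursor, using one step per dt."""
    growth = K_GROW * links["precursor"] * dt
    return {"volume": growth, "precursor": -growth}


def run_consolidated(modules: List[Module], links: Links,
                     dt: float, t_end: float) -> Links:
    """Simulate all modules separately per time step, then update links."""
    links = dict(links)
    for _ in range(round(t_end / dt)):
        # Every module receives the same link values as input.
        outputs = [module(dict(links), dt) for module in modules]
        # Update all link variables from the outputs of all modules
        # (summing changes is an assumed merge rule).
        for output in outputs:
            for name, change in output.items():
                links[name] += change
    return links


if __name__ == "__main__":
    start = {"volume": 1.0, "precursor": 0.0}
    print(run_consolidated([met_module, vol_module], start, dt=0.1, t_end=5.0))
```

Passing copies of the link variables to each module keeps the modules independent within a time step, so the order in which they are called does not affect the result. The price is that coupling between modules is only resolved at the granularity of Δt.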
Figure 5.
Information flow for the creation of a YCM. (A) Literature and experimental results hold information on a multitude of biological processes and interactions. However, these data are highly condition dependent and stored in nonstandardized ways. (B) To use the data reproducibly, understand their connections, and judge their consistency and information content, several nontrivial digitalization steps are required. (C) The curated data can then be analyzed consistently and formally, e.g. in mathematical models, to foster the understanding of the underlying biological processes, but also to reveal knowledge gaps. Eventually, models of whole cells could be simulated to understand the complex interconnection of cellular processes.

