Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2014 Mar 24;9(3):e91963.
doi: 10.1371/journal.pone.0091963. eCollection 2014.

cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems

Affiliations

cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems

Marco S Nobile et al. PLoS One. .

Abstract

Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae.

PubMed Disclaimer

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Schematization of CUDA architecture.
Schematic representation of CUDA threads and memory hierarchy. Left side. Thread organization: a single kernel is launched from the host (the CPU) and is executed in multiple threads on the device (the GPU); threads can be organized in three-dimensional structures named blocks which can be, in turn, organized in three-dimensional grids. The dimensions of blocks and grids are explicitly defined by the programmer. Right side. Memory hierarchy: threads can access data from many different memories with different scopes; registers and local memories are private for each thread. Shared memory let threads belonging to the same block communicate, and has low access latency. All threads can access the global memory, which suffers of high latencies but is cached since the introduction of Fermi architecture. Texture and constant memories can be read from any thread and feature a cache as well. Figures are taken from the Nvidia's CUDA programming guide .
Figure 2
Figure 2. cuTauLeaping workflow.
Simplified scheme of cuTauLeaping workflow: in phase P1 each thread calculates the value formula image for the simulation step; in phase P2, the threads whose formula image is “large” perform a tau-leaping step (by executing a set of non-critical reactions and (possibly) one critical reaction); the remaining threads perform a fixed number of SSA steps (where one reaction is executed at each step) during phase P3. The phases are iterated until all threads have reached formula image, a termination criterion verified during phase P4.
Figure 3
Figure 3. Pseudocode of cuTauLeaping – host side.
Host-side pseudocode of cuTauLeaping. As a first step, the stoichiometric information of the reactions is exploited to pre-calculate the data structures needed by the algorithm; all matrices are flattened during this process. Then, once the support memory areas are allocated (e.g., the chunk of global memory where the system dynamics will be stored), the four phases of cuTauLeaping begin and are repeated until all simulations are completed.
Figure 4
Figure 4. Pseudocode of cuTauLeaping – kernel .
Device-side pseudocode of kernel formula image in cuTauLeaping, implementing the subdivision of threads according to the formula image value and the execution of a tau-leaping step. The kernel starts by loading the vectors formula image and formula image – which correspond to the current state of the system and to the values of stochastic constants, respectively – from the global memory areas that contain these data for all threads. Since these information are frequently accessed, they are immediately copied into the faster shared memory as vectors x and c, respectively. The kernel continues by verifying that the formula image value for the running thread formula image is not equal to the signal of terminated execution (i.e., formula image). Then, it calculates the propensity functions of all reactions and accumulates their values in formula image; if formula image, the remaining time instants where the dynamics of the system is sampled are set to the current state and the simulation is terminated. The kernel concludes the phase P1 by calculating a putative formula image value for the tau-leaping step: if formula image is smaller than formula image, then thread formula image is halted and formula image is set to 0, so that it will perform the SSA steps during the next phase. Otherwise, the tau-leaping algorithm is performed by executing a set of non-critical reactions and (possibly) one critical reaction and, if the simulation has overrun one of the sampling time instants, the state stored in formula image is determined by linear interpolation.
Figure 5
Figure 5. Pseudocode of cuTauLeaping – kernel .
Device-side pseudocode of kernel formula image in cuTauLeaping, implementing the execution of the SSA steps. The kernel starts by loading the vectors formula image and formula image – which correspond to the current state of the system and to the values of stochastic constants, respectively – from the global memory areas that contain these data for all threads. Since these information are frequently accessed, they are immediately copied into the faster shared memory as vectors x and c, respectively. The kernel continues by verifying that the formula image value for the running thread formula image is equal to the signal corresponding to SSA (i.e., formula image). Then, it performs a fixed number of SSA steps (100 in our default setting), where a single reaction is executed at each step, storing the system state at the sampled time instants formula image.
Figure 6
Figure 6. Pseudocode of cuTauLeaping – kernel .
Device-side pseudocode of kernel formula image in cuTauLeaping, implementing the verification of the termination of all simulations. The verification is performed by means of CUDA's hardware accelerated synchronization and counting features, which allow to count the threads of a block which satisfy a specific predicate. By exploiting CUDA's atomic functions, we accumulate the total number of threads which satisfy the predicate formula image: if it is equal to the number of threads, the execution of all parallel simulations is completed.
Figure 7
Figure 7. Schematization of the flattened representation of the stoichiometric information.
The stoichiometry of chemical reactions is generally represented by (usually sparse) matrices, corresponding to the variation of the species appearing either as reactants or products; however, both tau-leaping and SSA exploit only the non-zero values of these matrices. Each stoichiometric matrix can be pre-processed to identify its non-zero values and discard the remaining ones, thus reducing the number of reading operations required by the two stochastic algorithms. Our strategy to reduce the size of these matrices consists in flattening each matrix as a vector of triples (formula image), where formula image is the row index, formula image is the column index and formula image is the non-zero value in formula image. In our implementation, both formula image and formula image indices are 0-based and triples are stored using vectors of CUDA's formula image data types, that have the advantage of requiring a single instruction to fetch an entry. The top part of this figure shows the values appearing in the 3×4 stoichiometric matrix of reactant species of the Michaelis-Menten model (MM), which consists of 3 reactions over 4 molecular species (see Text S1). Note that only four cells of this matrix have non-zero values; the bottom part of the figure shows the corresponding formula image vector.
Figure 8
Figure 8. Comparison of the computational time of CPU tau-leaping and cuTauLeaping.
Comparison between the computational time taken by cuTauLeaping and COPASI CPU tau-leaping to execute different batches of stochastic simulations of the Michaelis-Menten (MM) model (a), the Prokaryotic Gene Network (PGN) model (b), the Schlögl model (c), and the Ras/cAMP/PKA pathway (d) (see Text S1 for models definitions). For each model, cuTauLeaping becomes more profitable than the CPU counterpart when a certain number of parallel simulations is run, with a break-even that depends on the complexity of the system: for the MM and PGN models, cuTauLeaping is more effective when around formula image parallel simulations are run, while for the Schlögl model and the Ras/cAMP/PKA pathway the break-even is around formula image simulations. Considering the speedup, the best results achieved with cuTauLeaping – with respect to COPASI – are around 583× for the MM model, 961× for the PGN model, 90× for the Schlögl model, and 25× for the model of the Ras/cAMP/PKA pathway (see also Table 1).
Figure 9
Figure 9. Frequency distribution of bistable states in the Schlögl model.
Frequency distribution of the molecular amount of molecular species formula image in the Schlögl model, calculated using a total of formula image parallel simulations executed by cuTauLeaping. (a) Plot of the frequency distribution of formula image considering formula image a.u., to detect the bistable switching behavior that takes place in the first time instants of the dynamics; a slightly higher probability to reach the low steady state can be observed, starting from the initial state of the Schlögl system (described in Text S1). (b) Plot of the frequency distribution of formula image considering formula image a.u., to investigate the stability of the two steady states of the system; the heatmap highlights the two stable states (around 100 and 600 molecules of species formula image), and shows larger stochastic fluctuations around the high steady state.
Figure 10
Figure 10. Parameter sweep analysis of the Schlögl model.
Results of a PSA-1D on the Schlögl model, in which the value of the stochastic constant formula image is varied in the interval formula image (the set of reactions and the values of all other parameters are given in Tables 4 and 5 in Text S1). Each frequency distribution is calculated according to formula image simulations executed by cuTauLeaping, measuring the amount of the molecular species formula image at the time instant formula image a.u., considering ten different values of the stochastic constant formula image within the sweep interval. The figure shows that increasing values of formula image induce a decrease (increment) in the frequency distribution of formula image concerning the low (high) steady state, with intermediate values of formula image characterized by an effective bistable behavior.
Figure 11
Figure 11. Three-dimensional parameter sweep analysis of the Schlögl model.
Results of a PSA-3D on the Schlögl model, performed by varying the stochastic constants formula image, formula image and formula image in the intervals formula image, formula image and formula image, respectively. The values of the stochastic constants were uniformly sampled in a formula image three-dimensional lattice; for each sample, we executed 256 simulations with cuTauLeaping (for a total of formula image simulations) and evaluated the frequency distribution of the amount of the molecular species formula image at the time instant formula image a.u.. This set of values was then partitioned according to the reached (low or high) stable steady state; in the plot, the red (blue) region corresponds to the parameterizations of the model which yield the high (low) steady state most frequently. The green region represents a set of conditions whereby both steady states are equally reached.
Figure 12
Figure 12. Bidimensional parameter sweep analysis of the Ras/cAMP/PKA model.
Results of a PSA-2D on the Ras/cAMP/PKA model by varying the amount of GTP in the interval formula image molecules (ranging from a reduced nutrient availability to a normal growth condition), and the amount of Cdc25 in the interval formula image molecules (ranging from the deletion to a 2-fold overexpression of this GEF proteins). The figure shows the amplitude of cAMP oscillations, evaluated as described in ; an amplitude value equal to zero corresponds to a non oscillating dynamics. (a) Plot of the results obtained by running formula image parallel simulations with cuTauLeaping; (b) plot of the results obtained by running formula image sequential simulations, performed on the CPU. The two batches of parallel and sequential simulations were executed with a comparable computational time.
Figure 13
Figure 13. Performance comparison of CPU tau-leaping and cuTauLeaping for a PSA of the Ras/cAMP/PKA model.
Running times of cuTauLeaping and COPASI CPU tau-leaping to execute a PSA-1D of the Ras/cAMP/PKA model, where the stochastic constant formula image was varied in the interval formula image and a total of formula image simulations were executed. The plot shows how the computational cost of tau-leaping running on CPU rapidly increases; this behavior can become prohibitive if several independent simulations need to be executed. On the contrary, cuTauLeaping shows a very moderate increase in the running times and outperforms the CPU implementation of tau-leaping.

Similar articles

Cited by

References

    1. Aldridge B, Burke J, Lauffenburger D, Sorger P (2006) Physicochemical modelling of cell signalling pathways. Nat Cell Biol 8: 1195–203. - PubMed
    1. Hyduke D, Palsson B (2010) Towards genome-scale signalling network reconstructions. Nat Rev Genet 11: 297–307. - PubMed
    1. Kitano H (2002) Computational systems biology. Nature 420: 206–210. - PubMed
    1. Papin J, Hunter T, Palsson B, Subramaniam S (2005) Reconstruction of cellular signalling networks and analysis of their properties. Nat Rev Mol Cell Biol 9: 99–111. - PubMed
    1. Chou I, Voit E (2009) Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Math Biosci 219: 57–83. - PMC - PubMed

Publication types