. 2014 Mar 24;9(3):e91963.

doi: 10.1371/journal.pone.0091963. eCollection 2014.

cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems

Marco S Nobile¹, Paolo Cazzaniga², Daniela Besozzi³, Dario Pescini⁴, Giancarlo Mauri¹

Affiliations

¹ Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milano, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.
² Dipartimento di Scienze Umane e Sociali, Università degli Studi di Bergamo, Bergamo, Italy; Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti", Consiglio Nazionale delle Ricerche, Roma, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.
³ Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy; Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti", Consiglio Nazionale delle Ricerche, Roma, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.
⁴ Dipartimento di Statistica e Metodi Quantitativi, Università degli Studi di Milano-Bicocca, Milano, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.

PMID: 24663957
PMCID: PMC3963881
DOI: 10.1371/journal.pone.0091963

cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems

Marco S Nobile et al. PLoS One. 2014.

. 2014 Mar 24;9(3):e91963.

doi: 10.1371/journal.pone.0091963. eCollection 2014.

Authors

Marco S Nobile¹, Paolo Cazzaniga², Daniela Besozzi³, Dario Pescini⁴, Giancarlo Mauri¹

Affiliations

¹ Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milano, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.
² Dipartimento di Scienze Umane e Sociali, Università degli Studi di Bergamo, Bergamo, Italy; Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti", Consiglio Nazionale delle Ricerche, Roma, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.
³ Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy; Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti", Consiglio Nazionale delle Ricerche, Roma, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.
⁴ Dipartimento di Statistica e Metodi Quantitativi, Università degli Studi di Milano-Bicocca, Milano, Italy; SYSBIO Centre for Systems Biology, Milano, Italy.

PMID: 24663957
PMCID: PMC3963881
DOI: 10.1371/journal.pone.0091963

Abstract

Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae.

PubMed Disclaimer

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematization of CUDA architecture.**
Schematic representation of CUDA threads and memory hierarchy. *Left side*. Thread organization: a single kernel is launched from the host (the CPU) and is executed in multiple threads on the device (the GPU); threads can be organized in three-dimensional structures named blocks which can be, in turn, organized in three-dimensional grids. The dimensions of blocks and grids are explicitly defined by the programmer. *Right side*. Memory hierarchy: threads can access data from many different memories with different scopes; registers and local memories are private for each thread. Shared memory let threads belonging to the same block communicate, and has low access latency. All threads can access the global memory, which suffers of high latencies but is cached since the introduction of Fermi architecture. Texture and constant memories can be read from any thread and feature a cache as well. Figures are taken from the Nvidia's CUDA programming guide .

**Figure 2. cuTauLeaping workflow.**
Simplified scheme of cuTauLeaping workflow: in phase P1 each thread calculates the value for the simulation step; in phase P2, the threads whose is “large” perform a tau-leaping step **(by executing a set of non-critical reactions and (possibly) one critical reaction)**; the remaining threads perform a fixed number of SSA steps **(where one reaction is executed at each step)** during phase P3. The phases are iterated until all threads have reached , a termination criterion verified during phase P4.

formula image — **Figure 2. cuTauLeaping workflow.**
Simplified scheme of cuTauLeaping workflow: in phase P1 each thread calculates the value for the simulation step; in phase P2, the threads whose is “large” perform a tau-leaping step **(by executing a set of non-critical reactions and (possibly) one critical reaction)**; the remaining threads perform a fixed number of SSA steps **(where one reaction is executed at each step)** during phase P3. The phases are iterated until all threads have reached , a termination criterion verified during phase P4.

**Figure 3. Pseudocode of cuTauLeaping – host side.**
Host-side pseudocode of cuTauLeaping. As a first step, the stoichiometric information of the reactions is exploited to pre-calculate the data structures needed by the algorithm; all matrices are flattened during this process. Then, once the support memory areas are allocated (e.g., the chunk of global memory where the system dynamics will be stored), the four phases of cuTauLeaping begin and are repeated until all simulations are completed.

**Figure 4. Pseudocode of cuTauLeaping – kernel .**
Device-side pseudocode of kernel in cuTauLeaping, implementing the subdivision of threads according to the value and the execution of a tau-leaping step. The kernel starts by loading the vectors and – which correspond to the current state of the system and to the values of stochastic constants, respectively – from the global memory areas that contain these data for all threads. Since these information are frequently accessed, they are immediately copied into the faster shared memory as vectors x and c, respectively. The kernel continues by verifying that the value for the running thread is not equal to the signal of terminated execution (i.e., ). Then, it calculates the propensity functions of all reactions and accumulates their values in ; if , the remaining time instants where the dynamics of the system is sampled are set to the current state and the simulation is terminated. The kernel concludes the phase P1 by calculating a putative value for the tau-leaping step: if is smaller than , then thread is halted and is set to 0, so that it will perform the SSA steps during the next phase. Otherwise, the tau-leaping algorithm is performed by executing a set of non-critical reactions and (possibly) one critical reaction and, if the simulation has overrun one of the sampling time instants, the state stored in is determined by linear interpolation.

**Figure 5. Pseudocode of cuTauLeaping – kernel .**
Device-side pseudocode of kernel in cuTauLeaping, implementing the execution of the SSA steps. The kernel starts by loading the vectors and – which correspond to the current state of the system and to the values of stochastic constants, respectively – from the global memory areas that contain these data for all threads. Since these information are frequently accessed, they are immediately copied into the faster shared memory as vectors x and c, respectively. The kernel continues by verifying that the value for the running thread is equal to the signal corresponding to SSA (i.e., ). Then, it performs a fixed number of SSA steps (100 in our default setting), **where a single reaction is executed at each step**, storing the system state at the sampled time instants .

**Figure 6. Pseudocode of cuTauLeaping – kernel .**
Device-side pseudocode of kernel in cuTauLeaping, implementing the verification of the termination of all simulations. The verification is performed by means of CUDA's hardware accelerated synchronization and counting features, which allow to count the threads of a block which satisfy a specific predicate. By exploiting CUDA's atomic functions, we accumulate the total number of threads which satisfy the predicate : if it is equal to the number of threads, the execution of all parallel simulations is completed.

**Figure 7. Schematization of the flattened representation of the stoichiometric information.**
The stoichiometry of chemical reactions is generally represented by (usually sparse) matrices, corresponding to the variation of the species appearing either as reactants or products; however, both tau-leaping and SSA exploit only the non-zero values of these matrices. Each stoichiometric matrix can be pre-processed to identify its non-zero values and discard the remaining ones, thus reducing the number of reading operations required by the two stochastic algorithms. Our strategy to reduce the size of these matrices consists in flattening each matrix as a vector of triples (), where is the row index, is the column index and is the non-zero value in . In our implementation, both and indices are 0-based and triples are stored using vectors of CUDA's data types, that have the advantage of requiring a single instruction to fetch an entry. The top part of this figure shows the values appearing in the 3×4 stoichiometric matrix of reactant species of the Michaelis-Menten model (MM), which consists of 3 reactions over 4 molecular species (see Text S1). Note that only four cells of this matrix have non-zero values; the bottom part of the figure shows the corresponding vector.

**Figure 8. Comparison of the computational time of CPU tau-leaping and cuTauLeaping.**
Comparison between the computational time taken by cuTauLeaping and COPASI CPU tau-leaping to execute different batches of stochastic simulations of the Michaelis-Menten (MM) model (a), the Prokaryotic Gene Network (PGN) model (b), the Schlögl model (c), and the Ras/cAMP/PKA pathway (d) (see Text S1 for models definitions). For each model, cuTauLeaping becomes more profitable than the CPU counterpart when a certain number of parallel simulations is run, with a break-even that depends on the complexity of the system: for the MM and PGN models, cuTauLeaping is more effective when around parallel simulations are run, while for the Schlögl model and the Ras/cAMP/PKA pathway the break-even is around simulations. Considering the speedup, the best results achieved with cuTauLeaping – with respect to COPASI – are around 583× for the MM model, 961× for the PGN model, 90× for the Schlögl model, and 25× for the model of the Ras/cAMP/PKA pathway (see also Table 1).

**Figure 9. Frequency distribution of bistable states in the Schlögl model.**
Frequency distribution of the molecular amount of molecular species in the Schlögl model, calculated using a total of parallel simulations executed by cuTauLeaping. (a) Plot of the frequency distribution of considering a.u., to detect the bistable switching behavior that takes place in the first time instants of the dynamics; a slightly higher probability to reach the low steady state can be observed, starting from the initial state of the Schlögl system (described in Text S1). (b) Plot of the frequency distribution of considering a.u., to investigate the stability of the two steady states of the system; the heatmap highlights the two stable states (around 100 and 600 molecules of species ), and shows larger stochastic fluctuations around the high steady state.

**Figure 10. Parameter sweep analysis of the Schlögl model.**
Results of a PSA-1D on the Schlögl model, in which the value of the stochastic constant is varied in the interval (the set of reactions and the values of all other parameters are given in Tables 4 and 5 in Text S1). Each frequency distribution is calculated according to simulations executed by cuTauLeaping, measuring the amount of the molecular species at the time instant a.u., considering ten different values of the stochastic constant within the sweep interval. The figure shows that increasing values of induce a decrease (increment) in the frequency distribution of concerning the low (high) steady state, with intermediate values of characterized by an effective bistable behavior.

**Figure 11. Three-dimensional parameter sweep analysis of the Schlögl model.**
Results of a PSA-3D on the Schlögl model, performed by varying the stochastic constants , and in the intervals , and , respectively. The values of the stochastic constants were uniformly sampled in a three-dimensional lattice; for each sample, we executed 256 simulations with cuTauLeaping (for a total of simulations) and evaluated the frequency distribution of the amount of the molecular species at the time instant a.u.. This set of values was then partitioned according to the reached (low or high) stable steady state; in the plot, the red (blue) region corresponds to the parameterizations of the model which yield the high (low) steady state most frequently. The green region represents a set of conditions whereby both steady states are equally reached.

**Figure 12. Bidimensional parameter sweep analysis of the Ras/cAMP/PKA model.**
Results of a PSA-2D on the Ras/cAMP/PKA model by varying the amount of GTP in the interval molecules (ranging from a reduced nutrient availability to a normal growth condition), and the amount of Cdc25 in the interval molecules (ranging from the deletion to a 2-fold overexpression of this GEF proteins). The figure shows the amplitude of cAMP oscillations, evaluated as described in ; an amplitude value equal to zero corresponds to a non oscillating dynamics. (a) Plot of the results obtained by running parallel simulations with cuTauLeaping; (b) plot of the results obtained by running sequential simulations, performed on the CPU. The two batches of parallel and sequential simulations were executed with a comparable computational time.

**Figure 13. Performance comparison of CPU tau-leaping and cuTauLeaping for a PSA of the Ras/cAMP/PKA model.**
Running times of cuTauLeaping and COPASI CPU tau-leaping to execute a PSA-1D of the Ras/cAMP/PKA model, where the stochastic constant was varied in the interval and a total of simulations were executed. The plot shows how the computational cost of tau-leaping running on CPU rapidly increases; this behavior can become prohibitive if several independent simulations need to be executed. On the contrary, cuTauLeaping shows a very moderate increase in the running times and outperforms the CPU implementation of tau-leaping.

See this image and copyright information in PMC

Cited by

The feasibility of genome-scale biological network inference using Graphics Processing Units.
Thiagarajan R, Alavi A, Podichetty JT, Bazil JN, Beard DA. Thiagarajan R, et al. Algorithms Mol Biol. 2017 Mar 20;12:8. doi: 10.1186/s13015-017-0100-5. eCollection 2017. Algorithms Mol Biol. 2017. PMID: 28344638 Free PMC article.
Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization.
Cazzaniga P, Nobile MS, Besozzi D, Bellini M, Mauri G. Cazzaniga P, et al. Biomed Res Int. 2014;2014:863298. doi: 10.1155/2014/863298. Epub 2014 Jun 16. Biomed Res Int. 2014. PMID: 25025072 Free PMC article.
Graphics processing units in bioinformatics, computational biology and systems biology.
Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D. Nobile MS, et al. Brief Bioinform. 2017 Sep 1;18(5):870-885. doi: 10.1093/bib/bbw058. Brief Bioinform. 2017. PMID: 27402792 Free PMC article. Review.
LASSIE: simulating large-scale models of biochemical systems on GPUs.
Tangherloni A, Nobile MS, Besozzi D, Mauri G, Cazzaniga P. Tangherloni A, et al. BMC Bioinformatics. 2017 May 10;18(1):246. doi: 10.1186/s12859-017-1666-0. BMC Bioinformatics. 2017. PMID: 28486952 Free PMC article.
Abstraction-based segmental simulation of reaction networks using adaptive memoization.
Helfrich M, Andriushchenko R, Češka M, Křetínský J, Martiček Š, Šafránek D. Helfrich M, et al. BMC Bioinformatics. 2024 Nov 8;25(1):350. doi: 10.1186/s12859-024-05966-5. BMC Bioinformatics. 2024. PMID: 39516723 Free PMC article.

See all "Cited by" articles

References

1. Aldridge B, Burke J, Lauffenburger D, Sorger P (2006) Physicochemical modelling of cell signalling pathways. Nat Cell Biol 8: 1195–203. - PubMed
1. Hyduke D, Palsson B (2010) Towards genome-scale signalling network reconstructions. Nat Rev Genet 11: 297–307. - PubMed
1. Kitano H (2002) Computational systems biology. Nature 420: 206–210. - PubMed
1. Papin J, Hunter T, Palsson B, Subramaniam S (2005) Reconstruction of cellular signalling networks and analysis of their properties. Nat Rev Mol Cell Biol 9: 99–111. - PubMed
1. Chou I, Voit E (2009) Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Math Biosci 219: 57–83. - PMC - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations
Molecular Biology Databases
- Saccharomyces Genome Database

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems

Affiliations

cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases

Abstract

Conflict of interest statement

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

Related information

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases