BMC Bioinformatics. 2022 Jul 8;23(1):269.
doi: 10.1186/s12859-022-04779-8.

J-SPACE: a Julia package for the simulation of spatial models of cancer evolution and of sequencing experiments

Fabrizio Angaroni et al. BMC Bioinformatics.

Abstract

Background: The combined effects of biological variability and measurement-related errors on cancer sequencing data remain largely unexplored. However, the spatio-temporal simulation of multi-cellular systems provides a powerful instrument to address this issue. In particular, efficient algorithmic frameworks are needed to overcome the harsh trade-off between scalability and expressivity, so as to allow one to simulate both realistic cancer evolution scenarios and the related sequencing experiments, which can then be used to benchmark downstream bioinformatics methods.

Result: We introduce J-SPACE, a Julia package for SPAtial Cancer Evolution, which allows one to model and simulate a broad set of experimental scenarios, phenomenological rules and sequencing settings. Specifically, J-SPACE simulates the spatial dynamics of cells as a continuous-time multi-type birth-death stochastic process on an arbitrary graph, employing different rules of interaction and an optimised Gillespie algorithm. The evolutionary dynamics of genomic alterations (single-nucleotide variants and indels) are simulated either under the Infinite Sites Assumption or under one of several substitution models, including one based on mutational signatures. After mimicking the spatial sampling of tumour cells, J-SPACE returns the corresponding phylogenetic model and can generate synthetic reads from several Next-Generation Sequencing (NGS) platforms via the ART read simulator. The results are returned in the standard FASTA, FASTQ, SAM, ALN and Newick file formats.
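To make the description above concrete, here is a minimal Python sketch of a direct-method Gillespie simulation of a multi-type birth-death process on an arbitrary graph. It is not J-SPACE's code: J-SPACE is written in Julia and uses an optimised Gillespie variant whose details are not given here. The interaction rule is an assumption chosen for illustration (a dividing cell places its daughter on a uniformly chosen empty neighbour, and the division is blocked when no neighbour is empty), and the function name `gillespie_graph` and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def gillespie_graph(
    adjacency: Dict[int, List[int]],
    initial: Dict[int, int],
    birth_rate: Dict[int, float],
    death_rate: Dict[int, float],
    t_max: float,
    seed: int = 0,
) -> Tuple[Dict[int, int], float]:
    """Simulate a multi-type birth-death process on a graph (sketch).

    adjacency:  node -> list of neighbouring nodes.
    initial:    occupied node -> cell type (empty nodes are absent).
    birth_rate, death_rate: cell type -> rate.
    Returns the final occupancy map and the time reached.
    Assumed rule (hypothetical): a daughter cell goes to a random
    empty neighbour; if none is empty, the division is blocked.
    """
    rng = random.Random(seed)
    occupied = dict(initial)
    t = 0.0
    while occupied:
        nodes = list(occupied)
        # Each cell fires at rate birth + death of its type.
        rates = [birth_rate[occupied[n]] + death_rate[occupied[n]] for n in nodes]
        total = sum(rates)
        if total == 0:
            break
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            t = t_max
            break
        # Pick the firing cell with probability proportional to its rate.
        node = rng.choices(nodes, weights=rates)[0]
        ctype = occupied[node]
        b, d = birth_rate[ctype], death_rate[ctype]
        if rng.random() < b / (b + d):
            # Birth: place the daughter on a random empty neighbour, if any.
            empty = [m for m in adjacency[node] if m not in occupied]
            if empty:
                occupied[rng.choice(empty)] = ctype
            # Otherwise this is a blocked (null) event: nothing changes.
        else:
            # Death: the node becomes empty.
            del occupied[node]
    return occupied, t


if __name__ == "__main__":
    # A ring of 20 nodes seeded with one type-0 cell that divides but never dies.
    ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
    cells, t_end = gillespie_graph(ring, {0: 0}, {0: 1.0}, {0: 0.0}, t_max=100.0)
    print(len(cells), t_end)
```

Recomputing every rate at each step costs O(N) per event, which is acceptable for a sketch but is the kind of overhead an optimised implementation would try to avoid.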

Conclusion: J-SPACE is designed to efficiently simulate the heterogeneous behaviour of a large number of cancer cells and produces a rich set of outputs. Our framework can be used to investigate the emergent spatial dynamics of cancer subpopulations and to assess the impact of incomplete sampling and of experiment-specific errors. Importantly, the output of J-SPACE is designed to support the performance assessment of downstream bioinformatics pipelines that process NGS data. J-SPACE is freely available at: https://github.com/BIMIB-DISCo/J-Space.jl .

Keywords: Cancer Evolution; Next-generation sequencing; Spatial dynamics; Stochastic Simulation.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The J-SPACE framework. Schematic representation of J-SPACE. A First, the algorithm simulates the spatial growth of the cells over an arbitrary graph. Then, J-SPACE simulates a spatial sampling (black circle) at a given time point. B J-SPACE reconstructs the phylogeny of the sampled cells (i.e., the leaves of the tree) and, given an ancestral genome, generates the ground-truth sequences of the sampled cells using various substitution models. C An NGS experiment is simulated to return synthetic reads as output
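The spatial sampling in panel A (black circle) can be pictured as selecting the cells that lie inside a disc. The sketch below assumes a 2D square lattice with cells keyed by integer coordinates and a Euclidean sampling radius; the function name `sample_disc` and the optional uniform subsampling to `n` cells are hypothetical choices, not J-SPACE's API.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


def sample_disc(
    cells: Dict[Coord, int],
    center: Tuple[float, float],
    radius: float,
    n: Optional[int] = None,
    seed: int = 0,
) -> List[Coord]:
    """Return the sorted lattice positions of cells within `radius` of `center`.

    cells: occupied position -> cell type (hypothetical representation).
    If `n` is given and more than `n` cells fall inside the disc,
    a uniform random subset of `n` of them is returned instead.
    """
    cx, cy = center
    inside = sorted(
        pos for pos in cells
        if math.hypot(pos[0] - cx, pos[1] - cy) <= radius
    )
    if n is not None and len(inside) > n:
        inside = sorted(random.Random(seed).sample(inside, n))
    return inside


if __name__ == "__main__":
    # Fully occupied 5x5 lattice; a disc of radius 1 around the centre
    # contains the centre and its four orthogonal neighbours.
    grid = {(x, y): 0 for x in range(5) for y in range(5)}
    print(sample_disc(grid, (2, 2), 1.0))
```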
Fig. 2
Phantom events and the reconstruction of phylogenetic trees. A Pictorial representation of the possible phantom events in a simulation with two different subpopulations. B Simplified scheme of the algorithm that generates the ground-truth phylogenetic tree from the list of birth events. First, the algorithm prunes the branches whose leaves are not sampled (in red); then it removes the remaining edges that do not correspond to coalescent events
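The two steps in panel B can be sketched as follows, assuming (hypothetically) that each birth event is recorded as a `(parent_id, child_id)` pair and that the sampled cells are given as a set of ids. This is an illustration of the pruning idea, not J-SPACE's implementation; the name `reduce_phylogeny` is ours, and branch lengths are ignored.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def reduce_phylogeny(
    births: Iterable[Tuple[int, int]],
    sampled: Set[int],
) -> List[Tuple[int, int]]:
    """Build the tree of sampled cells from (parent, child) birth events.

    1. Prune every branch that does not lead to a sampled cell.
    2. Drop unary nodes (no coalescence), reconnecting each retained
       node to its nearest retained ancestor.
    Returns the sorted (ancestor, descendant) edges of the reduced tree.
    """
    parent: Dict[int, int] = {child: par for par, child in births}

    # Step 1: keep only the sampled cells and all of their ancestors.
    kept: Set[int] = set()
    for leaf in sampled:
        node: Optional[int] = leaf
        while node is not None and node not in kept:
            kept.add(node)
            node = parent.get(node)

    # Count children within the pruned tree (a kept node's parent is kept too).
    n_children: Dict[int, int] = {n: 0 for n in kept}
    for n in kept:
        p = parent.get(n)
        if p is not None:
            n_children[p] += 1

    # Step 2: retain sampled cells and branching (coalescent) nodes only.
    retained = {n for n in kept if n in sampled or n_children[n] >= 2}

    edges = []
    for n in retained:
        anc = parent.get(n)
        while anc is not None and anc not in retained:
            anc = parent.get(anc)
        if anc is not None:
            edges.append((anc, n))
    return sorted(edges)


if __name__ == "__main__":
    # Node 2 and node 5 have a single sampled descendant, so they are collapsed.
    births = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (5, 6)]
    print(reduce_phylogeny(births, {3, 4, 6}))
```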
Fig. 3
Performance assessment. A Distribution of the computational time (in seconds) needed to perform the simulation described in the text, for different sample sizes (50 simulations per configuration). B Distribution of the computational time (in seconds) needed to generate the phylogenetic tree, for different sample sizes (50 simulations per configuration). C Distribution of the computational time (in seconds) needed to generate the sequences for the phylogenetic trees above, for different sample sizes (left) and genome lengths (right). The top row shows the results of the ISA-based model; the bottom row shows the results of a finite-sites model (JC69) with indels (see the main text for further details)
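For reference, the JC69 (Jukes-Cantor) finite-sites model named in panel C has a closed-form transition probability. Writing μ for the total substitution rate per site and t for the branch length (our notation, not the paper's), a site differs from its ancestral base with probability p(t) = 3/4 (1 - exp(-4μt/3)), and each of the three alternative bases is equally likely. The minimal sketch below evolves one sequence along one branch under that model; indels, which the benchmark in the bottom row also includes, are deliberately omitted, and the function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def jc69_change_prob(mu: float, t: float) -> float:
    """JC69 probability that a site differs from its ancestral base after time t."""
    return 0.75 * (1.0 - math.exp(-4.0 * mu * t / 3.0))


def evolve_jc69(seq: str, mu: float, t: float, seed: int = 0) -> str:
    """Evolve an uppercase ACGT sequence along one branch under JC69 (no indels)."""
    rng = random.Random(seed)
    p = jc69_change_prob(mu, t)
    out = []
    for base in seq:
        if rng.random() < p:
            # Under JC69 the three alternative bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    ancestor = "ACGT" * 10
    print(evolve_jc69(ancestor, mu=0.01, t=10.0))
```

As t grows, p(t) saturates at 3/4, which is why JC69 distances become uninformative for very long branches.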
Fig. 4
Analysis of cancer spatial dynamics and phylogenetic models. A Dynamics of the probability distribution of the number of cells, grouped by lattice dimensionality (2D or 3D). The dotted lines represent the expected values. B Box plots of the distribution of the inferred steepness values of the logistic growth. C, D Distributions of the Sackin index and of the Beta-splitting statistic, evaluated on the trees and grouped by interaction rule and lattice dimensionality
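Panel C reports the Sackin index, a standard tree-balance statistic: the sum, over all leaves, of the number of internal nodes on the path from the root to that leaf (equivalently, the leaf's depth in edges). Larger values indicate less balanced, more caterpillar-like trees. The sketch below computes the raw, unnormalised index from a hypothetical child-list representation; whether the paper applies a normalisation is not stated in this summary, and the Beta-splitting statistic (which requires a likelihood fit) is not shown.

```python
from typing import Dict, List


def sackin_index(children: Dict[str, List[str]], root: str) -> int:
    """Raw Sackin index: sum of leaf depths, counted in edges from the root.

    children: node -> list of child nodes; nodes absent from the map are leaves.
    """
    total = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            total += depth
        else:
            stack.extend((k, depth + 1) for k in kids)
    return total


if __name__ == "__main__":
    # Four leaves: balanced tree (depths 2, 2, 2, 2) vs caterpillar (3, 3, 2, 1).
    balanced = {"r": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}
    caterpillar = {"r": ["u", "d"], "u": ["v", "c"], "v": ["a", "b"]}
    print(sackin_index(balanced, "r"), sackin_index(caterpillar, "r"))
```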
Fig. 5
Variant calling with different mutational signatures. A Example dynamics of the number of cells in each subpopulation generated during the simulation. The bottom part of the panel shows the input driver mutational tree with the birth rate of each subpopulation. B Top: the phylogenetic tree generated by sampling 100 cells. We then simulated three different substitution models, generated from combinations of signatures SBS6 and SBS22 from the COSMIC database [71]. The three models differ in the time dynamics of the activation functions shown in this panel. C Number of unique simulated mutations per substitution class, for each of the three models. D Number of unique mutations per substitution class detected using the pipeline described in the main text
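Panel B describes substitution models built by combining two signatures whose contributions change over time through activation functions. The sketch below assumes one plausible reading: at time t, the probability of each substitution class is the activation-weighted average of the signature profiles. The four-class signatures `SIG_A` and `SIG_B` and the logistic activation are hypothetical toy placeholders; the real COSMIC SBS6 and SBS22 profiles span 96 trinucleotide classes and are not reproduced here, and J-SPACE's actual activation functions are not specified in this summary.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy signatures over four classes (NOT the COSMIC SBS6/SBS22 values).
CLASSES = ["C>A", "C>T", "T>A", "T>C"]
SIG_A = [0.1, 0.7, 0.1, 0.1]
SIG_B = [0.1, 0.1, 0.7, 0.1]


def logistic(t: float, t0: float, k: float = 1.0) -> float:
    """Hypothetical activation: smooth switch from 0 to 1 centred at t0."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))


def mixture_profile(
    t: float,
    signatures: List[List[float]],
    activations: List[Callable[[float], float]],
) -> Dict[str, float]:
    """Probability of each substitution class at time t.

    Each signature is weighted by its activation at time t; the weights
    are normalised, so the result sums to 1 if each signature does.
    At least one activation must be positive at time t.
    """
    weights = [f(t) for f in activations]
    total = sum(weights)
    mix = [
        sum(w * sig[k] for w, sig in zip(weights, signatures)) / total
        for k in range(len(CLASSES))
    ]
    return dict(zip(CLASSES, mix))


if __name__ == "__main__":
    # Signature A is always active; signature B switches on around t = 10.
    acts = [lambda t: 1.0, lambda t: logistic(t, 10.0)]
    for t in (0.0, 10.0, 20.0):
        print(t, mixture_profile(t, [SIG_A, SIG_B], acts))
```

Sampling the class of each new mutation from `mixture_profile(t, ...)` at the time it occurs would then make the mutational spectrum shift as the activations change.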

References

1. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194(4260):23–28. doi: 10.1126/science.959840.
2. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–674. doi: 10.1016/j.cell.2011.02.013.
3. Sottoriva A, Spiteri I, Piccirillo SG, Touloumis A, Collins VP, Marioni JC, Curtis C, Watts C, Tavaré S. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc Nat Acad Sci. 2013;110(10):4009–4014. doi: 10.1073/pnas.1219747110.
4. Caravagna G, Graudenzi A, Ramazzotti D, Sanz-Pamplona R, De Sano L, Mauri G, Moreno V, Antoniotti M, Mishra B. Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc Nat Acad Sci. 2016;113(28):4025–4034. doi: 10.1073/pnas.1520213113.
5. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):1–10. doi: 10.1186/gb-2009-10-3-r25.