Nat Genet. 2025 Jan;57(1):103-114.
doi: 10.1038/s41588-024-01989-z. Epub 2024 Nov 29.

Characterizing the evolutionary dynamics of cancer proliferation in single-cell clones with SPRINTER

Olivia Lucas et al. Nat Genet. 2025 Jan.

Abstract

Proliferation is a key hallmark of cancer, but whether it differs between evolutionarily distinct clones co-existing within a tumor is unknown. We introduce the Single-cell Proliferation Rate Inference in Non-homogeneous Tumors through Evolutionary Routes (SPRINTER) algorithm that uses single-cell whole-genome DNA sequencing data to enable accurate identification and clone assignment of S- and G2-phase cells, as assessed by generating accurate ground truth data. Applied to a newly generated longitudinal, primary-metastasis-matched dataset of 14,994 non-small cell lung cancer cells, SPRINTER revealed widespread clone proliferation heterogeneity, orthogonally supported by Ki-67 staining, nuclei imaging and clinical imaging. We further demonstrated that high-proliferation clones have increased metastatic seeding potential, increased circulating tumor DNA shedding and clone-specific altered replication timing in proliferation- or metastasis-related genes associated with expression changes. Applied to previously generated datasets of 61,914 breast and ovarian cancer cells, SPRINTER revealed increased single-cell rates of different genomic variants and enrichment of proliferation-related gene amplifications in high-proliferation clones.

Conflict of interest statement

Competing interests: A.M.F. is a co-inventor on a patent application to determine methods and systems for tumor monitoring (PCT/EP2022/077987). D.A.M. reports speaker fees from AstraZeneca and Takeda; consultancy fees from AstraZeneca, Thermo Fisher, Takeda, Amgen, Janssen, MIM Software, Bristol Myers Squibb and Eli Lilly and has received educational support from Takeda and Amgen. S.A. is a founder and shareholder of GenomeTherapeutics and scientific advisor to Sangamo Therapeutics, the Institute of Cancer Research, London, and the New York Genome Center, NY. N.M. has stock options in and has consulted for Achilles Therapeutics; holds a European patent in determining HLA LOH (PCT/GB2018/052004) and is a co-inventor on a patent for identifying responders to cancer treatment (PCT/GB2018/051912). M.J.-H. has received funding from CRUK, the National Institutes of Health (NIH) National Cancer Institute, International Association for the Study of Lung Cancer (IASLC) Foundation, Lung Cancer Research Foundation, Rosetrees Trust, UKI NETs and NIHR; has consulted for Astex Pharmaceutical and Achilles Therapeutics; is a member of the Achilles Therapeutics Scientific Advisory Board and Steering Committee and has received speaker honoraria from Pfizer, Astex Pharmaceuticals, Oslo Cancer Cluster, Bristol Myers Squibb and Genentech. M.J.-H. is listed as a co-inventor on a European patent application relating to methods to detect lung cancer (PCT/US2017/028013); this patent has been licensed to commercial entities, and, under terms of employment, M.J.-H. is due a share of any revenue generated from such license(s) and is also listed as a co-inventor on the GB priority patent application (GB2400424.4) titled "Treatment and Prevention of Lung Cancer". C.S. acknowledges grants from AstraZeneca, Boehringer-Ingelheim, Bristol Myers Squibb, Pfizer, Roche-Ventana, Invitae (previously Archer Dx; collaboration in minimal residual disease sequencing technologies), Ono Pharmaceutical and Personalis. He is the chief investigator for the AZ MeRmaiD 1 and 2 clinical trials and is the Steering Committee Chair. He is also the co-chief investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL’s Scientific Advisory Board (SAB). He receives consultant fees from Achilles Therapeutics (also a SAB member), Bicycle Therapeutics (also a SAB member), Genentech, Medicxi, the China Innovation Centre of Roche (CICoR; formerly Roche Innovation Centre Shanghai), Metabomed (until July 2022), Relay Therapeutics (SAB member), Saga Diagnostics (SAB member) and the Sarah Cannon Research Institute. C.S. has received honoraria from Amgen, AstraZeneca, Bristol Myers Squibb, GlaxoSmithKline, Illumina, MSD, Novartis, Pfizer and Roche-Ventana. C.S. has previously held stock options in Apogen Biotechnologies and GRAIL; currently has stock options in Epic Bioscience, Bicycle Therapeutics and Relay Therapeutics; and has stock options and is cofounder of Achilles Therapeutics. C.S. declares patent applications for methods to detect lung cancer (PCT/US2017/028013), targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), methods for lung cancer detection (US20190106751A1), identifying patients who respond to cancer treatment (PCT/GB2018/051912), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221) and methods and systems for tumor monitoring (PCT/EP2022/077987). C.S. is an inventor on a European patent application (PCT/GB2017/053289) relating to assay technology to detect tumor recurrence. This patent has been licensed to a commercial entity, and, under their terms of employment, C.S. is due a share of any revenue generated from such license(s). The other authors declare no competing interests.

Figures

Fig. 1. The SPRINTER algorithm.
There are six main steps in SPRINTER. (1) The first step calculates the RDR and replication timing (early and late in magenta and green, respectively) of each genomic bin. (2) The second step infers segments of neighboring bins likely to be affected by the same CNAs by identifying candidate breakpoints independently in early or late bins and preserving only those breakpoints supported by both (dashed red lines preserved versus dashed gray lines discarded). (3) The third step identifies S-phase cells by performing a statistical permutation test of replication timing on RDRs normalized per segment (to remove the effect of CNAs) to assess the presence of significant differences between early (higher values) and late (lower values) bins expected for S-phase cells (bottom row) in contrast to G0/G1/G2-phase cells (top row). (4) The fourth step infers clones by identifying cell-specific CNAs (black lines) for all G0/G1/G2-phase cells and grouping cells with the same complement of CNAs (colored bars). (5) The fifth step assigns each S-phase cell to the maximum-a-posteriori clone (green check mark)—RDRs are corrected for replication fluctuations, and clone assignment is chosen to maximize the posterior probability across all possible assignments (best fit of black lines). (6) The sixth step identifies G2-phase cells per clone by deconvolving the distribution of total read counts yielded by either G0/G1-phase (light gray with lower values) or G2-phase (black with higher values) cells. SPRINTER’s results—each cell (row) with inferred CNAs (colors) across bins (columns) is assigned to a clone, providing estimates of S (left dark gray bars) and G2 (black bars) fractions. The figure is created with BioRender.com.
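
Step 3 of the caption can be illustrated with a small permutation test. This is only a sketch under stated assumptions, not SPRINTER's implementation: the test statistic (mean early RDR minus mean late RDR), the normalization (dividing by the segment mean), the genome-wide label shuffling, and the toy data generator `toy_cell` are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

def normalize_per_segment(rdrs, segments):
    """Divide each bin's RDR by the mean RDR of its segment (assumed normalization)."""
    by_segment = {}
    for value, seg in zip(rdrs, segments):
        by_segment.setdefault(seg, []).append(value)
    seg_mean = {seg: mean(vals) for seg, vals in by_segment.items()}
    return [value / seg_mean[seg] for value, seg in zip(rdrs, segments)]

def early_late_gap(values, is_early):
    """Mean of early-replicating bins minus mean of late-replicating bins."""
    early = [v for v, e in zip(values, is_early) if e]
    late = [v for v, e in zip(values, is_early) if not e]
    return mean(early) - mean(late)

def s_phase_p_value(rdrs, segments, is_early, n_perm=2000, seed=0):
    """One-sided permutation P value for 'early bins are higher than late bins'."""
    rng = random.Random(seed)
    norm = normalize_per_segment(rdrs, segments)
    observed = early_late_gap(norm, is_early)
    labels = list(is_early)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between RDR and replication timing
        if early_late_gap(norm, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def toy_cell(replicating, n_bins=200, seed=1):
    """Hypothetical cell: 4 segments with different copy ratios, alternating timing."""
    rng = random.Random(seed)
    segments = [b // 50 for b in range(n_bins)]
    copy_ratio = {0: 1.0, 1: 1.5, 2: 0.5, 3: 1.0}
    is_early = [b % 2 == 0 for b in range(n_bins)]
    rdrs = []
    for b in range(n_bins):
        boost = 1.3 if (replicating and is_early[b]) else 1.0
        rdrs.append(copy_ratio[segments[b]] * boost * rng.gauss(1.0, 0.05))
    return rdrs, segments, is_early

p_s = s_phase_p_value(*toy_cell(replicating=True))
p_g1 = s_phase_p_value(*toy_cell(replicating=False))
print(f"S-phase-like cell P = {p_s:.4f}; G1-like cell P = {p_g1:.4f}")
```

Normalizing within segments first means that a copy-number gain or loss cannot by itself produce an early-versus-late difference, which is the role the caption gives to the per-segment normalization.
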
Fig. 2. SPRINTER improves S-phase identification and enables accurate clone assignment of S-phase cells.
a, The proportion of correctly identified G1/G2- and S-phase cells (y axis) was computed for CCC (blue), MAPD (orange), rtMAPD (MAPD extended with replication timing, red) and SPRINTER (green) across cell cycle phases (x axis) for 100 cell subpopulations (dots), each formed by sampling 500 cells from the diploid (left) or tetraploid (right) ground truth datasets. b, ROC curves (false-positive rates versus true-positive rates) measure the performance in distinguishing G1-phase cells from actively replicating cells using the classification scores computed by existing methods (blue, orange and red) or combining SPRINTER’s S- and G2-phase P values (using the minimum, green) by bootstrapping 300 diploid (top) or tetraploid (bottom) cells for 100 repeats (each curve). c, A binomial process was used to generate cell subpopulation pairs with the same (top) or different (bottom) true underlying fractions of replicating cells (that is, proliferation). The figure is created with BioRender.com. d, The proliferation accuracy was computed for all methods (colors) considering 600 pairs of clones generated as described in c by sampling varying numbers of diploid (left) and tetraploid (right) cells per clone (x axis) with varying S and G2 fractions (20–30% ± 30–50%) for 50 repeats (dots). e, Top, RDRs across 50 kb bins (x axis) for an S-phase cell are affected by replication-induced fluctuations (early- and late-replicating bins in magenta and green, respectively) preventing accurate CNA identification (scattered black lines for expected CNAs). Bottom, instead, SPRINTER’s replication-corrected RDRs are similar to CNA expectations (black lines). f, The absolute error rate (x axis) between true and expected fractions of S-phase cells assigned to a clone was calculated per cell using all methods (colors) in 30 populations of 300 tetraploid cells each, altogether comprising 389 clones. 
The proportion of clones for which the assigned true S fraction was compatible with the expected S fraction was computed using a binomial test (pie charts). In d and f, box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles. AUC, area under the curve; ROC, receiver operating characteristic; IQR, interquartile range.
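
The ROC comparison in panel b can be sketched as follows. The caption states that the S- and G2-phase P values are combined using the minimum; turning that minimum into a score as `1 - min(p_s, p_g2)` and computing the AUC by pairwise comparison are assumptions made here for illustration, and all P values in the example are invented.

```python
def combined_score(p_s, p_g2):
    """Higher score = more evidence of active replication (assumed scoring rule)."""
    return 1.0 - min(p_s, p_g2)

def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive outranks a randomly chosen negative (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Invented (p_s, p_g2) pairs for replicating cells and for G1-phase cells.
replicating = [(0.001, 0.6), (0.4, 0.002), (0.03, 0.5)]
g1_phase = [(0.7, 0.9), (0.2, 0.8), (0.02, 0.6)]

pos = [combined_score(*pair) for pair in replicating]
neg = [combined_score(*pair) for pair in g1_phase]
toy_auc = auc(pos, neg)
print(f"AUC = {toy_auc:.3f}")
```
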
Fig. 3. SPRINTER identifies tumor clone proliferation heterogeneity in patient CRUKP9145 with NSCLC.
a, The distributions of SPRINTER’s inferred S fractions (bottom, y axis) for each NSCLC clone (x axis) with varying cell numbers (top, y axis) in primary (top) and metastatic (bottom) samples were calculated by bootstrapping (300 repeats; dashed lines represent sample-level averages). Clone S fractions were compared per sample using a two-sided chi-square test, combined using the minimum and a Benjamini–Hochberg correction was applied (family-wise error rate = 0.1; red asterisks indicate significant P values). Sample-level S fraction 95% CIs (between axes) were computed by bootstrapping cells per sample. *P < 0.1, **P < 0.05 and ***P < 0.005. b, Ki-67 staining from one representative slide in primary and metastatic samples, indicating areas with high and low Ki-67 (boxes) that were consistent with SPRINTER clone S fractions (red asterisk). c, Top, nuclear diameter (x axis, micrometers, normalized by sample mean) was measured by DLP+ nozzle-based imaging for 14,569 cells with successfully recorded images inferred to be in G1, S or G2 phase by SPRINTER (y axis), with each pair of distributions compared using a one-sided Mann–Whitney U test (P values on right). Bottom, the nuclear diameter per clone (x axis) was calculated using the minimum diameter across the cells in each clone (each dot) that were assigned to different cell cycle phases by SPRINTER (y axis). Across cell cycle phases, clones are linked by lines, such that the line width is proportional to clone size and the line color indicates whether the nuclear diameter per clone has increased as expected (red) or not (blue). Nuclear diameters in different cell cycle phases were compared per clone using a one-sided Wilcoxon signed-rank test (P values on right). Right, example microscopy images of nuclei in each phase.
d, For five primary tumor samples in this study (colored circles on photo) and three additional samples (gray circles), each bulk clone identified in previous analysis (hexagons comprising clones with different inner shapes of size proportional to cell proportion) was assigned to the most similar SPRINTER clone using SNVs (colors, with legend marker size proportional to SPRINTER’s inferred S fraction). In a and c, box plots show the median and the IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles. CI, confidence interval.
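
The bootstrapped S-fraction distributions in panel a can be sketched as below. The caption gives the number of repeats (300) and the 95% level; resampling cells with replacement within a clone and reading the interval off the percentiles are assumptions of this sketch, and the clone composition is invented.

```python
import random

def bootstrap_s_fraction(is_s_phase, n_boot=300, seed=0):
    """Resample a clone's cells with replacement and recompute the S fraction."""
    rng = random.Random(seed)
    n = len(is_s_phase)
    fractions = []
    for _ in range(n_boot):
        resample = rng.choices(is_s_phase, k=n)
        fractions.append(sum(resample) / n)
    return fractions

def percentile_interval(values, level=0.95):
    """Simple percentile interval (no interpolation between order statistics)."""
    ordered = sorted(values)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

# Invented clone: 30 S-phase cells out of 200, i.e. an observed S fraction of 0.15.
clone_cells = [True] * 30 + [False] * 170
fractions = bootstrap_s_fraction(clone_cells)
ci_low, ci_high = percentile_interval(fractions)
print(f"S fraction 0.15, bootstrap 95% interval [{ci_low:.3f}, {ci_high:.3f}]")
```

Small clones yield wide intervals under this scheme, which is one reason a caption would show clone sizes alongside the S-fraction distributions.
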
Fig. 4. SPRINTER reveals a link between clone proliferation and metastatic seeding, and clone-specific ART present in distinct metastatic clades.
a, Tumor phylogeny was reconstructed for SPRINTER’s single-cell clones (tree leaves) from patient CRUKP9145 (colored by sample, with clones uniquely shaded). Seeding clones (dark gray) and ancestral clones (white with border colored according to inferred anatomical site) were inferred, with some clones harboring ctDNA-tracked SNVs (Roman numerals). b, Phylogeny from a with clones colored by SPRINTER’s S fractions. c, Across samples (anatomical location indicated as circles on body map), metastatic migrations (arrows) were inferred, and metastatic clades (blue, green and pink with corresponding clones indicated in tree) were defined based on primary tumor seeding clones. The figure is created with BioRender.com. d, In the two main phylogenetic branches containing different metastatic clades (top row), SPRINTER inferred ART (colored rectangles) for each clone (second row) for genes (left) known to impact proliferation or metastatic potential, with reference replication timing derived from normal cells shown (left column). ART is supported by related gene expression changes measured using bulk RNA sequencing (right heatmap), with late-to-early and early-to-late ART associated with increased and decreased gene expression, respectively (P values derived using a two-sided Wald test with a Benjamini–Hochberg correction with family-wise error rate = 0.05). *P < 0.1, **P < 0.05 and ***P < 0.01. e, For each SPRINTER clone (dot) in the primary tumor (dark blue) or metastases (orange), the seeding genetic distance (x axis) computed with respect to the closest seeding clone based on either SNVs (left) or CNAs (right) was compared to SPRINTER’s S fraction (y axis) using two-sided Pearson correlation tests (correlation coefficients and P values reported), and the 95% CI was calculated for linear regressions (shaded areas).
f, For each ctDNA-tracked clone (dot), a ctDNA shedding index (x axis) was calculated using the frequency of SNVs for either (left) SPRINTER single-cell clones or (right) previous bulk clones and compared to the maximum S fraction inferred from descendant SPRINTER clones (y axis). In each case, a two-sided Spearman correlation test was performed (with correlation coefficients and P values reported), and the 95% CI was calculated for linear regressions (shaded areas).
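
Several captions apply a Benjamini–Hochberg correction to sets of P values. A minimal sketch of the standard step-up adjustment is given below; the input P values are invented and the 0.05 cutoff simply mirrors the threshold quoted in panel d. The captions label this threshold a family-wise error rate, whereas the Benjamini–Hochberg procedure is usually described as controlling the false discovery rate.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw_p = [0.01, 0.04, 0.03, 0.20]  # invented example values
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
print(adj_p, significant)
```
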
Fig. 5. SPRINTER reveals increased single-cell rates of clone-specific genomic variants and enrichment for specific oncogene amplifications in TNBC and HGSC high-proliferation clones.
a, In 7 TNBC and 15 HGSC tumors (dark blue and dark pink in the first row with distinct tumors colored differently in the second row), the distribution of the S fraction (bottom, y axis) of each SPRINTER clone (x axis) with varying cell numbers (top, y axis in log10 scale) was calculated by bootstrapping (with 300 repeats) using the S-phase cells identified and assigned to clones by SPRINTER. b–d, Single-cell rates of clone-specific genomic variants were measured in individual cells (y axis, for 23,383 TNBC and 10,235 HGSC cells, excluding cells classified as outliers, tumors with single clones and cells without measured variants) for SNVs (b), SVs (c) and CNAs (d) in high- and low-proliferation clones (separated by the median of inferred S fractions, x axis) in the TNBC (left) and HGSC (right) datasets, with P values as measured by a one-sided Mann–Whitney U test and Cohen’s d effect sizes shown. e, For each known oncogene (dots, obtained from the COSMIC Cancer Gene Census excluding tumor suppressor genes), a one-sided Mann–Whitney U test was used to identify amplifications present in clones with significantly higher S fractions than other clones, with P values multiple hypothesis-corrected using the Benjamini–Hochberg method with family-wise error rate = 0.05 (y axis, negative log scale) and the related differences between the average S fractions (x axis) shown for each test. Genes passing the test (red, with the minimum corrected threshold indicated with the dotted line) are enriched in clones with increased proliferation, with genes relevant to cancer proliferation annotated. f, Cancer-relevant pathways (y axis) enriched for genes with amplifications significantly associated with high clone proliferation from e were identified using a gene set enrichment analysis (combined scores on x axis). In a–d, box plots show the median and the IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles, respectively.
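
The grouping and effect size used in panels b–d can be sketched as follows. The caption states that clones are separated by the median of inferred S fractions and that Cohen's d is reported; placing clones exactly at the median in the low group and using the pooled-standard-deviation form of Cohen's d are assumptions of this sketch, and all numbers are invented.

```python
from math import sqrt
from statistics import mean, median, variance

def split_by_median(clone_s_fraction):
    """Label clones as high- or low-proliferation relative to the median S fraction."""
    cutoff = median(clone_s_fraction.values())
    high = {clone for clone, s in clone_s_fraction.items() if s > cutoff}
    low = set(clone_s_fraction) - high  # clones at the median go to 'low' (assumption)
    return high, low

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (sample variances)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

# Invented clone S fractions and per-cell variant rates.
s_fractions = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40}
high, low = split_by_median(s_fractions)
effect = cohens_d([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
print(sorted(high), sorted(low), effect)
```
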
Extended Data Fig. 1. S-phase cells display a clear difference in read depth ratios (RDRs) between early and late genomic regions in contrast to G1/G2-phase cells.
Average RDRs (y axis) were measured by SPRINTER in 50 kb genomic bins with early (magenta) or late (green) replication timing across autosomes in the genome (x axis) in either the diploid (a and b) or tetraploid (c and d) ground truth datasets and across either G1/G2- (a and c) or mid-S-phase (b and d) cells (500 cells in each group).
Extended Data Fig. 2. Early and late genomic regions are distributed across the genome and within chromosomes, displaying clear differences in read depth ratios (RDRs) between early and late genomic regions in S-phase cells in contrast to G1/G2-phase cells.
RDRs (y axis) were measured by SPRINTER in 50 kb genomic bins with early (magenta) or late (green) replication timing across autosomes in the genome (x axis in top) or in example chromosomes (bottom) for different examples of individual cells that belong to either the diploid (a and b) or tetraploid (c and d) ground truth datasets and are either in the G1/G2 (a and c) or S (b and d) phase of the cell cycle.
Extended Data Fig. 3. Cells at different stages of S phase display different replication-induced fluctuations of RDR.
Average RDRs (y axis) were measured by SPRINTER in 50 kb genomic bins with either early (magenta) or late (green) replication timing across autosomes in the genome (x axis) for (a) 180 early-S-phase cells, (b) 916 mid-S-phase cells and (c) 901 late-S-phase cells in the generated tetraploid ground truth dataset that were identified as S phase by SPRINTER. As expected, cells at different stages of S phase exhibit clearly different replication fluctuations in RDRs: in early-S phase only early-replicating bins shift to higher values of RDR, in mid-S phase all the early bins have completed replication and have distinctly higher values of RDR than late bins, and in late-S phase, late bins also start replicating and some of these bins increase their values of RDR.
Extended Data Fig. 4. SPRINTER’s replication-aware framework enables the differentiation of RDR fluctuations due to either replication or CNAs.
a, Average RDRs (y axis) were measured by SPRINTER in 50 kb genomic bins with either early (magenta) or late (green) replication timing across autosomes in the genome (x axis) for 73 mid-S-phase cells in the generated tetraploid ground truth dataset assigned to the same clone by SPRINTER. b, A replication timing profile (RTP, y axis) is calculated by SPRINTER for each bin (x axis) for the same cells by correcting RDRs for CNAs based on the copy-number segments inferred by SPRINTER, preserving clear fluctuations between bins with different replication timing (with magenta early regions having higher RDRs than green late regions on average). c, Replication-corrected RDRs (y axis) are computed by SPRINTER for each bin (x axis) for the same cells by correcting RDRs for replication fluctuations, such that the remaining fluctuations are likely due to CNAs and are not influenced by replication (in each segment there is no clear difference between bins with different replication timing).
Extended Data Fig. 5. SPRINTER’s results for cells sequenced from five primary tumor samples and five metastases from patient CRUKP9145 with NSCLC.
Baseline copy numbers (heatmap colors) were inferred by SPRINTER on 7312 cancer cells assigned to clones by SPRINTER (rows, excluding normal cells and cells classified as outliers) sequenced from 10 distinct tumor samples (left bar), including (a) 4265 cells from five primary tumor samples obtained at surgery and (b) 3047 cells from five metastases sampled at autopsy, across ~1 Mb genomic bins (columns) with SPRINTER-inferred clones (middle bar) and with S- and G2-phase cells assigned to each corresponding clone (light gray for G1 phase, dark gray for S phase and black for G2 phase in right bar). The anatomical locations of the samples (colored circles) for (a) primary tumor regions and (b) metastases are displayed in corresponding body maps. The figure is created with BioRender.com.
Extended Data Fig. 6. Analysis of growth rates of metastases measured using serial clinical imaging for patient CRUKP9145.
Individual metastases were identified on computed tomography (CT) and magnetic resonance (MR) imaging scans performed during routine clinical management and collected as part of TRACERx. a, The volume of each metastasis (y axis, circle) was measured on serial scans (vertical dashed black lines) allowing changes in volume to be tracked over time (x axis). b, For each interval between two consecutive time points, the growth rate (log(mm3/day)) was calculated for each metastasis using either CT scans for the extra-cranial metastases (solid lines) or MR imaging scans for the brain metastases (dashed lines). For the right adrenal metastasis, which was only detected on the final CT scan (day 139 after surgery), the growth rate was calculated by assigning it a volume below the limit of CT detection on the preceding CT scan (day 59 after surgery, unfilled circle). c, Axial CT images of the left adrenal metastasis (red arrow, days 50 and 139 after surgery) and MR images of the left frontal lobe metastasis (red arrow, days 70 and 112 after surgery) are displayed.
Extended Data Fig. 7. Clone-specific ART in the NSCLC dataset affects <10% of the genome on average as expected from previous studies.
The fraction of clones affected by ART was calculated by combining the fractions of clones affected across all samples (y axis) based on SPRINTER’s clone-specific results in the NSCLC dataset for either late-to-early (positive values, dark magenta) or early-to-late (negative values, dark green) ART in 50 kb genomic bins along the genome (x axis, with autosomes separated by dashed lines). ART was inferred only in high-confidence cases (that is, only ART events that were present in most clones in >2 samples). Known cancer oncogenes in late-to-early genomic regions and known cancer tumor suppressor genes in early-to-late regions (from the COSMIC Cancer Gene Census) are annotated (black text and lines), also including tumor- and metastatic-clade-specific ART events affecting genes in the expression analysis (for example, PDL1, CDK12, NCOA2 and KRAS).
Extended Data Fig. 8. SPRINTER enables the identification of clone-specific ART supported by underlying read counts.
SPRINTER identifies different ART events affecting different genes (annotated text) and present in distinct clones (a–e) that belong to different phylogenetic branches (left and right indicated by lilac and light blue rectangles) or different metastatic clades (colored triangles). SPRINTER identifies clone-specific late-to-early (dark magenta) and early-to-late (dark green) ART events in genomic regions across chromosomes (x axis) if they have calculated values of the replication timing profile per clone (clone-specific RTP, y axis) that are higher or lower, respectively, than expected.
Extended Data Fig. 9. SPRINTER estimates clone-specific S and G2 fractions in previous TNBC and HGSC datasets.
For the TNBC and HGSC datasets (first row) with previously annotated genomic signatures (second row, with three signatures defined in the previous analysis of these datasets, that is, HRD, FBI and TD) and for each tumor in these datasets (third row), the distributions of the (middle) S fraction and (bottom) the fraction of actively replicating cells (S + G2 fraction, y axis) for SPRINTER’s inferred clones (x axis) were calculated by bootstrapping (per sample with 300 repeats) using the S- and G2-phase cells identified and assigned to clones by SPRINTER. Box plots show the median and the IQR, and the whiskers denote the lowest and highest values within 1.5 times the IQR from the first and third quartiles, respectively. HRD, homologous recombination deficiency; FBI, fold-back inversions; TD, tandem duplications.
Extended Data Fig. 10. The G2/S ratio is significantly higher in breast cancer clones with HRD.
The G2/S ratio (x axis) was calculated based on the G2 and S fractions inferred by SPRINTER in the clones (dots) with or without HRD (y axis) in the TNBC (left) and HGSC (right) datasets, with P values as measured by a two-sided Mann–Whitney U test when considering (a) all 280 clones inferred by SPRINTER, (b) only the 137 clones with more than 80 cells and (c) only the 58 clones with more than 200 cells. In all panels, box plots show the median and the IQR, and the whiskers denote the lowest and highest values within 1.5 times the IQR from the first and third quartiles, respectively.
