Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Sep 2;39(9):btad561.
doi: 10.1093/bioinformatics/btad561.

cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory

Affiliations

cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory

Brian Johnson et al. Bioinformatics. .

Abstract

Motivation: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease. There is an unmet need for an accurate, fast, and easy-to-use method to quantify clone growth dynamics from these datasets.

Results: We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated our analytical methods for estimating the net growth rate of clones, eliminating the need for complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications to improve mechanistic understanding and prognostic ability. Compared to clones with a single or unknown driver mutation, clones with multiple drivers have significantly increased growth rates (median 0.94 versus 0.25 per year; P = 1.6×10-6). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 versus 26.4 months; P = 0.0026).

Availability and implementation: We developed a publicly available R package, cloneRate, to implement our methods (Package website: https://bdj34.github.io/cloneRate/). Source code: https://github.com/bdj34/cloneRate/.

PubMed Disclaimer

Conflict of interest statement

None declared.

Figures

Figure 1.
Figure 1.
Model schematic and coalescent results. (A) Stem cells undergo symmetric division at rate λ, increasing the population of stem cells by 1. Asymmetric division does not affect the population size or phylogeny except to introduce mutations. Cell death (or differentiation) occurs at rate μ, which removes the cell’s inherited history from the phylogeny and decreases the population size by 1. Our methods seek only to estimate the growth rate during the expansion phase of a clone, when the rate of symmetric division is greater than the rate of cell death (r=λμ>0) and both rates are assumed to be constant during this phase. Mutations, which can occur at or between divisions, are assumed to accumulate linearly with time at rate ν. (B) The approximate distribution of coalescence times for n =50 cells is plotted above one example of n1=49 coalescence times drawn from the exact distribution of coalescence times for a birth–death process. The expected population size assuming logistic growth with different carrying capacities shows that most coalescence events occur at smaller population sizes, when the growth trajectory is still approximately exponential. Other parameters: r =1, (λ=1.5,μ=0.5), T =20. Note that a sampling time T <10 years would artificially affect the distribution of coalescence times, introducing bias. (C) Method overview: Reconstruction of a genealogical tree using the coalescent point process (CPP) can be done by first adding a vertical line of length T, and then adding successive vertical lines representing the coalescence times (Hi). The coalescence times are drawn i.i.d. from the distribution defined in Supplementary Section S1.1 and are then connected via horizontal lines to form the ultrametric tree. (D) Tree reconstructed by randomly merging lineages with coalescence times from (B), which is statistically equivalent to using the coalescent point process.
Figure 2.
Figure 2.
Performance and number of samples, n. (A) Distribution of estimates from our methods using Equation (2) (Max. Likelihood) and Equation (4) (Lengths), Phylofit, and the birth–death MCMC on 500 simulated ultrametric trees for each n value, where n is the number of sampled cells. Simulated trees were generated assuming a continuous time birth–death branching process with r =0.5 per unit time and T =40 time units, where time units are arbitrary (e.g. years). Birth rate λ was sampled from a uniform distribution on [0.5,1.5] and death rate μ=λr. (B) Root mean square error for each method from simulated data shown in (A) illustrates improved performance with number of samples. MCMC methods are most accurate for small n, while the birth–death MCMC and maximum likelihood perform best for large n. (C) Coverage of 95% confidence intervals methods based on simulations in (A), measured as the fraction of simulations where the true growth rate falls within the estimated confidence intervals. (D) Runtime (mean +/− st. dev.) of various methods of estimating net growth rate shows that while the MCMC-based methods scale with the number of samples, n, our methods run effectively instantaneously for any tree size.
Figure 3.
Figure 3.
Performance across different growth rates, r. (A) Distribution of estimates from our methods, Phylofit, and the birth–death MCMC on 500 simulated ultrametric trees for each r value, where r is the growth rate. Simulated trees were generated assuming a continuous time birth–death branching process with n =100 and T =40. Birth rate λ was sampled from a uniform distribution on [r,1+r] and death rate μ=λr. Time units are arbitrary (e.g. years). (B) Root mean square error, normalized to account for the different true growth rates, for each method calculated for simulated data shown in (A) illustrates a decrease in accuracy for small growth rates. Performance of MCMC methods is less affected by small growth rates. (C) Accuracy of 95% confidence intervals of the methods using data from simulations in (A), measured as the fraction of simulations where the true growth rate falls within the estimated confidence intervals.
Figure 4.
Figure 4.
Small growth rate r diagnostic. (A) Trees with n =50 tips and T =40 were created from 10000 randomly sampled r values from a uniform distribution on (0.1, 1) and then binned by the ratio of external to internal lengths in increments of one. For corresponding estimates of r, accuracy of the 95% confidence intervals (CI) shows that the ratio of external to internal lengths can be used as a diagnostic to determine whether our method can accurately estimate growth rate. We use a ratio of 3 as a cutoff value as simulations with a ratio greater than 3 are captured by the 95% confidence intervals approximately 95% of the time. Minimum number of simulations captured in a single bin is 61. Histogram of external to internal lengths ratio for real blood clone data (Table 1) shown in grey for comparison, with frequency denoted by y-axis. (B) Distribution of estimates from our methods, Phylofit, and the birth–death MCMC on 500 simulated ultrametric trees from each r value, where r is the growth rate. Simulated trees generated assuming a continuous time birth–death branching process with n =100 and T =40. For each tree, we calculate the ratio of external lengths to internal lengths. For each growth rate, we show the percentage of trees which have a ratio of external to internal lengths below the diagnostic value of 3. (C) Relative fractional error distribution for four methods from the same simulations shown in (A) before (dashed lines, iterations = 10000) and after (solid lines, iterations = 8922) the cutoff of 3 was applied. Inset shows significant reduction in overestimates due to the diagnostic cutoff. (D) Normalized root mean square error (RMSE) and coverage of 95% confidence intervals for four methods using same simulations shown in (A) and (C) before and after the diagnostic cutoff was applied. The diagnostic provides a significant reduction in error and improvement in accuracy of 95% confidence intervals.
Figure 5.
Figure 5.
Applying estimates to blood data. (A) Averaged site frequency spectrum across 42 clones shows agreement with the 1k(k1) neutral expectation (solid line). Error bars show 95% confidence interval of the mean. (B–C) Our estimates and Phylofit for clones with n10 tips from individuals with (B) and without (C) myeloproliferative neoplasms (MPN) shows good agreement across methods. Brackets in (B) group estimates from the same clones in the same patient estimated from two distinct samples taken years apart, showing consistency of estimates. Note that we also include estimates from Van Egeren et al. (2021) in dark blue for the two clones from their dataset. (D) Correlation between our maximum likelihood estimate and estimates from Phylofit for all clones from (B) and (C). (E) Mean maximum likelihood net growth rate estimate for clones from patients with and without a diagnosis of MPN shows that more aggressive expansions are associated with MPN. (F) Maximum likelihood net growth rate estimate for clones with single or unknown drivers and multiple drivers show that fitness predicted by our methods is consistent with effects of known drivers. Non-parametric Mann–Whitney test used for P-value calculation in (E, F). (G) In the single most aggressive clone from each patient diagnosed with MPN, stratification by mean net growth rate r¯ shows significant differences in Kaplan–Meier survival curves from clone initiation to MPN diagnosis (log-rank test P = 0.0026) though sample set was small (13 patients). At time of sampling, mean age of high growth rate group was 60.3 years, median was 50.4 years. Mean age of low growth rate group was 60.9 years, median was 63 years.
Figure 6.
Figure 6.
Longitudinal validation. (A–C) Logistic fit to longitudinal data for three clones which have both single-cell and longitudinal data. Only longitudinal bulk WGS data were used for fitting. Single-cell colony clonal fraction (divided by 2) and 95% confidence intervals are shown in orange. Source for (A) is Williams et al. (2022), and source for (B) and (C) is Fabre et al. (2022). (D) Longitudinal and single-cell estimates for each of the clones in (A–C) show agreement across data types. (E) Longitudinal and single-cell estimates for different clones sharing the same driver. (F) Clonal competition between a DNMT3A + CBL clone and a DNMT3A + JAK2 clone shown in the reconstructed phylogeny (i). Maximum likelihood single-cell estimate from each clone (ii) shows that the DNMT3A + JAK2 clone likely has higher fitness. Longitudinal data (iii) shows that the DNMT3A + JAK2 clone increases in VAF over time while the DNMT3A + CBL clone decreases, confirming that the DNMT3A + JAK2 clone has higher fitness, as predicted by our maximum likelihood estimate. All error bars represent 95% confidence intervals.

Similar articles

Cited by

References

    1. Abelson S, Collord G, Ng SW et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 2018;559:400–4. - PMC - PubMed
    1. Antle C, Klimko L, Harkness W. Confidence intervals for the parameters of the logistic distribution. Biometrika 1970;57:397–402.
    1. Bolton KL, Ptashkin RN, Gao T et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat Genet 2020;52:1219–26. - PMC - PubMed
    1. Boskova V, Bonhoeffer S, Stadler T. Inference of epidemiological dynamics based on simulated phylogenies using birth-death and coalescent models. PLoS Comput Biol 2014;10:e1003913. - PMC - PubMed
    1. Bouckaert R, Vaughan TG, Barido-Sottani J et al. Beast 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019;15:e1006650. - PMC - PubMed

Publication types