Nat Genet. 2025 Jul;57(7):1718-1729.
doi: 10.1038/s41588-025-02217-y. Epub 2025 Jul 3.

Detecting and quantifying clonal selection in somatic stem cells

Verena Körber et al. Nat Genet. 2025 Jul.

Abstract

As DNA variants accumulate in somatic stem cells, become selected or evolve neutrally, they may ultimately alter tissue function. When, and how, selection occurs in homeostatic tissues is incompletely understood. Here, we introduce SCIFER, a scalable method that identifies selection in an individual tissue, without requiring knowledge of the driver event. SCIFER also infers self-renewal and mutation dynamics of the tissue's stem cells, and the size and age of selected clones. Probing bulk whole-genome sequencing data of nonmalignant human bone marrow and brain, we detected pervasive selection in both tissues. Selected clones in hematopoiesis, with or without known drivers, were initiated uniformly across life. In the brain, we found pre-malignant clones with glioma-initiating mutations and clones without known drivers. In contrast to hematopoiesis, selected clones in the brain originated preferentially from childhood to young adulthood. SCIFER is broadly applicable to renewing somatic tissues to detect and quantify selection.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Population genetics model of drift and selection in homeostatic tissues.
a, Modeled processes and associated parameters in the model of drift. Stem cells either divide symmetrically with rate λ, or exit the stem cell compartment by differentiating (or dying), with rate δ. The stem cell count (N) increases in development (λ > δ) until reaching steady-state numbers (Nss) and remains constant during adulthood (N = Nss and λ = δ). On average, cells acquire μ neutral variants during each cell division. b, Schematic illustrating variant accumulation during development and subsequent homeostasis. c,d, Cumulative number of SSNVs versus VAF in development (c) and in adult life (d) (the scaling of the x axis is transformed to 1/VAF to spread out low-frequency variants). e–g, Simulated cumulative VAF distribution of SSNVs at selected ages between 0 and 100 years for 5 × 10^3 stem cells (λ = 5 per year, μ = 10 per division) (e); for 5 × 10^4 stem cells (λ = 5 per year, μ = 10 per division) (f); and for 5 × 10^5 stem cells (λ = 5 per year, μ = 10 per division) (g). h, Model of clonal selection. A selective driver event reduces the loss rate (differentiation or death) by a factor s, causing selective outgrowth of the mutant clone (red); the remaining parameters are defined in a. The VAF of the selected clone increases exponentially with the age at measurement, a. i, As b, but here an acquired driver mutation (D) causes selective outgrowth of the mutant clone (red). j, All variants in the selected clone’s cell of origin are inherited by its progeny and hence reach a high VAF during clonal expansion, reflected in a shoulder in the cumulative VAF distribution. k, Simulated cumulative VAF distributions when a driver mutation is acquired at different ages, and the SSNVs are measured 45 years later, when the clone has reached a size of 32%. In the simulation, the selected clone grows by 22% per year (s = 0.02, λ = 10 per year, μ = 1 per division, Nss = 25,000).
l, Simulated cumulative VAF distributions measured at varying ages after a driver mutation was acquired at 20 years of age. As k, the selected clone grows by 22% per year (s = 0.02, λ = 10 per year, μ = 1 per division, Nss = 25,000).
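The growth figures quoted in panels k and l can be checked with a short calculation. This is a minimal sketch, not the authors' code: it assumes the selected clone's loss rate is δ(1 − s) with δ = λ, so that the net growth rate is λ·s, and it treats growth as purely exponential from a single founder cell (no saturation, no stochastic extinction). The function names are hypothetical, and reading "a size of 32%" as a fraction of stem cells is also an assumption.

```python
import math

def yearly_growth_fraction(division_rate: float, s: float) -> float:
    """Fractional clone growth per year, assuming a net growth rate of division_rate * s."""
    net_rate = division_rate * s  # lambda - delta * (1 - s), with delta = lambda
    return math.exp(net_rate) - 1.0

def years_to_reach(fraction: float, n_stem_cells: int, division_rate: float, s: float) -> float:
    """Years for one founder cell to reach a given fraction of the stem cell pool.

    Assumes deterministic exponential growth, which ignores saturation and drift.
    """
    net_rate = division_rate * s
    return math.log(fraction * n_stem_cells) / net_rate

# Parameters from the caption: s = 0.02, lambda = 10 per year, Nss = 25,000.
print(round(yearly_growth_fraction(10, 0.02), 3))  # → 0.221
print(years_to_reach(0.32, 25_000, 10, 0.02))
```

Under these assumptions the yearly growth comes out at about 22%, and the time to reach a 32% cell fraction is close to the 45 years stated in panel k.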
Fig. 2. Benchmarking SCIFER with simulated data.
a, ROCs quantifying the detection of clonal selection by SCIFER for different clone sizes (color-encoded). ROCs were generated by applying SCIFER to simulated data, generated with a stochastic birth–death process with (s = 0.02, corresponding to a selective advantage of 2% increase in birth versus death) or without (s = 0) selection of a clone initiated at 20 years of life (λ = 10 per year, μ = 1 per division, Nss = 25,000, and assuming sequencing with an average coverage of 90×). In total 63 cases, with selected clone sizes of VAF 0%, 2.5%, 5%, 7.5%, 10%, 25% and 37.5%, were generated. Models with, or without, clonal selection were fit to the data using ABC. True positives and false positives were evaluated for varying posterior probability thresholds of clonal selection. For selected clones with VAF ≥ 5%, the difference between true positives and false positives was maximal for a selection threshold of 15% (operating points, shown in red). b, Posterior probability for clonal selection (colored bars) and neutral evolution (gray bars) conditioned on selected clones with VAF ≥ 5% for six simulated cases with varying clone size. The dashed line marks the selection threshold at 15% conditional posterior probability. c, Accuracy of SCIFER to distinguish clonal selection from genetic drift in simulated WGS data. Shown are AUC computed from the ROCs shown in a and from ROCs obtained in analogy for simulated sequencing depths of 30× and 270×. The simulated data were generated as in a. For 270× sequencing depth, an additional 17 cases with selected clone sizes of 0.5% VAF and 1% VAF were used for model evaluation. d, Model scheme for two selected clones (red and orange) that compete with normal cells (blue). The two selected clones are born at times ts1 and ts2, and expand due to decreased loss rates (δs1 and δs2); the total cell count remains constant over time. 
e, Selection of two sequential clones manifests itself in two subclonal shoulders whose heights scale with the time points of driver acquisition.
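Panel a describes choosing the posterior-probability threshold at which the difference between true positives and false positives is largest. The sketch below illustrates that selection rule on made-up scores; the data, the candidate thresholds and the function names are hypothetical and are not taken from the benchmark.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, TPR, FPR) tuples; a case is called 'selected' if its score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((t, tp / n_pos, fp / n_neg))
    return points

def best_threshold(points):
    """Pick the threshold that maximizes TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[2])[0]

# Hypothetical posterior probabilities of selection (label 1 = simulated with selection).
toy_scores = [0.02, 0.05, 0.10, 0.20, 0.40, 0.90, 0.95]
toy_labels = [0, 0, 0, 1, 1, 1, 1]
points = roc_points(toy_scores, toy_labels, thresholds=[0.05, 0.15, 0.50])
print(best_threshold(points))
```

Maximizing TPR − FPR weighs both error types equally; a different weighting would move the operating point.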
Fig. 3. Benchmarking SCIFER with published pseudo-bulk data.
a, Reconstructed single-cell phylogenies after re-calling SSNVs and indels from single-cell WGS data. b, Left, VAF distribution of SSNVs shown in a, truncated at 1%. Right, model fit to the cumulative 1/VAF distribution (points and error bars, measured data and their standard deviation, which, assuming Poisson-distributed measurements, is the square root of the measured data; red area, 95% posterior probabilities of the model fit computed from simulations using 100 posterior samples). c, Posterior probability for neutral evolution for pseudo-bulk WGS data from ref. (labeled Lee-Six) and three samples from ref. . SCIFER was applied twice to the data from ref. , using the SSNV counts obtained with Caveman or with Mutect2 and Strelka. d–f, Inferred HSC number (d), division rate (e) and number of SSNVs per division (f) for the cases shown in c (median and 80% credible intervals for each sample, estimated from 1,000 posterior samples; gray areas, 95% confidence band for the five estimates obtained with SCIFER). Estimates from ref. are given for comparison. g, Single-cell phylogeny of published sample KX004 (ref. ). h, As b, but for the sample shown in g (gray area in right panel, 80% credible interval of the estimated clone size computed from 1,000 posterior samples). i, Posterior probabilities for selection (conditioned on clones with VAF ≥ 5%) and neutral evolution for samples introduced in g and Extended Data Fig. 3i. Dashed line, 15% selection threshold. j, Age of leading selected clones in the cases shown in i, estimated by SCIFER and by phylodynamic modeling in the original publications (points, median; error bars, 80% credible intervals, estimated from 1,000 posterior samples). k,l, Estimated clonal growth rates (k) and stem cell parameters (l) for the samples shown in i (points, median; error bars, 80% credible intervals, estimated from 1,000 posterior samples). m, Single-cell phylogenies of published sample KX003 (ref. ). n, As h but for the sample shown in m.
o, As i, but for sample KX003 (ref. ). p, Estimated stem cell and selection parameters for the sample shown in m. Shown are median and 80% credible intervals, estimated from 1,000 posterior samples.
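The captions repeatedly plot the cumulative number of SSNVs against 1/VAF, with error bars equal to the square root of each count (the Poisson assumption stated in panel b). A minimal sketch of how such points could be tabulated is given below; the VAF values, thresholds and function name are hypothetical.

```python
import math

def cumulative_counts(vafs, thresholds):
    """For each VAF threshold, return (1/threshold, count of variants with VAF >= threshold, sqrt(count))."""
    rows = []
    for t in thresholds:
        n = sum(1 for v in vafs if v >= t)
        rows.append((1.0 / t, n, math.sqrt(n)))  # sqrt(n) is the Poisson standard deviation
    return rows

# Hypothetical variant allele frequencies from one bulk sample.
toy_vafs = [0.02, 0.03, 0.05, 0.05, 0.10, 0.25]
for inv_vaf, count, err in cumulative_counts(toy_vafs, [0.25, 0.10, 0.05, 0.02]):
    print(inv_vaf, count, round(err, 2))
```

Plotting against 1/VAF spreads out the low-frequency variants, which is where most variants lie.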
Fig. 4. Clonal selection for known CH drivers.
a, Model fit to the cumulative VAF distribution measured in CD34+ HSPCs of samples 7-T and 10-D with 270× WGS (points and error bars, measured data and their standard deviation, which, assuming Poisson-distributed measurements, is the square root of the measured data; purple area, 95% posterior probabilities of the model fit, estimated from simulations using 100 posterior samples; gray area, 80% credible interval of the clone size, estimated from 1,000 posterior samples; red points and error bars, mean and 95% confidence interval (CI) of the VAF of known CH drivers, based on binomial distributions with sample size and success probability values of 267 and 0.08, and 205 and 0.06, corresponding to read coverage and measured VAF in 7-T and 10-D, respectively). b, Model support for clonal selection (conditioned on clones ≥5% VAF) and neutral evolution based on 90× WGS data in 12 cases with selection and with at least one CH driver mutation in ASXL1, DNMT3A and TET2. Dashed line, 15% selection threshold. c, As b, but based on 270× bulk WGS data, where available (model support for selection conditioned on clones ≥2% VAF). d, Estimated sizes of the selected clones (median and 80% credible intervals, computed from 1,000 posterior samples, for 270× WGS, where available, and 90× WGS else) versus measured VAF of known CH driver (mean and 95% CI according to binomial distributions with sample size taken as read coverage and success probability taken as measured VAF; for the 13 mutations, read coverage and VAFs are as follows: 249 and 0.03, 246 and 0.04, 205 and 0.06, 194 and 0.08, 267 and 0.08, 127 and 0.09, 264 and 0.14, 242 and 0.15, 275 and 0.16, 272 and 0.18, 230 and 0.20, 270 and 0.26, and 260 and 0.28). e, As in a, but for sample 12-AT (selected clones in blue and green; 95% CIs of the VAFs of mutations in ASXL1 and TET2 based on binomial distributions with sample size and success probability of 242 and 0.15 and 246 and 0.04, respectively).
f, As in b, but for a second selected clone. g, Estimated stem cell parameters for the cases shown in c, showing median and 80% credible intervals for each sample, based on 1,000 posterior samples. h, As in g, but showing the ratio between stem cell number and division rate. y, years.
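The driver-VAF error bars are described as 95% intervals from a binomial distribution with the read coverage as sample size and the measured VAF as success probability. The caption does not say how the interval is constructed, so the sketch below assumes the central 2.5% and 97.5% quantiles of the binomial read count, divided by coverage. The function name is hypothetical.

```python
from math import comb

def binomial_vaf_interval(coverage: int, vaf: float, level: float = 0.95):
    """Central quantile interval of Binomial(coverage, vaf), expressed as a VAF range.

    Assumption: the interval is taken from the lower and upper tail quantiles
    of the read-count distribution; other constructions are possible.
    """
    tail = (1.0 - level) / 2.0
    cdf = 0.0
    lower = upper = None
    for k in range(coverage + 1):
        cdf += comb(coverage, k) * vaf**k * (1.0 - vaf) ** (coverage - k)
        if lower is None and cdf >= tail:
            lower = k  # smallest read count whose CDF reaches the lower tail
        if cdf >= 1.0 - tail:
            upper = k  # smallest read count whose CDF reaches the upper tail
            break
    return lower / coverage, upper / coverage

# Coverage and VAF quoted for sample 7-T in panel a.
print(binomial_vaf_interval(267, 0.08))
```

With a few hundred reads and a VAF below 10%, the interval spans several percentage points, which is why clone-size estimates are compared with driver VAFs as ranges and not as single values.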
Fig. 5. Clonal selection for unknown CH drivers.
a, Cumulative 1/VAF distributions for samples without a known CH driver at 90× or 270× WGS coverage. b, Model support for clonal selection (conditioned on clones with VAF ≥ 5%) and neutral evolution across samples introduced in a at 90× WGS. Dashed line, 15% selection threshold. c, As in b, but for the leading (left) or second selected clone (right) at 270× WGS, where available (posterior probabilities conditioned on clones with VAF ≥ 2%). d, Model fit to the cumulative 1/VAF distribution for samples 1-N and 16-UU at 270× WGS (points and error bars, measured data and their standard deviation, which, assuming Poisson-distributed measurements, is the square root of the measured data; purple area, 95% posterior probabilities of the model fit computed from simulations using 100 posterior samples; blue and green areas, 80% credible intervals of the estimated sizes of the selected clones, computed from 1,000 posterior samples). e,f, Inferred ratio between HSC number and division rate (e) and stem cell parameters (f) for neutrally evolving samples. Shown are median and 80% credible intervals for each sample, computed from 1,000 posterior samples; estimates obtained from 90× and 270× WGS are shown side by side. g,h, as in e and f, but for samples with unknown drivers. i,j, Estimated number of newly acquired SSNVs per HSC division (i) and number of HSCs contributing to hematopoiesis (j) in the 4 neutrally evolving cases compared with the 12 cases with selection for a known CH driver (introduced in Fig. 4) and the 6 cases with selection for an unknown driver (introduced in this figure; points and error bars, median and 80% credible intervals for each sample, estimated from 1,000 posterior samples for 270× WGS data, where available, and 90× WGS data else; boxplots, median and interquartile range, whiskers extend to the largest and smallest value no further than 1.5 times the interquartile range). k, Left, as in j, but showing the estimated HSC division rate.
Right, estimated division rate versus age at sampling. Gray, neutral evolution; red, clonal selection.
Fig. 6. Selection dynamics with and without known drivers.
a, Estimated age of the selected clone (for the 12 cases introduced in Fig. 4 and the 6 cases introduced in Fig. 5; points, median; error bars, 80% credible intervals estimated from 1,000 posterior samples; parameters were estimated with the two-clone model from 270× WGS data, where available, and from 90× WGS data else; boxplots, median and interquartile range, whiskers extend to the largest and smallest value no further than 1.5× the interquartile range). b, Age of the second selected clone estimated with the two-clone model (for the ten cases introduced in Figs. 4 and 5; points, median; error bars, 80% credible intervals, estimated from 1,000 posterior samples; boxplots, median and interquartile range, whiskers extend to the largest and smallest value no further than 1.5× the interquartile range). c, Estimated growth rates for the 28 selected clones introduced in a and b (points, median; error bars, 80% credible intervals, estimated from 1,000 posterior samples; estimates were obtained with the two-clone model from 270× WGS data, where available, and from 90× WGS data else; boxplots, median and interquartile range, whiskers extend to the largest and smallest value no further than 1.5× the interquartile range). d, Cumulative distribution of estimated age at CH driver acquisition (median); shaded area, lower and upper bounds of the cumulative distribution of estimated age at CH driver acquisition based on 80% credible intervals of the estimated parameters; points and error bars, median and 80% credible intervals for the per-sample estimates, estimated from 1,000 posterior samples. Data are from the 28 selected clones introduced in a and b (estimates were obtained with the two-clone model from 270× WGS data, where available, and from 90× WGS data else).
Fig. 7. Clonal selection in the human brain.
a, SCIFER fit to SSNVs measured in the hippocampus of LIBD82 (points and error bars, measured data and standard deviation, which, assuming Poisson-distributed measurements, is the square root of the measured data; purple area, 95% posterior probabilities of the model fit computed from simulations using 100 posterior samples; gray area, 80% credible interval of the clone size, estimated from 1,000 posterior samples; SCZ, schizophrenia). Red point, reported VAF of trisomy 7 and monosomy 10 (ref. ). b, As in a, but for cortical oligodendrocytes and striatal interneurons of NC7 (red points, mean VAF; red lines, 95% CI based on binomial distributions with sample size and success probability of 26 and 0.15 and 29 and 0.07 (NC7-CX-OLI); 30 and 0.1, 33 and 0.06, 32 and 0.22, and 32 and 0.16 (NC7-STR-INT), corresponding to read coverage and measured VAF, respectively). c, As in a, but for LIBD87 (cortex) and TS1 (striatum). d, Model support for clonal selection (conditioned on clones with ≥5% VAF for <150× WGS and ≥2% VAF for ≥150×) and neutral evolution across 185 brain samples from 131 individuals. Dashed line, 15% selection threshold. e, Left, average incidence of clonal selection versus age (summarizing ages into 10 year-bins; in total, 36 of 131 individuals with selection; line and shaded area, LOESS regression with 95% CI). Right, age of individuals with (n = 36) and without selection (n = 95; boxplots, median and interquartile range, whiskers, largest and smallest value no further than 1.5× the interquartile range; P = 0.00002662, Wilcoxon test statistic W = 2,525, two-sided Wilcoxon rank sum test). f, Median posterior stem cell parameter values by location (cortex, n = 128; hippocampus, n = 17; striatum, n = 40; boxplots, median and interquartile range, whiskers, largest and smallest value no further than 1.5× the interquartile range; points, outliers; P values, one-way analysis of variance with Kruskal–Wallis test). 
g, Estimated stem cell division rate versus age for 185 samples (median and 80% credible intervals computed from 1,000 posterior samples each; blue line and shaded area, LOESS regression with 95% CI). h, Cumulative distribution of estimated age at driver acquisition (median values of 44 samples with selection); lower and upper bounds based on 80% credible intervals (based on 1,000 posterior samples for each sample).
Extended Data Fig. 1. Quantifying selection and drift from bulk whole genome sequencing data.
a, Simulated cumulative variant allele frequency (VAF) distribution of somatic single nucleotide variants (SSNVs) at selected ages between 0 and 100 years for stem cell numbers ranging between 5,000 and 1 M; λ=5/year, μ=10/division. b, Simulated cumulative VAF distribution of SSNVs when a driver is acquired at 20 years of age, and the selected clone grows by 22% per year (s = 0.02, λ=10/year, μ=1/division, Nss = 25,000). Shown are the time points at which the clone sizes reached 5%, 10%, 15%, 20%, 25%, or 30% VAF for stem cell numbers ranging between 5,000 and 1 M. c, Simulated cumulative VAF distribution of somatic SSNVs when a driver is acquired at 20 years of age and has reached a clone size of 10% VAF (λ=10/year, μ=1/division, Nss = 25,000). Shown are the VAF distributions for selective advantages (s) ranging between 0.05 and 0.95, and stem cell numbers ranging between 5,000 and 1 M. d, Parameter estimation with approximate Bayesian computation. First, the expected variant allele frequency histogram is analytically computed. Then, sequencing noise is simulated by drawing from a binomial distribution with average 90x coverage. The modelled cumulative distribution is compared to the measured data for varying VAFs (1% step size).
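Panel d outlines the comparison step used in the approximate Bayesian computation: expected VAFs are perturbed by binomial sequencing noise and the resulting cumulative distribution is compared with the measured one at 1% VAF steps. The sketch below illustrates only that step. It assumes a fixed coverage of 90 reads per site (the caption speaks of an average coverage), a 5% lower VAF cutoff and a simple absolute-difference distance, none of which is confirmed by the caption. All function names are hypothetical, and the ABC sampling and acceptance loop is omitted.

```python
import random

def simulate_measured_vafs(true_vafs, coverage=90, seed=0):
    """Add binomial sequencing noise: draw variant reads out of `coverage` reads per site."""
    rng = random.Random(seed)
    measured = []
    for vaf in true_vafs:
        alt_reads = sum(1 for _ in range(coverage) if rng.random() < vaf)
        measured.append(alt_reads / coverage)
    return measured

def cumulative_at_steps(vafs, step=0.01, vmin=0.05, vmax=1.0):
    """Number of variants with VAF >= threshold, for thresholds from vmin to vmax in 1% steps."""
    n_steps = int(round((vmax - vmin) / step)) + 1
    thresholds = [vmin + i * step for i in range(n_steps)]
    return [sum(1 for v in vafs if v >= t) for t in thresholds]

def distance(simulated, measured):
    """Sum of absolute differences between two cumulative distributions (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(simulated, measured))

# Hypothetical expected VAFs: many low-frequency variants plus a small clonal peak.
expected = [0.05] * 40 + [0.10] * 10 + [0.25] * 5
simulated = cumulative_at_steps(simulate_measured_vafs(expected, seed=1))
observed = cumulative_at_steps(simulate_measured_vafs(expected, seed=2))
print(distance(simulated, observed))
```

In a rejection-style ABC, parameter draws whose simulated distribution falls within a tolerance of the measured one would be kept as posterior samples.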
Extended Data Fig. 2. Modeling two selected clones in stem cell homeostasis.
a, Schematic illustrating variant accumulation during development and subsequent homeostasis, with two selected clones (red and orange) emerging by branched evolution during homeostasis. b, Schematic illustrating variant accumulation during development and subsequent homeostasis, with two selected clones (red and orange) emerging by linear evolution during homeostasis. c, Simulated cumulative variant allele frequency (VAF) distribution of somatic single nucleotide variants (SSNVs) when two selected clones, founded at 30 years and 50 years, evolve in parallel. The selective advantage of the first clone is fixed at 0.018, while the selective advantage of the second clone varies between 0.033 and 0.038 (color-encoded; 25,000 stem cells; λ=10/year, μ=1/division; arrows highlight positions of the selected clones). d, Simulated cumulative VAF distribution of SSNVs when two selected clones, founded at 30 years and 50 years, evolve linearly. The selective advantage of the first clone is fixed at 0.018, while the selective advantage of the second clone varies between 0.041 and 0.035 (color-encoded; 25,000 stem cells; λ=10/year, μ=1/division; arrows highlight positions of the selected clones).
Extended Data Fig. 3. Quantifying drift and selection in published data.
a, Somatic variant calling pipeline (using hg19). SSNV, somatic single nucleotide variant. SCNA, somatic copy number abnormality. SSV, somatic structural variant. b, Number of shared and unique SSNVs in 140 whole genomes from hematopoietic stem and progenitor cell (HSPC) clones from published data identified by Caveman (original publication) and our pipeline (shown in a). c, Number of SSNVs of different classes identified by each variant caller. d, Left, variant allele frequency (VAF) distribution (truncated at 0.01 VAF) of SSNVs identified by Caveman. Right, model fit to the cumulative 1/VAF distribution shown in left (points and error bars, measured data and their standard deviation, which, assuming Poisson-distributed measurements, is the square root of the measured data; red area, 95% posterior probabilities of the model fit, estimated from simulations using 100 posterior samples). e, Two-dimensional posterior probability distributions for the parameters estimated by SCIFER using SSNVs identified with Mutect2 and Strelka in published data. Axis limits, range of prior distributions. f, As in e, but using SSNVs identified with Caveman. Note that the posterior for the fraction of cell loss, δexp/λexp, is relatively broad, but has little effect on the mutation rate and other parameters (see Supplementary Note 2). g, As in d (left panel) but for samples AX001, KX001 and KX002 without clonal selection (SSNVs from published data). h, As in d (right panel) but for the data shown in g. i, Left, phylogenetic tree of published sample id2259 (reconstruction from the original publication). Middle, VAF distribution (truncated at 0.01 VAF) of SSNVs (taken from). Right, as in d, but for id2259 (grey area, 80% credible interval of the clone size, estimated from 1,000 posterior samples). j, Left, as in d but for sample KX004 when allowing for two selected subclones (based on the phylogeny, we assumed branched evolution, cf.
Figure 3g; blue and green areas, 80% credible intervals of the size of leading and second selected clone, estimated from 1,000 posterior samples). Right, posterior probability for clonal selection obtained with the 2-clone model (conditioned on clones with VAF ≥ 2%). k, Median and 80% credible interval (estimated from 1,000 posterior samples) for stem cell and selection parameters for KX004, obtained with a one-clone and two-clone model.
Extended Data Fig. 4. FACS gating for hematopoietic stem and progenitor cells.
a–f, Cells were stained with a panel of antibodies (Methods), then single and live mononuclear cells were sorted (MNCs, a–d). T cell-depleted MNCs (MNC(–T)) were sorted from the CD3-, CD34+ cell fraction (e and f), and hematopoietic stem and progenitor cells were sorted from the CD3-Lin-CD34+ cell fraction (f). Image created with FlowJo v10.8.1; values in a–f give percentages.
Extended Data Fig. 5. Somatic variants in hematopoietic stem and progenitor cells.
Genome-wide profile of trinucleotide substitution patterns (x-axis) (either from 270x whole genome sequencing (WGS) data, where available, or from 90x WGS data) across the genome in CD34+ hematopoietic stem and progenitor cells from the samples indicated.
Extended Data Fig. 6. Copy number variants in hematopoietic stem and progenitor cells.
a, Genome-wide (chromosomes indicated and arrayed across x-axis) profiles of copy number gains (red) and losses (blue) in CD34+ hematopoietic stem and progenitor cells (HSPCs) across the 10 samples without known clonal hematopoiesis (CH) driver. b, Genome-wide (chromosomes indicated and arrayed across x-axis) profiles of copy number gains (red) and losses (blue) in CD34+ HSPC across the 12 samples with known CH driver.
Extended Data Fig. 7. Quantifying clonal selection from bulk whole genome sequencing.
a, Model fits to cumulative variant allele frequency (VAF) distributions measured by 90x whole genome sequencing (WGS) of CD34+ hematopoietic stem and progenitor cells (HSPCs) from twelve individuals with known clonal hematopoiesis (CH) driver mutations (points and error bars, measured data and standard deviation, which, assuming Poisson-distributed measurements, is the square root of the measured data). Violet areas, 95% posterior probabilities of model fit, estimated from simulations using 100 posterior samples. Grey areas, 80% credible interval for the clone size, estimated from 1,000 posterior samples; red points and error bars, mean and 95% confidence interval of the VAF of known CH drivers, based on binomial distributions with sample size/success probability of 140/0.23 (5-DU), 168/0.15 (6-AU), 102/0.11(7-T), 89/0.18 (9-DU), 77/0.06 (10-D), 126/0.21(12-AT), 204/0.16 (17-TT), 230/0.13 (17-TT), 226/0.06 (17-TT), 70/0.09 (18-DU), 127/0.09 (19-D), 55/0.05 (20-UT), 156/0.39 (21-DU), 103/0.27 (22-TU), corresponding to read coverage and measured VAF, respectively. b, As in a, but showing an expanded view (VAF ≥ 0.2) for sample 21-DU (90x WGS). Selection (inferred at 7% VAF) was not associated with the DNMT3A mutation (43% VAF). At most 6 variants were acquired prior to the DNMT3A mutation, suggesting acquisition during early development. c, As in a, but for 270x WGS from seven individuals with known CH driver mutations. Colored rectangles, 80% credible interval for the clone size of leading and second selected clone, estimated from 1,000 posterior samples (where applicable); red points and error bars, mean and 95% confidence interval of the VAF of known CH drivers, based on binomial distributions with sample size/success probability of 270/0.26 (5-DU), 272/0.18 (6-AU), 230/0.2 (9-DU), 264/0.14 (17-TT), 275/0.16 (17-TT), 279/0.07 (17-TT), 194/0.08 (18-DU), 249/0.03 (20-UT), 260/0.28 (22-TU), corresponding to read coverage and measured VAF, respectively. 
d, Scheme for a model variant where a selected clone (red) expands without replacing normal cells (blue). e, As in a, but using the model variant introduced in d. f, Estimated model parameters obtained with SCIFER and the model variant introduced in d for three cases with a mutation in TET2 (points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples). g, Estimated model parameters obtained with the one-clone model and the two-clone model. Points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples; dashed line, bisectrix.
Extended Data Fig. 8. Quantifying neutral drift and clonal selection in cases without known driver from bulk whole genome sequencing.
a, Model fits to the cumulative variant allele frequency (VAF) distributions measured by 90x whole genome sequencing (WGS) of CD34+ hematopoietic stem and progenitor cell (HSPC) samples from ten individuals (1-N, 2-U, 3-N, 4-N, 8-UU, 11-UU, 13-U, 14-U, 15-N, 16-UU) without known clonal hematopoiesis (CH) driver mutation (points and error bars, mean and standard deviation of the measured data, where standard deviations were computed based on Poisson distributions with mean corresponding to the respective somatic single nucleotide variant (SSNV) count; purple areas show the 95% posterior probabilities of the model fit, estimated from simulations using 100 posterior samples). The model fits of 1-N, 4-N and 15-N show no evidence of selection; the model fit of 3-N shows weak evidence of selection, which is, however, invalidated when probed with 270x WGS (shown in b). b, Model fits to the cumulative VAF distributions measured by 270x WGS of CD34+ HSPC samples from seven individuals (3-N, 4-N, 8-UU, 11-UU, 13-U, 14-U, 15-N, 16-UU) without known CH driver mutation (points and error bars, mean and standard deviation of the measured data, where standard deviations were computed based on Poisson distributions with mean corresponding to the respective SSNV count; purple areas show the 95% posterior probabilities of the model fit, estimated from simulations using 100 posterior samples). Colored rectangles, 80% credible interval for the estimated clone size of the leading and second selected clone inferred with SCIFER (where applicable). The model fits show no evidence of selection for 3-N, 4-N and 15-N but evidence for one selected clone in 13-U and 14-U (grey rectangles) and for two selected clones in 8-UU and 11-UU (blue and green rectangles; no 270x WGS was available for 2-U).
Extended Data Fig. 9. Selection dynamics of leading and subsequent clones.
a, Estimated age of the leading selected clone (for the twelve cases with known clonal hematopoiesis (CH) drivers and the six cases with unknown drivers introduced in Figs. 4 and 5; points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples; parameter estimates are compared between model fits obtained from 90x and 270x whole genome sequencing (WGS) data). b, Estimated age of the second selected clone (for the ten cases introduced in Figs. 4 and 5; points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples). c, Estimated clonal growth rate of the leading selected clone (for the twelve cases with known drivers and the six cases with unknown drivers introduced in Figs. 4 and 5; points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples; parameter estimates are compared between model fits obtained from 90x and 270x WGS data). d, Estimated clonal growth rate of the second selected clone (for the ten cases introduced in Figs. 4 and 5; points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples).
Extended Data Fig. 10. Quantifying clonal selection in human brain samples.
a, Cohort characteristics, showing age, sex and phenotype of the profiled individuals, as well as location of the analyzed samples. b, Number of individuals with evidence for clonal selection or neutral evolution stratified by phenotype (left), sex (middle), and location (right). c, Estimated number of stem cells (left) and number of somatic single nucleotide variants (SSNVs) per division (right) plotted against age for the 185 analyzed brain samples (points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples; blue line and shaded area, LOESS regression and 95% confidence interval). d, Estimated age of the selected clone (left) and clonal growth rate (right) plotted against age for the 44 analyzed brain samples with evidence for clonal selection (points and error bars, median and 80% credible intervals, estimated from 1,000 posterior samples; blue line and shaded area, LOESS regression and 95% confidence interval).

References

    1. Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).
    2. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
    3. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
    4. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
    5. Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).