Nat Commun. 2021 Aug 13;12(1):4921.
doi: 10.1038/s41467-021-25172-8.

Interacting evolutionary pressures drive mutation dynamics and health outcomes in aging blood

Kimberly Skead et al. Nat Commun.

Abstract

Age-related clonal hematopoiesis (ARCH) is characterized by age-associated accumulation of somatic mutations in hematopoietic stem cells (HSCs) or their pluripotent descendants. HSCs harboring driver mutations will be positively selected, and cells carrying these mutations will rise in frequency. While ARCH is a known risk factor for blood malignancies, such as acute myeloid leukemia (AML), why some people who harbor ARCH driver mutations do not progress to AML remains unclear. Here, we model the interaction of positive and negative selection in deeply sequenced blood samples from individuals who subsequently progressed to AML, compared to healthy controls, using deep learning and population genetics. Our modeling allows us to discriminate amongst evolutionary classes with high accuracy and captures signatures of purifying selection in most individuals. Purifying selection, acting on benign or mildly damaging passenger mutations, appears to play a critical role in preventing disease-predisposing clones from rising to dominance and is associated with longer disease-free survival. Through exploring a range of evolutionary models, we show how different classes of selection shape clonal dynamics and health outcomes, enabling us to better identify individuals at a high risk of malignancy.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Deep learning models can be used to discriminate amongst the evolutionary pressures shaping blood evolution.
a The impact of selection and genetic drift on shaping clonal dynamics. Cells accumulate somatic mutations in each division. The majority of mutations will be either neutral (blue) or mildly damaging (red). Driver mutations will increase the fitness of a cell and increase its frequency in the population (green). However, mutations are also able to rise in frequency through genetic drift.
b Mutation summary statistics extracted from blood cell populations. Summary statistics fall into three categories: (1) counts of mutations in each blood sample (overall and stratified according to mutation type (silent and missense) across variant allele frequency intervals), (2) the frequency of mutations (variant allele frequency), and (3) mutation annotation and respective ratios (proportion of missense mutations relative to total missense sites over the proportion of silent mutations relative to total silent sites). A total of 16 summary statistics are extracted from each population.
c Deep neural network architecture. Each DNN was trained as a multi-task neural network that classifies a population into one of four overarching evolutionary classes and predicts four continuous parameters. Each neural network consisted of an input layer (16 units, with each unit corresponding to a summary statistic), three hidden layers (512 units), and five output layers, which included the classification output (four units) and four regression outputs (one unit each).
d DNN ensemble. We trained a total of ten deep neural networks (DNNs) independently, with identical architecture. By employing an ensemble-based approach, we are able to obtain a distribution of predictions for each population.
e Classification performance for simulated evolutionary classes. The y-axis represents the true evolutionary class, and the x-axis represents the predicted evolutionary class. Classification accuracy ranges from blue (low accuracy) to red (high accuracy). We obtain a high classification accuracy across evolutionary classes (94.8%). Positive and combination classes are predicted with 99.7% and 97.4% accuracy, respectively. We observe a reduction in accuracy in neutral (80.6%) and negative (83.4%) classes of evolution.
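The architecture in panels c and d can be sketched as a forward pass in plain Python. This is a minimal illustration rather than the authors' implementation: the caption gives only the layer sizes, so the ReLU activations, the softmax on the classification output, the weight initialization, the reading of "512 units" as 512 units per hidden layer, and all function names are assumptions, and the weights here are untrained.

```python
import math
import random

N_INPUTS = 16       # one input unit per summary statistic (from the caption)
N_HIDDEN_LAYERS = 3 # three hidden layers (from the caption)
HIDDEN_UNITS = 512  # assumed to mean 512 units in each hidden layer
N_CLASSES = 4       # four evolutionary classes (from the caption)
N_REGRESSIONS = 4   # four continuous parameters, one output unit each


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (the initialization is an assumption)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Affine map: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(seed, hidden=HIDDEN_UNITS):
    """Shared trunk plus five output layers: one classifier and four regressors."""
    rng = random.Random(seed)
    sizes = [N_INPUTS] + [hidden] * N_HIDDEN_LAYERS
    trunk = [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]
    class_head = init_layer(rng, hidden, N_CLASSES)
    regression_heads = [init_layer(rng, hidden, 1) for _ in range(N_REGRESSIONS)]
    return trunk, class_head, regression_heads


def forward(network, stats):
    """Return (class probabilities, regression outputs) for one population."""
    trunk, class_head, regression_heads = network
    hidden = list(stats)
    for layer in trunk:
        hidden = relu(dense(layer, hidden))
    class_probs = softmax(dense(class_head, hidden))
    regressions = [dense(head, hidden)[0] for head in regression_heads]
    return class_probs, regressions


# Ensemble of ten independently initialized networks with identical architecture.
# hidden=32 only keeps this demo fast; the caption's value is 512.
ensemble = [build_network(seed, hidden=32) for seed in range(10)]
summary_stats = [0.1 * i for i in range(N_INPUTS)]  # hypothetical input vector
predictions = [forward(net, summary_stats) for net in ensemble]
```

Each of the ten entries in `predictions` is one network's output for the same population, so collecting them gives the distribution of predictions that the ensemble approach is meant to provide.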
Fig. 2. Hematopoietic evolution is governed by a range of evolutionary dynamics.
a Evolutionary classes in preleukemic (red) and healthy (blue) blood populations. The majority of blood populations do not evolve neutrally (72%). Similarly, only 9% of individuals fit positive models of evolution. Populations do not evolve neutrally in the majority of preleukemic cases (79%) and healthy controls (64%). The majority of preleukemic (62%) and the plurality of healthy (43%) individuals fit combination (both beneficial and damaging mutations arising) classes of evolution.
b Age associations across evolutionary class predictions. Participants were stratified into 10-year age windows. Age intervals range from 30–40 (light blue) to 70–80 (dark blue). Each bar represents the proportion of individuals within each age bin fitting each evolutionary class for preleukemic individuals (n = 92) and healthy controls (n = 385). Standard errors for each proportion were calculated as √(p(1−p)/n), where p is the proportion of individuals fitting a particular class and n is the total population. We observe significant differences in the proportion of individuals fitting combination classes of evolution in the 50–60 age range (Pearson’s chi-squared test, χ² = 4.54, p-value = 0.03) and the 60–70 age range (Pearson’s chi-squared test, χ² = 10.55, p-value = 0.001), and in the neutral class of evolution in the 60–70 age range (Pearson’s chi-squared test, χ² = 4.73, p-value = 0.03).
c Range of mutation rate estimations across a cohort of participants. We show the estimated mutation rate for each sample from each DNN in our ensemble (gray). The mean estimate from the classifier outputs is shown in red, and samples are sorted by the mean estimated mutation rate. Mutation rates (y-axis) are log-transformed and scaled to a population size of 10,000.
d Preleukemic blood populations have a higher mutation rate than healthy controls. Each boxplot illustrates the distribution of estimated mutation rates across samples grouped according to outcome status (control (n = 385): blue, preleukemic (n = 92): red). The midline represents the median, the upper and lower bounds the interquartile range, and the whiskers extend to 1.5 times the interquartile range. The level of significance is indicated as follows: ns: p > 0.05, *p-value <= 0.05, **p-value <= 0.01, ***p-value <= 0.001, ****p-value <= 0.0001. Preleukemic cases are found to have a modest yet significantly higher mutation rate than controls (two-sided Wilcoxon rank-sum test, W = 14336, df = 1, p-value = 0.004).
e Relative passenger to driver mutation proportion across evolutionary classes. The number of mutations in known driver genes is plotted against the number of mutations in non-driver genes for each individual blood population, with healthy controls shown in blue and preleukemic individuals in red. We used linear regression to compare the relationship between the number of mutations falling into known driver genes versus non-driver genes in cases (red) and controls (blue) fitting combination and positive evolutionary classes. The 95% confidence interval for predictions from each linear model is indicated in gray. In the combination model, we find a significant interaction between the number of mutations occurring in non-driver genes compared to driver genes in controls (β = 5.76) and cases (β = 0.642); F(1, 224) = 28.5, p-value = 2.23e−07. However, in the positive class, we did not find a significant interaction between the number of mutations occurring in non-driver genes compared to driver genes in controls (β = 0.16) and cases (β = 0.17); F(1, 12) = 0.0004, p-value = 0.98.
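Two of the calculations in panel b are easy to check by hand. The sketch below computes the standard error of a proportion as the square root of p(1−p)/n, and converts a chi-squared statistic into a p-value. It assumes one degree of freedom, as for a 2×2 table of case/control status against membership in a class; the caption does not state the degrees of freedom. The function names are hypothetical, and the example proportion (62% of 92 preleukemic individuals) is the whole-cohort figure from panel a, used only for illustration because per-age-bin counts are not given.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a sample proportion p estimated from n individuals."""
    return math.sqrt(p * (1.0 - p) / n)


def chi2_p_value_df1(statistic):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    A chi-squared variable with 1 df is the square of a standard normal,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Whole-cohort illustration: 62% of n = 92 preleukemic individuals.
se_example = proportion_standard_error(0.62, 92)

# Chi-squared statistics reported in the caption (1 df is assumed here).
reported_statistics = {
    "combination, age 50-60": 4.54,
    "combination, age 60-70": 10.55,
    "neutral, age 60-70": 4.73,
}
p_values = {label: chi2_p_value_df1(x) for label, x in reported_statistics.items()}

print(f"standard error: {se_example:.4f}")
for label, p in p_values.items():
    print(f"{label}: p = {p:.3g}")
```

Under the one-degree-of-freedom assumption, the computed p-values come out close to the rounded values reported in the caption.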
Fig. 3. Distinct patterns of inferred pathogenicity and clonal dynamics are associated with evolutionary class predictions.
a Predicted functionality of mutations in each evolutionary class. Average CADD scores were calculated for mutations in known driver genes (green) and non-driver genes (red) and are presented as mean values +/− SEM. The level of significance is indicated as follows: ns: p > 0.05, *p-value <= 0.05, **p-value <= 0.01, ***p-value <= 0.001, ****p-value <= 0.0001. We capture a significant enrichment of high CADD scores in driver genes compared to non-driver genes (two-sided Wilcoxon rank-sum test, positive models: W = 548, p-value = 0.001; combination models: W = 278136, p-value < 2.2e−16). We do not observe a significant difference between CADD scores across mutations in driver genes and non-driver genes in neutral classes (two-sided Wilcoxon rank-sum test, W = 7598.5, p-value = 0.3). The average CADD score assigned to passenger mutations in negative models is significantly lower (two-sided Wilcoxon rank-sum test, W = 39298, p-value = 0.004) than passenger mutation CADD scores in the neutral class.
b Distribution of function-altering mutations in genes across evolutionary classes. The UpSet plot shows the distribution of function-altering mutations (CADD > 10) falling in genes across patients in different evolutionary classes. The total number of genes mutated in each evolutionary class is shown on the left (positive = green, negative = red, combination = orange, neutral = blue). The dark circles indicate classes with overlapping genes, and the connecting bars indicate multiple overlapping genes.
c Inferred pathogenicity of the dominant clones is correlated with the variant allele frequency across different evolutionary classes. We isolated the dominant clone within each individual blood pool. The CADD scores of each clone were binned into intervals of 10, and each boxplot illustrates the distribution of variant allele frequencies for each interval. The midline represents the median, the upper and lower bounds the interquartile range, and the whiskers extend to 1.5 times the interquartile range. Plots are faceted according to evolutionary class (positive (n = 30): green, negative (n = 65): red, combination (n = 229): orange, neutral (n = 158): blue). We observe a wider distribution of CADD scores in neutral and combination models. In positive classes of evolution, clones are found at a higher frequency, suggesting that they sweep to fixation. Similarly, in negative models of evolution, we observe a surplus of clones in the low CADD score bins, which appear to segregate at a lower frequency.
d Impact of negative selection on clonal expansions. We investigated whether negative selection acting on passenger mutations impacted clonal expansions in a healthy and preleukemic context. To do so, we plotted the log-transformed variant allele frequency (VAF) of mutations found in cases and controls predicted to be evolving in positive (green) and combination (orange) classes. Each boxplot illustrates the distribution of log10-transformed VAFs for each group. The midline represents the median, the upper and lower bounds the interquartile range, and the whiskers extend to 1.5 times the interquartile range. The evolutionary class is denoted on the x-axis (combination/positive), and preleukemic/control status is indicated by (PL) or (C), respectively. The level of significance is indicated as described above. VAF distributions were plotted separately for preleukemic individuals (light green/light orange) and healthy controls (dark green/dark orange). We find that we are not able to discriminate between VAF distributions of mutations in healthy and preleukemic individuals in the positive class. Further, we are not able to discriminate between positive models of evolution and preleukemic individuals who fit combination models of evolution. However, we find that clones in controls fitting combination classes of evolution have a significantly lower VAF distribution compared to both preleukemic cases fitting combination models (two-sided Wilcoxon rank-sum test, W = 254988, p-value < 2.2e−16, n_combination(PL) = 403, n_combination(C) = 1095) and clones in controls fitting positive models (two-sided Wilcoxon rank-sum test, W = 25180, p-value = 0.02, n_positive(C) = 34).
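The group comparisons in this figure all use the two-sided Wilcoxon rank-sum test. As a rough sketch of what that test computes, the code below ranks the pooled values (ties receive the average rank) and derives W and a p-value from a normal approximation with tie and continuity corrections. W is taken here as the rank sum of the first group minus its smallest possible value, which is one common convention. The caption does not say which software, convention, or approximation was used, so those choices, the function names, and the toy log10 VAF values are all assumptions.

```python
import math
from collections import Counter


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, two-sided p-value) using a normal approximation."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    # Variance of W under the null, reduced when there are tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return w, 1.0
    diff = w - mean
    # Continuity correction shifts the difference half a unit toward zero.
    correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    z = (diff - correction) / math.sqrt(variance)
    p_value = min(1.0, math.erfc(abs(z) / math.sqrt(2.0)))
    return w, p_value


# Hypothetical log10 VAFs, not data from the study.
control_vafs = [-2.1, -1.9, -2.4, -2.0, -2.3]
preleukemic_vafs = [-1.2, -1.5, -1.8, -1.1, -1.6]
w_stat, p_val = rank_sum_test(control_vafs, preleukemic_vafs)
print(f"W = {w_stat}, p = {p_val:.3g}")
```

With samples as large as those in panel d (hundreds of mutations per group), a normal approximation is the usual choice; for very small samples an exact test would be more appropriate than this sketch.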
Fig. 4. Negative selection is associated with AML-free survival.
a Kaplan–Meier curves of AML-free survival among EPIC participants. Survival is defined as the time between blood sample collection and diagnosis for cases; for controls, survival is right-censored at last follow-up. Survival curves are stratified according to evolutionary class (positive = green, negative = red, combination = orange, neutral = blue).
b Forest plot of risk of AML development across evolutionary classes. Each row indicates the hazard ratio associated with each evolutionary class (reference = neutral class). The horizontal lines indicate the 95% confidence interval for each class.
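The survival definition in panel a maps directly onto the Kaplan–Meier product-limit estimator: at each diagnosis time, the running survival probability is multiplied by the fraction of at-risk participants who were not diagnosed, while censored participants simply leave the at-risk set. The sketch below is a minimal version with hypothetical follow-up times; the function name and data are illustrative and not taken from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times  -- follow-up time per participant (sample collection to diagnosis
              for cases, or to last follow-up for right-censored controls)
    events -- True if the participant was diagnosed, False if censored
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, diagnosed in zip(times, events) if diagnosed})
    for t in event_times:
        # Everyone still under observation just before time t is at risk.
        at_risk = sum(1 for u in times if u >= t)
        diagnosed_now = sum(1 for u, d in zip(times, events) if d and u == t)
        survival *= 1.0 - diagnosed_now / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in years; False marks a right-censored participant.
follow_up_years = [2.0, 3.0, 3.0, 5.0, 8.0]
was_diagnosed = [True, True, False, True, False]
curve = kaplan_meier(follow_up_years, was_diagnosed)
for year, surv in curve:
    print(f"t = {year}: AML-free survival = {surv:.2f}")
```

Stratifying by evolutionary class, as in the figure, would amount to calling the function once per class on that class's participants.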
