Nat Genet. 2023 Dec;55(12):2149-2159.
doi: 10.1038/s41588-023-01555-z. Epub 2023 Nov 6.

Genetics and epidemiology of mutational barcode-defined clonal hematopoiesis


Simon N Stacey et al. Nat Genet. 2023 Dec.

Abstract

Clonal hematopoiesis (CH) arises when a substantial proportion of mature blood cells is derived from a single hematopoietic stem cell lineage. Using whole-genome sequencing of 45,510 Icelandic and 130,709 UK Biobank participants combined with a mutational barcode method, we identified 16,306 people with CH. Prevalence approaches 50% in elderly participants. Smoking has a dose-dependent effect on CH risk, and CH is associated with several smoking-related diseases. Contrary to published claims, we find no evidence that CH is associated with cardiovascular disease. We provide evidence that CH is driven by genes that are commonly mutated in myeloid neoplasia and implicate several new driver genes. The presence and nature of a driver mutation alter the risk profile for hematological disorders. Nevertheless, most CH cases have no known driver mutation. A genome-wide association study (GWAS) of CH identified 25 loci, including 19 not previously implicated in CH. Splicing, protein and expression quantitative trait loci were identified for CD164 and TCL1A.


Conflict of interest statement

All deCODE authors are employees of the biotechnology company deCODE genetics/Amgen. The remaining authors declare no competing interests.

Figures

Fig. 1. Association of mosaic somatic mutations with CH.
a, Results (−log10(P)) of gene-based burden tests using SKAT-O for association of somatic mutations with CH. Data are a meta-analysis of ISL and UKB. For the genes indicated, separate burden tests were conducted for high-impact mutations (red), moderate-impact mutations (green) and both types combined (blue), with impact assessed by the Ensembl Variant Effect Predictor (VEP). Pcomb is the P value for combined high- and moderate-impact variants. The maximum impact (MaxImpact) VEP annotation was used to classify each mutation. b, Lollipop plot showing the counts of somatic mutations in the MTA2 gene detected in CH cases in UKB. Green lollipops are missense, black are frameshift and orange are splice mutations. PFAM domains and exon structure are shown below. BAH, bromo-adjacent homology domain; ELM2, Egl-27 and MTA1 homology 2 domain; GATA, GATA zinc finger domain; MTA_R1, metastasis-associated protein MTA1 R1 domain; SANT, Swi3-Ada2-N-Cor and TFIIIB domain. c, Fisher’s exact association test results in UKB for individual mutations in MTA2. Circle diameter indicates the total number of participants with the mutation (CH cases + controls). SwissProt domains and exon structure of the gene are shown below. d,e, As in b and c but for the CALR gene. FE, Fisher’s exact.
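Panel c reports Fisher’s exact tests for individual MTA2 mutations. As a minimal stdlib Python sketch (not the authors’ pipeline), the two-sided test for one mutation can be computed from a 2×2 table of carriers and non-carriers among CH cases and controls by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, the convention used by, for example, SciPy’s `fisher_exact`. The function name, table layout and counts are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table

                 carrier   non-carrier
        CH case     a           b
        control     c           d

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small relative tolerance so tables tied with the observed one are kept
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts: 6 of 8 CH cases carry the mutation versus 1 of 5 controls
print(fisher_exact_two_sided(6, 2, 1, 4))
```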
Fig. 2. Differential risks of subsequent hematological disorders for barcode-CH, CPLD-CH and CPLDneg-CH.
a, HR and 95% CI from Cox regressions for subtypes of hematological disorder, stratified by CPLD-CH, barcode-CH and CPLDneg-CH. Diagnoses were included if they arose 6 months or more after blood sampling for CH determination. Data are a meta-analysis of UKB and ISL (n = 162,963 participants overall, 14,837 with barcode-CH, 5,288 with CPLD-CH and 11,692 with CPLDneg-CH). b, HR and 95% CI for subsequent hematological disorder stratified by CPLD gene. HR, hazard ratio; MM, multiple myeloma; MGUS, monoclonal gammopathy of undetermined significance; OMF, osteomyelofibrosis.
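The caption states that the estimates are a meta-analysis of UKB and ISL but does not specify the method. Assuming a fixed-effect inverse-variance weighted (IVW) combination of the per-cohort log hazard ratios from the Cox models, the pooling and the conversion to an HR with a Wald 95% CI can be sketched as follows; the function names and cohort estimates are hypothetical.

```python
import math
from typing import Iterable, Tuple


def ivw_fixed_effect(estimates: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) meta-analysis.

    `estimates` holds one (log_hr, se) pair per cohort. Each cohort is
    weighted by 1 / se**2; returns the pooled log HR and its SE.
    """
    pairs = [(1.0 / se ** 2, beta) for beta, se in estimates]
    w_total = sum(w for w, _ in pairs)
    pooled = sum(w * beta for w, beta in pairs) / w_total
    return pooled, math.sqrt(1.0 / w_total)


def hr_with_ci(log_hr: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Hazard ratio with a Wald 95% CI: exp(beta) and exp(beta -/+ z * se)."""
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Hypothetical per-cohort Cox estimates (log HR, SE) for one disorder subtype
ukb, isl = (0.40, 0.10), (0.60, 0.20)
hr, lo, hi = hr_with_ci(*ivw_fixed_effect([ukb, isl]))
print(f"HR = {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Pooling on the log scale keeps the CI symmetric in log(HR), which is why the interval is asymmetric around the HR itself.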
Fig. 3. GWAS meta-analysis of barcode-CH in ISL and UKB.
Manhattan plot showing logistic regression GWAS results (−log10(P) versus chromosomal position) from 16,306 cases and 159,913 controls. The horizontal red line corresponds to a P value of 5 × 10−8. Named loci have unconditional P values of <5 × 10−8. Loci are named by the nearest gene or plausible candidate. The TERT and TCL1A loci are offscale, and their P values are indicated on the plot. Detailed data for named loci are in Supplementary Table 9. Several high-effect, rare variants were deemed to require further confirmation and were not considered further (indicated in Supplementary Table 9).
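Named loci in the Manhattan plot are defined relative to the genome-wide threshold P < 5 × 10−8. The sketch below picks one lead variant per locus from summary statistics by greedy distance-based clumping. The 500 kb window, the `Variant` type and the data are assumptions for illustration; the authors’ own locus definition, including the conditional analysis used for secondary signals, is not reproduced.

```python
from typing import List, NamedTuple

GENOME_WIDE_P = 5e-8  # the red line in the Manhattan plot; -log10(5e-8) is about 7.3


class Variant(NamedTuple):
    chrom: str
    pos: int
    p: float


def lead_variants(variants: List[Variant], window_bp: int = 500_000) -> List[Variant]:
    """Greedy distance-based clumping of genome-wide significant variants.

    Variants are visited from most to least significant; a variant becomes a
    new lead unless it lies within `window_bp` of an existing lead on the
    same chromosome (in which case it is absorbed into that locus).
    """
    significant = sorted((v for v in variants if v.p < GENOME_WIDE_P), key=lambda v: v.p)
    leads: List[Variant] = []
    for v in significant:
        if all(v.chrom != lead.chrom or abs(v.pos - lead.pos) > window_bp for lead in leads):
            leads.append(v)
    return leads


# Hypothetical summary statistics
hits = [Variant("5", 1_000, 1e-40), Variant("5", 200_000, 1e-9),
        Variant("5", 900_000, 1e-9), Variant("14", 50_000, 1e-6)]
print([(v.chrom, v.pos) for v in lead_variants(hits)])
```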
Fig. 4. Effects of CH GWAS variants and LTL GWAS variants on CH, LTL and MPN outcomes.
a, Effects of CH GWAS variants on CH (x axis) and LTL (y axis) outcomes. LTL data are from UKB (n = 418,251). The two discordant TERT variants mentioned in the text are indicated. b, Effects of LTL GWAS variants on LTL (x axis) and CH (y axis) outcomes. Variants are grouped into ‘cloud 1’ (shaded brown) and ‘cloud 2’ (shaded blue) according to their direction of effect on CH (see text). c, Effects of CH GWAS variants on CH (x axis) and MPN (y axis) outcomes. MPN outcomes were obtained from a meta-analysis of ISL and UKB data (1,124 cases and 747,154 controls). In all panels, only variants with MAF > 1% are plotted. The plotted points are association effect estimates from logistic or linear regression and the bars indicate 95% CI. The red dotted lines indicate the inverse variance weighted (IVW) regressions. The chromosomal location of each plotted variant is indicated by color (color key, lower right). LTL, leukocyte telomere length; MPN, myeloproliferative neoplasm.
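Extended Data Fig. 4 describes its IVW line as a weighted linear regression fixed through the origin. Assuming the Fig. 4 lines are fitted the same way, with each point weighted by the inverse variance of its y-axis effect, the slope has a closed form. This is an illustrative sketch with hypothetical inputs; it reports only the fixed-effect SE, and other SE conventions (for example multiplicative random effects) exist.

```python
import math
from typing import Sequence, Tuple


def ivw_slope_through_origin(x: Sequence[float], y: Sequence[float],
                             se_y: Sequence[float]) -> Tuple[float, float]:
    """IVW regression of y on x with no intercept.

    Minimizes sum(((y_i - slope * x_i) / se_y_i) ** 2), giving
    slope = sum(w x y) / sum(w x^2) with w = 1 / se_y^2.
    Returns the slope and its fixed-effect standard error sqrt(1 / sum(w x^2)).
    """
    w = [1.0 / s ** 2 for s in se_y]
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return sxy / sxx, math.sqrt(1.0 / sxx)


# Hypothetical per-variant effects on CH (x) and LTL (y), with SEs of the LTL effects
slope, se = ivw_slope_through_origin([0.10, 0.25, -0.05], [0.03, 0.06, -0.02],
                                     [0.010, 0.015, 0.012])
print(f"IVW slope = {slope:.3f} (SE {se:.3f})")
```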
Fig. 5. CH GWAS variants are associated with splicing and expression of CD164.
a, Splice diagram of the two major CD164 mRNA isoforms from whole-blood RNA-seq data. Blue bars depict exons and are wider in coding regions. Introns are depicted as black arrowed lines. The sQTL affects skipping or inclusion of exon 5. Effects (β in s.d. units) from linear regression of the CH risk rs3056655_A allele are as follows: E4 to E6 (β = 0.44, P = 3.04 × 10−302); E4 to E5 (β = −0.22, P = 3.29 × 10−72); E5 to E6 (β = −0.14, P = 4.16 × 10−32). Arc thickness indicates the overall usage of the different splice junctions. Black arcs indicate reduced usage in association with rs3056655_A, while the brown arc indicates increased usage. b, Colocalization plot of the CD164 locus showing associations from logistic or linear regression of rs3056655 with CH (blue) and with the E4 to E6 splice event in whole blood (red; −log10(P) is divided by 40 for scaling). c, RNA-seq coverage plot of CD164 from 822 CD8+ cytotoxic T cell samples, stratified by rs3056655 allele, showing reduced expression in rs3056655_A (CH at-risk) heterozygotes and homozygotes. Note that rs3056655 is multi-allelic, but only the rs3056655_A (CH at-risk) and _G (CH protective) alleles were seen in the RNA-seq samples. d, As in c, but RNA-seq from 899 monocyte samples.
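The effects in panel a are reported as β in s.d. units from linear regression on the rs3056655_A allele. A minimal sketch of what such an estimate represents: standardize the phenotype (here, a hypothetical splice-junction usage measure), then regress it on allele dosage. It omits the covariates that real sQTL and eQTL analyses usually include, uses a normal approximation for P, and the function name and data are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def sd_scaled_effect(dosage: Sequence[float],
                     phenotype: Sequence[float]) -> Tuple[float, float, float]:
    """Per-allele effect of a variant on a phenotype expressed in s.d. units.

    The phenotype is standardized to mean 0 and s.d. 1, then regressed on
    allele dosage (0, 1, 2) by ordinary least squares with an intercept.
    Returns (beta, SE, two-sided P), where P uses a normal approximation
    to the t statistic.
    """
    mu, sd = statistics.fmean(phenotype), statistics.stdev(phenotype)
    z = [(v - mu) / sd for v in phenotype]           # standardized phenotype
    g_mean = statistics.fmean(dosage)
    sxx = sum((g - g_mean) ** 2 for g in dosage)
    beta = sum((g - g_mean) * zi for g, zi in zip(dosage, z)) / sxx
    # z has mean 0, so the fitted value is beta * (g - g_mean)
    rss = sum((zi - beta * (g - g_mean)) ** 2 for g, zi in zip(dosage, z))
    se = math.sqrt(rss / (len(z) - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# Hypothetical data: six samples with genotype dosages and junction usage
beta, se, p = sd_scaled_effect([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
print(f"beta = {beta:.2f} s.d. per allele, SE = {se:.3f}, P = {p:.1e}")
```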
Fig. 6. CH risk variants, pQTL and eQTL at the TCL1A locus.
a, Locus zoom of CH GWAS results at TCL1A. b, Cis-pQTL analysis of variants affecting plasma protein levels of TCL1A in 47,133 UKB participants. c, As in b, but from 35,559 ISL participants. d, RNA-seq cis-eQTL analysis of TCL1A in whole blood. e, Colocalization analysis of CH GWAS and blood eQTL signals at the TCL1A locus. The CH GWAS (green) and unadjusted eQTL (red) signals do not coincide. However, when the eQTL signal is adjusted for the rs78986913 variant (MAF 4%; Padj values shown in blue), the peaks overlap, with a PP.H4 = 85% probability that they correspond to the same signal. The gray vertical line marks the position of the CH GWAS sentinel variant rs2887399. f, TCL1A eQTL from 758 B cell RNA samples. g, TCL1A eQTL from 884 monocyte samples. In all panels except e, r² values are relative to rs2887399.
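The PP.H4 values here and in Extended Data Fig. 8 come from colocalization analysis (COLOC). The sketch below follows the approximate Bayes factor calculation of the coloc package’s `coloc.abf` as a stdlib Python illustration, assuming its default priors (p1 = p2 = 1e-4, p12 = 1e-5), a single causal variant per trait, and Wakefield approximate Bayes factors with a prior effect s.d. of 0.2 for the case-control trait and 0.15 for a standardized quantitative trait. Function names and the toy locus are hypothetical.

```python
import math
from typing import List, Sequence


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff_exp(a: float, b: float) -> float:
    """log(exp(a) - exp(b)) for a >= b; -inf if the difference is zero."""
    d = -math.expm1(b - a)  # equals 1 - exp(b - a)
    return a + math.log(d) if d > 0 else -math.inf


def wakefield_log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Log Wakefield approximate Bayes factor (association vs none) for one variant."""
    v, w = se ** 2, prior_sd ** 2   # sampling variance, prior variance
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def coloc_posteriors(l1: Sequence[float], l2: Sequence[float],
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> List[float]:
    """PP.H0..PP.H4 from per-variant log ABFs of two traits at one locus."""
    s1, s2 = log_sum_exp(l1), log_sum_exp(l2)
    s12 = log_sum_exp([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                   # H0: neither trait associated
        math.log(p1) + s1,                     # H1: trait 1 only
        math.log(p2) + s2,                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                   # H4: one shared causal variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy locus with three variants (hypothetical effects, SE = 0.1 throughout)
ch = [wakefield_log_abf(b, 0.1, 0.2) for b in (0.8, 0.05, 0.05)]     # case-control trait
eqtl = [wakefield_log_abf(b, 0.1, 0.15) for b in (0.8, 0.05, 0.05)]  # standardized eQTL
print(f"PP.H4 = {coloc_posteriors(ch, eqtl)[4]:.3f}")
```

Moving the eQTL peak to a different variant shifts the posterior mass from H4 to H3, mirroring the unadjusted and adjusted comparison in panel e.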
Extended Data Fig. 1. Age and smoking dependency of CH.
a, Frequency distribution of singleton mutations in UKB: mutations observed only once in the cohort are plotted by variant allele fraction (VAF), stratified by the participant’s age at blood draw. Note the ‘bump’ in the distribution starting below a VAF of approximately 0.3, the size of which is age dependent. This distribution was modeled to identify people with more than the expected number of low-VAF mutations, as explained further in the Methods. b, The proportion of participants with CH increases with age. The line connects the observed CH proportions; error bars are 95% CI. Data are from the ISL sample (n = 45,510), which has a larger age range than UKB. c, Effects of current and previous smoking on CH by age: CH was modeled by age, stratified by current or previous smoking status, with sex, pack-years and years since stopping smoking as covariates. Points correspond to observed CH proportions and error bars are 95% CI. Lines correspond to a logistic regression fit. Data are from the UKB sample (n = 130,709).
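Panels b and c show observed CH proportions with 95% CIs, but the caption does not say how the intervals were computed. Assuming a Wilson score interval (one common choice for binomial proportions), a single age band could be summarized as follows; the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_ci(k: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k / n (95% with the default z)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical age band: 420 of 1,000 participants classified as having CH
lo, hi = wilson_ci(420, 1000)
print(f"CH proportion 0.420 (95% CI {lo:.3f}-{hi:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] for small counts, which matters in sparsely populated age bands.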
Extended Data Fig. 2. Only a minority of CH cases have a known CPLD mutation.
The proportion of subjects with barcode-CH by age is shown in blue. The proportion of subjects in whom a CPLD mutation had been identified (CPLD-CH) is shown in green, and the proportion with a mutation in DNMT3A or TET2 in magenta. CPLD mutations were defined as in ref. . The lines indicate a data fit using a generalized additive model with cubic splines. Shading indicates 95% CI. a, Data from UKB. b, Data from ISL.
Extended Data Fig. 3. Locus zoom plots for loci where a secondary signal was detected by conditional analysis.
Plots show conditional logistic regression GWAS results (−log10(P) versus chromosomal position) from 16,306 cases and 159,913 controls. The adjusted signals are shown, with the primary signal in the upper part of each panel and the secondary signal in the lower part. r² values relative to the peak signal are shown by color, as indicated in the color bar (bottom right). a, SMC4 locus. b, TERT locus. c, NRIP1 locus.
Extended Data Fig. 4. GWAS of CPLDneg-CH and comparison of effects with barcode-CH GWAS.
Data are a meta-analysis of ISL and UKB. GWAS variants were included if they were significantly associated with barcode-CH or CPLDneg-CH. The plotted points are association effect estimates (ln(odds ratio)) and 95% CI from logistic regression for variants in barcode-CH (16,306 cases, 159,913 controls) and CPLDneg-CH (11,692 cases, 151,277 controls), respectively. The fitted inverse variance weighted (IVW) linear regression, fixed through the origin, is shown as a red dotted line. Variants newly discovered in the CPLDneg-CH GWAS are colored green. Labeled loci are discussed in the text.
Extended Data Fig. 5. Effects of GWAS meta-analysis variants on various types of CPLD-CH vs barcode-CH.
GWAS variants were included if they were significantly associated with barcode-CH or any of the CPLD-CH types. The x axes show the effects (ln(odds ratio)) and 95% CI (horizontal lines) for each variant in barcode-CH, determined by logistic regression. The y axes show the corresponding effects and 95% CI (vertical lines) for each variant in the different types of CPLD-CH, as indicated above each panel. The dotted line shows the position of the diagonal. Gray lines indicate the position of no effect. Detailed data, including case and control numbers, are in Supplementary Table 12. The chr14:TCL1A rs2887399_T allele was protective against barcode-CH, TET2-CH and ASXL1-CH, whereas the same allele increased the risk of DNMT3A-CH, in line with previous reports. The chr14:TCL1A variant is indicated in the DNMT3A-CH and ASXL1-CH panels to illustrate the reversal of effect. Similarly, the chr6:CD164 rs3056655_A allele increased the risk of barcode-CH and DNMT3A-CH but decreased the risk of TET2-CH. The latter result was seen only in UKB; ISL data could not confirm it. The chr3:SMC4 rs201009932 variant had no discernible effect on ASXL1-CH, while it had a pronounced effect on JAK2-CH. chr3:THRB had no apparent effect on DNMT3A-CH and chr5:TERT rs7705526 had no effect on PPM1D-CH. Other variants showed prominent effects only in specific CPLD-CH types: chr12:SOX5 and chr14:DLK1 had no evident effects outside of barcode-CH, while chr13:KLF12 had no apparent effect outside of PPM1D-CH. The chr9:JAK2 rs16922785_G allele (indicated in the JAK2-CH panel) conferred CH risk only in the context of the JAK2 Val617Phe somatic mutation and was preferentially linked to it in cis, as has been noted previously for the 46/1 JAK2 haplotype and MPN risk. rs16922785 is in moderate LD with the 46/1 haplotype (r² = 0.68) and had a somewhat stronger association with JAK2-CH than the 46/1 haplotype tagger rs12343867_C (P = 1.60 × 10−9 vs 1.04 × 10−7).
Extended Data Fig. 6. Effects of CH GWAS variants on clinical hematology parameters.
a, GWAS Catalog reports. For each sentinel CH GWAS variant, we identified all variants in LD (r² ≥ 0.8) within ±500 kb and searched the GWAS Catalog for reported linear regression associations of those variants with P < 1 × 10−7. CH GWAS loci (y axis) are colored red if the Alt allele increased CH risk, otherwise blue. Circles are colored red if the Alt allele was associated with an increase in the hematological trait value (x axis), blue if it was associated with a decrease and gray if the direction of effect could not be ascertained. b, Associations from linear regression between sentinel CH GWAS variants and clinical hematology traits measured on contemporaneous samples in UKB. CH GWAS loci (y axis) are colored red if the Alt allele increased CH risk, otherwise blue. Hematological trait symbols (x axis) are colored red if trait values were higher in association with the CH phenotype, blue if they were lower and gray if they were not associated with CH. Blocks are filled if the effect of the CH GWAS variant on the trait was at least nominally significant: red indicates that the Alt allele was associated with an increase in the trait value and blue a decrease, with color intensity indicating effect size. Hematological traits are ordered by hierarchical clustering within the CH at-risk and CH protective strata. Platelet parameters (PCT, PLT, PDW and MPV) were affected by the greatest number of variants, followed by the erythrocytic parameters MCH, RBC and MCV. The best alignment in direction of effects (that is, where the effects of the variant on CH and on the hematological trait were consistent with the phenotype–phenotype association) was again seen for the platelet parameters PDW, PCT and PLT, as well as for MO#, LY# and BA%. Among the CH GWAS variants, chr6:CD164 and chr6:HLA-C affected the most hematological traits; however, chr6:CD164 showed rather poor alignment in the direction of effects. The best alignment was seen for chr21:14966851:NRIP1, chr3:THRB and chr3:160368930:SMC4. Clinical hematology parameters are as defined in Sheard.
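Step a starts by collecting, for each sentinel CH GWAS variant, all variants with r² ≥ 0.8 within ±500 kb. The sketch below does this from individual-level allele dosages, computing r² as the squared Pearson correlation. The caption does not state the LD source the authors used, so the dosage-based r², the helper names and the data are assumptions; the subsequent GWAS Catalog lookup is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def r_squared(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation of two allele-dosage vectors (0 if monomorphic)."""
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return 0.0 if v1 == 0 or v2 == 0 else cov * cov / (v1 * v2)


def ld_proxies(sentinel_pos: int, sentinel_dosage: Sequence[float],
               candidates: Dict[str, Tuple[int, Sequence[float]]],
               window_bp: int = 500_000, min_r2: float = 0.8) -> List[str]:
    """IDs of candidate variants within +/- window_bp of the sentinel with r2 >= min_r2."""
    return sorted(
        vid for vid, (pos, dosage) in candidates.items()
        if abs(pos - sentinel_pos) <= window_bp
        and r_squared(sentinel_dosage, dosage) >= min_r2
    )


# Hypothetical dosages for six individuals at a sentinel and four nearby variants
sentinel = [0, 1, 2, 1, 0, 2]
nearby = {
    "A": (1_100_000, [0, 1, 2, 1, 0, 2]),   # perfect LD, 100 kb away
    "B": (1_600_000, [0, 1, 2, 1, 0, 2]),   # perfect LD but outside the window
    "C": (800_000, [2, 1, 0, 1, 2, 0]),     # perfect LD with the other allele
    "D": (1_050_000, [0, 0, 1, 1, 2, 2]),   # weak LD (r2 = 0.0625)
}
print(ld_proxies(1_000_000, sentinel, nearby))
```

Because r² ignores the sign of the correlation, variant C qualifies as a proxy even though its dosages run in the opposite direction; allele alignment then has to be handled separately when reading off directions of effect.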
Extended Data Fig. 7. Effects of leukocyte telomere length (LTL) GWAS variants on LTL in UKB and in a UKB subsample with barcode-CH cases removed.
A GWAS was conducted on a subsample of UKB from which proven CH cases had been removed (n = 111,523). The effects of LTL GWAS variants were compared between the two samples, with the LTL effect in UKB on the x axis and the LTL effect in the no-CH subsample on the y axis. The plotted points are association effect estimates from linear regression and the bars indicate 95% CI. The red dotted line indicates the fitted inverse variance weighted (IVW) regression. Gray lines indicate the position of no effect.
Extended Data Fig. 8. Co-localization of eQTL with CH GWAS loci chr3q27:ABCC5 and chr3q25:TRIM59/SMC4.
a, Public databases report that ABCC5 expression is downregulated in association with the CH risk allele chr3:183954156_GT in whole blood, monocytes and T cells. This eQTL was confirmed in ISL whole-blood RNA-seq (β = −0.926 s.d., P = 1 × 10−1657). We noted a closely correlated (r² = 0.96), moderate-impact splice region variant in ABCC5 (rs7636910). The panel plots whole-blood RNA-seq eQTL signals (red) and CH GWAS results (blue) by genomic position; eQTL P values are scaled as indicated in the legend. Colocalization analysis (COLOC) indicated a PP.H4 = 74% probability that the eQTL and CH GWAS signals arise from the same single causal variant. ABCC5 is, however, not a compelling biological candidate for CH causation. b, Public databases report that TRIM59 and SMC4 expression in blood is increased in association with the CH risk allele rs2305407_A, which is annotated as an SMC4 splice region variant. These signals replicated in ISL blood RNA-seq (TRIM59: β = 0.458 s.d., P = 1 × 10−420; SMC4: β = 0.073 s.d., P = 1.75 × 10−11). There were two independent CH GWAS signals at 3q25: a CH risk variant with 1–2% EAF, chr3_160368930_T_TA, and a CH risk variant with ~55% EAF, rs2305407_A, which carries the eQTL association. Accordingly, the CH GWAS plot (blue) shows the Padj values for rs2305407_A conditioned on chr3_160368930_T_TA. The TRIM59 RNA-seq eQTL signal (red) is scaled as indicated in the legend. COLOC gave a PP.H4 = 96% probability that the peaks are identical. COLOC showed no substantial evidence of peak identity with the SMC4 eQTL, with or without conditioning of the CH GWAS signal on chr3_160368930_T_TA (PP.H4 = 4.5% and 2.2%, respectively). eQTL and CH GWAS signals were derived from linear and logistic regression association analysis, respectively.



References

    1. Zink F, et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood. 2017;130:742–752. doi: 10.1182/blood-2017-02-769869.
    2. Mitchell E, et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature. 2022;606:343–350. doi: 10.1038/s41586-022-04786-y.
    3. Genovese G, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 2014;371:2477–2487. doi: 10.1056/NEJMoa1409405.
    4. Jaiswal S, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 2014;371:2488–2498. doi: 10.1056/NEJMoa1408617.
    5. Jaiswal S, Libby P. Clonal haematopoiesis: connecting ageing and inflammation in cardiovascular disease. Nat. Rev. Cardiol. 2020;17:137–144. doi: 10.1038/s41569-019-0247-5.