Blood. 2023 May 4;141(18):2214-2223.
doi: 10.1182/blood.2022018825.

A practical approach to curate clonal hematopoiesis of indeterminate potential in human genetic data sets

Caitlyn Vlasschaert et al. Blood.

Abstract

Clonal hematopoiesis of indeterminate potential (CHIP) is a common form of age-related somatic mosaicism that is associated with significant morbidity and mortality. CHIP mutations can be identified in peripheral blood samples that are sequenced using approaches that cover the whole genome, the whole exome, or targeted genetic regions; however, differentiating true CHIP mutations from sequencing artifacts and germ line variants is a considerable bioinformatic challenge. We present a stepwise method that combines filtering based on sequencing metrics, variant annotation, and population-based associations to increase the accuracy of CHIP calls. We apply this approach to ascertain CHIP in ∼550 000 individuals in the UK Biobank complete whole exome cohort and the All of Us Research Program initial whole genome release cohort. CHIP ascertainment on this scale unmasks recurrent artifactual variants and highlights the importance of specialized filtering approaches for several genes, including TET2 and ASXL1. We show how small changes in filtering parameters can considerably increase CHIP misclassification and reduce the effect size of epidemiological associations. Our high-fidelity call set refines previous population-based associations of CHIP with incident outcomes. For example, the annualized incidence of myeloid malignancy in individuals with small CHIP clones is 0.03% per year, which increases to 0.5% per year among individuals with very large CHIP clones. We also find a significantly lower prevalence of CHIP in individuals of self-reported Latino or Hispanic ethnicity in All of Us, highlighting the importance of including diverse populations. The standardization of CHIP calling will increase the fidelity of CHIP epidemiological work and is required for clinical CHIP diagnostic assays.

Conflict of interest statement

Conflict-of-interest disclosure: M.S. has membership on a board or advisory committee of AbbVie, Bristol Myers Squibb, CTI, Forma, Geron, Karyopharm, Novartis, Ryvu, Sierra Oncology, Taiho, Takeda, and TG Therapeutics; has patents and royalties from Boehringer Ingelheim; received research funding from ALX Oncology, Astex, Incyte, Takeda, and TG Therapeutics; owns equity in Karyopharm and Ryvu; and is a consultant for Forma, Karyopharm, and Ryvu. B.L.E. has received research funding from Celgene, Deerfield, and Novartis and consulting fees from GRAIL; and serves on the scientific advisory boards for Skyhawk Therapeutics, Exo Therapeutics, and Neomorph Therapeutics, all unrelated to this work. S.J. is a paid consultant to Novartis, AVRO Bio, Roche Genentech, GSK, and Foresite Labs and is on the scientific advisory board of Bitterroot Bio. B.L.E., S.J., A.B., and P.N. are cofounders, equity holders, and on the scientific advisory board of TenSixteen Bio. The remaining authors declare no competing financial interests.

Figures

Figure 1.
Schematic of CHIP variant ascertainment workflow. Putative somatic mutations are first identified using a somatic mutation caller and annotated for gene- and protein-level changes. Variants are then filtered based on an initial, liberal set of parameters and filtered based on gene-specific CHIP variant rules. In some genes, all loss-of-function mutations are considered putative CHIP variants, whereas in other genes, only specific missense mutations are included. Leveraging available large-scale sequencing data, we apply 3 filters to identify artifactual genes and variants. We then optimize the sequencing-based filtering parameters, yielding a final CHIP mutation call set.
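The order of operations in this workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` fields, the `GENE_RULES` table, the `curate` function, and both depth thresholds are hypothetical names and values chosen for the example (the thresholds of 3 and 5 only echo the minAD strata discussed in the later figure captions). The artifact blocklist stands in for the population-based filters, which the paper derives from association tests rather than from a fixed list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str      # e.g. "frameshift", "stop_gained", "missense"
    protein_change: str   # e.g. "V617F"
    alt_depth: int        # reads supporting the alternate allele (AD)
    total_depth: int      # total reads at the site

    @property
    def vaf(self) -> float:
        """Variant allele fraction."""
        return self.alt_depth / self.total_depth


LOSS_OF_FUNCTION = {"frameshift", "stop_gained", "splice_site"}

# Hypothetical, illustrative rule table: some genes accept any
# loss-of-function variant, others only listed missense changes.
GENE_RULES = {
    "TET2": {"lof": True, "missense": set()},
    "ASXL1": {"lof": True, "missense": set()},
    "JAK2": {"lof": False, "missense": {"V617F"}},
}


def passes_gene_rules(v: Variant) -> bool:
    rule = GENE_RULES.get(v.gene)
    if rule is None:
        return False
    if v.consequence in LOSS_OF_FUNCTION:
        return rule["lof"]
    if v.consequence == "missense":
        return v.protein_change in rule["missense"]
    return False


def curate(variants, liberal_min_ad=3, final_min_ad=5,
           artifact_blocklist=frozenset()):
    """Apply the filters in the order the schematic describes."""
    # 1. Initial, liberal sequencing-metric filter.
    kept = [v for v in variants if v.alt_depth >= liberal_min_ad]
    # 2. Gene-specific CHIP variant rules.
    kept = [v for v in kept if passes_gene_rules(v)]
    # 3. Drop variants flagged as recurrent artifacts (assumed precomputed).
    kept = [v for v in kept
            if (v.gene, v.protein_change) not in artifact_blocklist]
    # 4. Optimized, stricter sequencing-metric filter.
    return [v for v in kept if v.alt_depth >= final_min_ad]
```

For instance, passing `artifact_blocklist=frozenset({("ASXL1", "G645Vfs*58")})` would drop the ASXL1 variant that the Figure 2 caption describes as an apparent exome sequencing artifact, while a TET2 frameshift with 6 supporting reads would be retained.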
Figure 2.
Verifying the association of common putative ASXL1 variants with age can help distinguish true variants from recurrent artifacts. (A) Association of all ASXL1 variants present ≥20 times in the UKB exome data set and ≥15 times in the All of Us whole genome data set. Variants not associated with age or with a CHIP-associated TERT promoter variant (rs7705526) are colored in red. Variants associated with rs7705526 only are colored in blue. (B-C) Association of ASXL1 G646Wfs∗12 and G645Vfs∗58 with age across VAF strata identifies specific large VAF subsets of G646Wfs∗12 as somatic mutations, whereas G645Vfs∗58 appears to be an artifact of exome sequencing that is not present in All of Us. (D) ASXL1 variants that pass filtering are significantly associated with myeloid cancer, death, and smoking, whereas the removed variants show minimal association, supporting the interpretation that the removed variants are artifacts.
Figure 3.
Association of CHIP variants defined by minimum allele depth (minAD) strata with age and TERT promoter variant rs7705526. Panels A-D show the associations for strata of CHIP variants defined by minAD 3, 4, 5, and 6 with age (panels A, C) and the rs7705526 TERT promoter variant (panels B, D) in the UKB and All of Us cohorts. The right axis plots the results of a simulation experiment in which the specified proportion of samples in the minAD ≥5 CHIP data set was randomly exchanged for individuals without CHIP in the data set to estimate misclassification. Each simulation was run 20 times, and the average result is shown.
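The label-exchange simulation described in this caption can be illustrated with a small standard-library sketch. It is an assumption-laden stand-in rather than the authors' code: the function names (`swap_labels`, `age_gap`, `simulate`) are hypothetical, and the difference in mean age between carriers and noncarriers replaces whatever association statistic the paper actually used. The idea it shows is the same: exchange a chosen proportion of CHIP carriers for noncarriers, recompute the association, repeat 20 times, and average.

```python
import random
from statistics import mean


def swap_labels(is_chip, proportion, rng):
    """Exchange a proportion of CHIP carriers for randomly chosen noncarriers."""
    labels = list(is_chip)
    carriers = [i for i, c in enumerate(labels) if c]
    controls = [i for i, c in enumerate(labels) if not c]
    k = min(round(proportion * len(carriers)), len(controls))
    for i in rng.sample(carriers, k):
        labels[i] = False   # true carrier is now counted as a noncarrier
    for i in rng.sample(controls, k):
        labels[i] = True    # noncarrier is now counted as a carrier
    return labels


def age_gap(ages, labels):
    """Simple association proxy: mean age of carriers minus noncarriers."""
    carrier_ages = [a for a, c in zip(ages, labels) if c]
    control_ages = [a for a, c in zip(ages, labels) if not c]
    return mean(carrier_ages) - mean(control_ages)


def simulate(ages, is_chip, proportion, runs=20, seed=0):
    """Average the association proxy over repeated random exchanges."""
    rng = random.Random(seed)
    return mean(
        age_gap(ages, swap_labels(is_chip, proportion, rng))
        for _ in range(runs)
    )
```

Comparing the attenuated value from `simulate` at several proportions against the association observed in a looser call set (for example, minAD 3) gives a rough estimate of how much misclassification that looser threshold introduces.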
Figure 4.
Risk of death and incident myeloid cancers associated with CHIP when defined using minimum sequencing allele depth (minAD) thresholds of 3 and 5 in the UKB, assessed using Cox proportional hazards regressions adjusted for age, age-squared, sex, smoking history, and 10 principal components of genetic ancestry. The risk is greater for minAD ≥5, including when hotspot variants (defined as variants observed ≥20 times in the data set) and nonhotspot variants (present <20 times) are assessed separately. The risk increases in proportion to the VAF, with nearly a 10-fold increase in risk between the VAF <5% and >20% strata.
Figure 5.
Profile of CHIP variants in the UKB and All of Us. The distributions of genes affected by CHIP in (A) the UKB and (B) All of Us are broadly similar. (C) CHIP prevalence increases with age in All of Us. (D) CHIP prevalence is lower in individuals of self-reported Hispanic or Latino ethnicity, and CHIP is positively associated with age and smoking history.
