Nature. 2020 Aug;584(7819):136-141.
doi: 10.1038/s41586-020-2430-6. Epub 2020 Jun 24.

Monogenic and polygenic inheritance become instruments for clonal selection

Po-Ru Loh et al. Nature. 2020 Aug.

Abstract

Clonally expanded blood cells that contain somatic mutations (clonal haematopoiesis) are commonly acquired with age and increase the risk of blood cancer [1-9]. The blood clones identified so far contain diverse large-scale mosaic chromosomal alterations (deletions, duplications and copy-neutral loss of heterozygosity (CN-LOH)) on all chromosomes [1,2,5,6,9], but the sources of selective advantage that drive the expansion of most clones remain unknown. Here, to identify genes, mutations and biological processes that give selective advantage to mutant clones, we analysed genotyping data from the blood-derived DNA of 482,789 participants from the UK Biobank [10]. We identified 19,632 autosomal mosaic chromosomal alterations and analysed these for relationships to inherited genetic variation. We found 52 inherited, rare, large-effect coding or splice variants in 7 genes that were associated with greatly increased vulnerability to clonal haematopoiesis with specific acquired CN-LOH mutations. Acquired mutations systematically replaced the inherited risk alleles (at MPL) or duplicated them to the homologous chromosome (at FH, NBN, MRE11, ATM, SH2B3 and TM2D3). Three of the genes (MRE11, NBN and ATM) encode components of the MRN-ATM pathway, which limits cell division after DNA damage and telomere attrition [11-13]; another two (MPL and SH2B3) encode proteins that regulate the self-renewal of stem cells [14-16]. In addition, we found that CN-LOH mutations across the genome tended to cause chromosomal segments with alleles that promote the expansion of haematopoietic cells to replace their homologous (allelic) counterparts, increasing polygenic drive for blood-cell proliferation traits. Readily acquired mutations that replace chromosomal segments with their homologous counterparts seem to interact with pervasive inherited variation to create a challenge for lifelong cytopoiesis.

Conflict of interest statement

The authors declare competing interests: patent application PCT/WO2019/079493 has been filed on the mCA detection method used in this work.

Figures

Extended Data Figure 1: Mosaic chromosomal alterations detected among 482,789 UK Biobank participants.
(a) Each horizontal line corresponds to an mCA; a total of 19,632 autosomal events in 17,111 unique individuals are displayed. Detected events are color-coded by copy number of the affected chromosome or segment (orange, LOH; blue, loss/deletion; red, gain/duplication). Focal deletions are labeled in blue with the names of putative target genes. Loci containing inherited variants influencing somatic events in cis are labeled in the same color as the corresponding mCA (orange for CN-LOH-associated loci, blue for losses). (b) Sex and age distributions of individuals with detected mosaic events. Marker size and color intensity increase with event frequency. Error bars, 95% CIs. Sample sizes are provided in Supplementary Table 1 and numeric data are provided in Supplementary Table 4. We previously reported three events with unusual sex biases (gains on chromosome 15, 16p11.2 deletions, and 10q terminal deletions) [9], all of which replicated here. We have not identified a mechanism that could explain the sex biases. The overall tendency of male enrichment for most mCAs raises the possibility that environmental exposures could result in genomic insults that lead to mCAs; however, the heterogeneity of the level of male enrichment across different mCAs suggests that the mechanisms producing sex biases may be event-specific. (c) Enrichment of mosaic chromosomal alterations in individuals with anomalously high blood indices. Different mCAs are significantly enriched (FDR 0.05; one-sided Fisher’s exact test) among N=455,009 individuals with anomalous blood counts in different blood lineages (adjusted for age, sex, and smoking status). Events were grouped by chromosome and copy number, with loss and CN-LOH events subdivided by p-arm vs. q-arm. (We did not subdivide gain events by arm because most gain events are whole-chromosome trisomies.) Numeric data are provided in Supplementary Table 5.
Extended Data Figure 2: Copy number determination and QC of mosaic chromosomal alteration calls.
(a–d) Total vs. relative allelic intensities of mCAs detected on each chromosome. Mean log2 R ratio (LRR) of each detected mCA is plotted against estimated change in B allele frequency at heterozygous sites (|ΔBAF|). The data exhibit the characteristic “arrowhead” pattern in which LRR/|ΔBAF| approximately equals a positive constant for gain events, zero for CN-LOH events, and a negative constant for loss events. Possible constitutional duplications were filtered according to thresholds on LRR and |ΔBAF| defined in Supplementary Note 1. Constitutional duplications have expected |ΔBAF|=1/6 and have LRR≈0.36 in this data set. We chose exclusion thresholds to conservatively discard all calls that might belong to this cluster, applying more stringent filtering to shorter events because (i) most constitutional duplications are short and (ii) shorter events have noisier LRR and |ΔBAF| estimates. (e) Estimation of false discovery rate using age distributions of individuals with mCA calls. We generated age distributions for (i) “high-confidence” detected events passing a permutation-based FDR threshold of 0.01 (bright green), (ii) “medium-confidence” events below the FDR threshold of 0.01 but passing an FDR threshold of 0.05 (darker green), and (iii) “low-confidence” events below the FDR threshold of 0.05 but passing an FDR threshold of 0.10 (darkest green; excluded from our call set but plotted for context). We compared these distributions to the overall age distribution of UK Biobank participants (grey). Based on the numbers of events in each category, ≈32% of medium-confidence detected events are expected to be false positives. 
To estimate our true FDR, we regressed the medium-confidence age distribution on the high-confidence and overall age distributions, reasoning that the medium-confidence age distribution should be a mixture of correctly-called events (with age distribution similar to that of the high-confidence events) and spurious calls (with age distribution similar to the overall cohort). We observed a regression weight of 0.44 for the component corresponding to spurious calls, in good agreement with expectation, and implying a true FDR of 6.6% (4.5–8.6%, 95% CI based on regression fit on n=6 age bins). (f) Fractions of individuals with at least one detected autosomal mCA stratified by age and sex. Error bars, 95% CI. Numeric data are provided in Supplementary Table 3.
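The expected |ΔBAF| of 1/6 for constitutional duplications quoted in this caption follows from counting allele copies at a heterozygous site. The sketch below is a minimal illustration under an idealized, noise-free model; the function name and the `cell_fraction` parameter are illustrative assumptions, not part of the authors' pipeline.

```python
from fractions import Fraction


def expected_abs_delta_baf(event: str, cell_fraction) -> Fraction:
    """Expected |BAF - 1/2| at a heterozygous site (idealized model).

    event: "gain", "loss" or "cnloh".
    cell_fraction: fraction of cells carrying the event (1 = constitutional).
    """
    f = Fraction(cell_fraction)
    if not 0 <= f <= 1:
        raise ValueError("cell_fraction must lie in [0, 1]")
    if event == "gain":
        # Duplicated allele: 1 + f copies out of 2 + f in total.
        major = (1 + f) / (2 + f)
    elif event == "loss":
        # Retained allele: 1 copy out of 2 - f in total.
        major = 1 / (2 - f)
    elif event == "cnloh":
        # Copy number stays at 2; one allele replaces the other in a fraction f of cells.
        major = (1 + f) / 2
    else:
        raise ValueError(f"unknown event type: {event}")
    return major - Fraction(1, 2)


# A constitutional duplication (present in every cell) gives 2/3 - 1/2 = 1/6.
constitutional_dup = expected_abs_delta_baf("gain", 1)
```

At low cell fractions all three event types produce small shifts, which is one reason the caption combines |ΔBAF| with LRR to separate gains, losses and CN-LOH.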
Extended Data Figure 3: Principal component plot of UK Biobank participants.
Individuals are plotted by their first two genetic principal component coordinates as computed by UK Biobank and colored according to self-reported ethnic background. Red circles indicate individuals identified in our exome analyses (of self-reported White individuals with mosaic CN-LOH events) as carriers of rare coding or splice variants in frequently-targeted genes. Marginal density histograms stratified by self-reported ethnic background are provided next to the PC1 and PC2 axes.
Extended Data Figure 4: Quantile-quantile plots of P-values produced by association analyses.
These plots verify the calibration of the statistical tests we used to identify the genome-wide significant associations reported in Extended Data Table 1 (see legend for details of statistical tests and sample sizes). In each plot, the blue dots correspond to an analysis of all variants tested, while the black dots correspond to an analysis in which regions surrounding significant associations were excluded. Specifically, the plots respectively exclude 1:35–55Mb (MPL), 1:239–244Mb (FH), 8:88–93Mb (NBN), 9:2.5–7.5Mb (JAK2), 11:92–97Mb (MRE11), 11:103–113Mb (ATM), 12:109–114Mb (SH2B3), 14:92.5–102.5Mb (TCL1A and DLK1), and 15:100Mb–qter (TM2D3). In all cases, exclusion of the hit regions (which account for a small fraction of the variants tested) resulted in a distribution close to the expected null.
Extended Data Figure 5: Identification and validation of an inherited MPL structural variant.
We suspected that an association between rs144279563 and acquired 1p CN-LOH mutations might tag a causal structural variant in MPL. (While rs144279563 is ∼1.5Mb downstream of MPL, it is sufficiently rare to be in linkage disequilibrium with variants several megabases away.) We therefore examined genotyping intensities at MPL from 49,950 individuals typed on the BiLEVE chip (which contains more probes within MPL than the Biobank chip, on which the remaining individuals were typed). (a) Mean genotyping intensities over 42 carriers of the rs144279563 rare allele exhibit a sharp increase at the end of MPL exon 9 (1 genotyping probe) followed by a sharp decrease in exon 10 (3 genotyping probes). (b,c) Closer inspection of genotyping intensities at the 4 probes across all BiLEVE individuals enabled identification of 27 individuals likely to carry an inherited structural variant (20 of which carry the rs144279563 rare allele). We called this variant in the BiLEVE cohort using two criteria: (i) correct sign of LRR at the 4 probes (+,–,–,–); and (ii) mean signed LRR shift >0.4 over the 4 probes. (d) Read support for a 454bp deletion spanning MPL exon 10 in exome-sequenced individuals. We used IGV [44] to plot paired-end reads aligning in or near MPL exons 9 and 10 in four exome-sequenced individuals imputed to carry the MPL structural variant (and also mosaic for 1p CN-LOH events). Read pairs highlighted in red have unusually long insert sizes, consistent with a deletion of genomic sequence between the aligned reads. Multicolored read segments indicate clipped reads in which one end of a read stops aligning to the reference genome. On the left side of the deletion, clipped reads align through hg19 base pair 43,814,728 (…AGGGACTGGG), with mismatches consistently occurring starting from 43,814,729 rightward (hg19: CGCCG…). 
On the right side of the deletion, clipped reads align starting from 43,815,178 (CTGGGACTCG…), with mismatches starting from 43,815,177 leftward (hg19: …CACCT). Examination of individual clipped reads revealed sequence matching …AGGGACTGGGACTCG…, indicating deletion of 5bp (CTGGG) in addition to the 449bp between aligning read segments. (Note that in this caption we have used hg19 coordinates for consistency with the rest of this manuscript; the IGV plot uses hg38 coordinates because reads had been aligned to hg38, amounting to an offset of –465,671bp relative to hg19 at MPL.) (e,f) Decreased read depth at exon 10 in all 32 imputed carriers of the MPL exon 10 deletion who had been exome-sequenced. We used mosdepth [45] to compute mean read depth across all 12 MPL exons in the 32 exome-sequenced imputed deletion carriers along with 32 controls. We normalized read depth in each individual by dividing by mean read depth across exons 1–8 and 11–12. All 32 imputed carriers of the exon 10 deletion had lower exon 10 normalized read depths than all 32 controls. We did not observe any evidence of increased read depth in exon 9 in carriers vs. controls.
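The two calling criteria in panels (b,c) and the deletion-size arithmetic in panel (d) can be sketched in a few lines of Python. This is an illustrative reading of the caption, not the authors' code: it assumes "mean signed LRR shift" means each probe's LRR multiplied by its expected sign and then averaged, and the function names and example LRR values are hypothetical.

```python
# Expected direction of the LRR shift at the 4 probes
# (1 probe at the end of exon 9, then 3 probes in exon 10).
EXPECTED_SIGNS = (+1, -1, -1, -1)


def is_likely_carrier(lrr_at_probes, min_mean_shift=0.4):
    """Apply criteria (i) and (ii) from the caption to one individual."""
    if len(lrr_at_probes) != len(EXPECTED_SIGNS):
        raise ValueError("expected LRR values for exactly 4 probes")
    # Multiply by the expected sign so that a shift in the right direction is positive.
    signed = [lrr * sign for lrr, sign in zip(lrr_at_probes, EXPECTED_SIGNS)]
    signs_ok = all(value > 0 for value in signed)          # criterion (i)
    mean_shift = sum(signed) / len(signed)                 # criterion (ii)
    return signs_ok and mean_shift > min_mean_shift


def bases_between(last_aligned_left, first_aligned_right):
    """Number of reference bases strictly between two aligned positions."""
    return first_aligned_right - last_aligned_left - 1


# hg19 coordinates quoted in the caption.
gap_bp = bases_between(43_814_728, 43_815_178)   # bases between aligning read segments
deletion_bp = gap_bp + len("CTGGG")              # plus the additional 5bp

# Hypothetical LRR values for illustration only.
example_call = is_likely_carrier([0.5, -0.6, -0.5, -0.4])
```

With the coordinates from the caption, `gap_bp` is 449 and `deletion_bp` is 454, matching the stated deletion size.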
Extended Data Figure 6: Identity-by-descent (IBD) graph at MPL among individuals with likely 1p CN-LOH events spanning MPL.
We called IBD tracts using GERMLINE with haplotype extension. Colored nodes indicate carriers of the 28 rare coding or splice variants we observed to be independently (and probably causally) associated with 1p CN-LOH mutations (always replacing the rare allele with the reference allele; Extended Data Table 1 and Supplementary Table 7). (Note that the numbers of carriers listed for each variant here are slightly higher than in the “Allelic shift” columns of Extended Data Table 1 and Supplementary Table 7 because allelic shifts could only be confidently ascertained for a subset of carriers.) The presence of additional IBD clusters not carrying any of the 28 highlighted variants suggests that even more causal variants in MPL remain to be discovered.
Extended Data Figure 7: Identity-by-descent (IBD) graph at ATM among individuals with likely 11q CN-LOH events spanning ATM.
We called IBD tracts using GERMLINE with haplotype extension. Colored nodes indicate carriers of the eight rare coding or splice variants we observed to be independently (and probably causally) associated with 11q CN-LOH mutations (always making the rare allele homozygous; Extended Data Table 1 and Supplementary Table 7). The presence of additional IBD clusters not carrying any of the highlighted variants suggests that even more causal variants in ATM remain to be discovered. The two carriers of rs786204751 are also carriers of rs587779872, as discussed in Methods.
Extended Data Figure 8: Variant allele fractions of rare coding or splice variants likely to be targets of CN-LOH mutations in exome-sequenced individuals.
Variant allele fractions (VAF = number of reads matching the alternate allele divided by the total number of reads matching either the reference or the alternate allele) are plotted for each variant call identified as the potential target of a CN-LOH event (either from association analyses or burden analyses). Error bars, 95% CIs approximated using binomial standard errors multiplied by 1.96. Allelic read depths for variants identified at DNMT3A, TET2, and JAK2 are broadly indicative of somatic origin (VAF<0.5), while read depths for variants at the seven inherited risk loci are broadly consistent with inherited variation (VAF≈0.5). Read depths were generally insufficient to make a confident assessment of somatic vs. inherited origin on a per-variant level, as evidenced by wide VAF error bars; additionally, making this determination is further complicated by mapping bias toward the reference allele, which can produce VAF lower than 0.5 even for inherited variants.
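The VAF definition and the normal-approximation error bars described in this caption can be written out as a short Python sketch. The function name and the example read counts are hypothetical; only the formula (binomial standard error multiplied by 1.96) comes from the caption.

```python
import math


def vaf_with_ci(alt_reads, ref_reads, z=1.96):
    """Return (VAF, lower, upper) using a binomial standard error times z."""
    depth = alt_reads + ref_reads
    if depth <= 0:
        raise ValueError("need at least one read matching ref or alt")
    vaf = alt_reads / depth
    # Binomial standard error of a proportion estimated from `depth` reads.
    std_err = math.sqrt(vaf * (1 - vaf) / depth)
    return vaf, vaf - z * std_err, vaf + z * std_err


# Hypothetical example: 10 alternate and 10 reference reads.
vaf, lower, upper = vaf_with_ci(10, 10)
```

At a depth of 20 reads the interval spans roughly 0.28 to 0.72, which illustrates why the caption notes that per-variant read depths were generally insufficient to separate somatic (VAF<0.5) from inherited (VAF≈0.5) origin.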
Extended Data Figure 9: Tendencies of CN-LOH mutations to modify polygenic scores for 29 blood cell parameters.
For each blood count parameter and each chromosome arm, the heatmap reports the z-score for the mean change in polygenic score across all CN-LOH mutations detected on the arm. Among the 29 blood count parameters we considered, some of the parameters corresponding to abundances of blood cell types might be surrogates for enhanced cellular fitness (in many cases of mitotic progenitors rather than the cell types themselves). Other parameters reflect cell size or morphology. Effects of CN-LOH mutations on polygenic scores for these parameters may reflect the production of abnormal cells by biologically altered stem cells, rather than cellular fitness itself (which may be a property of the unobserved hematopoietic stem cells). Columns: platelet count and crit (PLT#, Pct); red blood cell count (RBC#), hemoglobin (Hgb), and hematocrit (Hct) (both strongly correlated with RBC#); reticulocyte count and percent (RET#, RET%); high light scatter reticulocyte count and percent (HLR#, HLR%); immature reticulocyte fraction (IRF); white blood cell count (WBC#); neutrophil count and percent (NEU#, NEU%); eosinophil count and percent (EOS#, EOS%); monocyte count and percent (MON#, MON%); basophil count and percent (BAS#, BAS%); lymphocyte count and percent (LYM#, LYM%); platelet distribution width (PDW), mean platelet volume (MPV), RBC distribution width (RDW), mean corpuscular volume (MCV), mean reticulocyte volume (MRV), mean sphered cell volume (MSCV), mean corpuscular hemoglobin (MCH), and mean corpuscular hemoglobin concentration (MCHC).
Figure 1: Fine-mapped inherited sequence alleles associated with the acquisition/selection of CN-LOH mutations in cis.
(a) MPL, (b) FH, (c) NBN, (d) MRE11, (e) ATM, (f) SH2B3, (g) TM2D3. At each locus, the CN-LOH mutations acquired by expanded clones tend to have deleted (a) or duplicated (b–g) the inherited alleles in a predictable manner as shown. Each panel is organized in the following way: top, genomic modifications observed in clones; bottom, association P-values (two-sided Fisher’s exact test on n≥378,307 individuals; Methods) vs. chromosomal position. All variants with filled symbols are likely causal coding or splice variants (Extended Data Table 1); black marker edges indicate evidence of pathogenicity in ClinVar. Distinct colors are used to indicate the statistical independence of variants; any variants in linkage disequilibrium with likely causal variants (R²>0.2 in cases) are indicated with open symbols with a border color matching that of the likely causal variant. Symbol shapes indicate the effects of the indicated variant on encoded protein (LoF, missense, etc.); symbol sizes scale inversely with minor allele frequency.
Figure 2: Polygenic and monogenic influences on clonal proliferation of cells with CN-LOH mutations.
(a) Two cellular outcomes of a CN-LOH mutation (mitotic recombination) involving homologous chromosome arms that bear inherited alleles with differing proliferative potentials. In one cell, the CN-LOH mutation has duplicated the chromosomal arm that has alleles that more strongly promote proliferation; proliferative polygenic drive increases, potentially resulting in clonal selection of the mutant cell. By contrast, the cell with the complementary CN-LOH mutation may have reduced tendency to proliferate. (b) CN-LOH mutations in expanded clones broadly increase polygenic risk scores for increased blood-cell counts and risk of mosaic Y chromosome loss (a marker for clonal hematopoiesis [27]). The heatmap displays changes in polygenic scores for each trait, averaged across all ascertained (expanded) CN-LOH mutations observed on each chromosome arm (color bar, z-score; ∗, significant at FDR<0.05; ∗∗, Bonferroni-corrected P<0.05). (c) Prediction of the direction of CN-LOH mutations (in expanded clones) from inherited alleles on the affected chromosome arms. Prediction accuracy (the correlation between predicted and observed CN-LOH direction) is plotted for predictions made using: only CN-LOH-associated alleles (Extended Data Table 1 and Supplementary Table 7) (red); polygenic score differentials on affected chromosomal segments (orange); or both sources of information (green). Error bars, 95% CIs. Results are plotted for 14 chromosome arms for which at least one predictor was available. Numeric data and sample sizes are provided in Supplementary Tables 15 and 18. Analyses of polygenic scores for control traits such as height and BMI are provided in Supplementary Table 16.
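The logic of panel (a) can be illustrated with a toy calculation. The sketch below assumes a simple additive polygenic score and phased 0/1 allele indicators on each homologous segment; the function name, effect sizes and haplotypes are hypothetical and do not reproduce the authors' analysis.

```python
def cnloh_score_change(effects, duplicated_hap, lost_hap):
    """Change in an additive polygenic score caused by a CN-LOH event.

    Before the event the segment contributes sum(beta * (dup + lost));
    afterwards it contributes sum(beta * 2 * dup), so the change is
    sum(beta * (dup - lost)) over variants on the affected segment.
    """
    if not (len(effects) == len(duplicated_hap) == len(lost_hap)):
        raise ValueError("inputs must have one entry per variant")
    return sum(
        beta * (dup - lost)
        for beta, dup, lost in zip(effects, duplicated_hap, lost_hap)
    )


# Hypothetical two-variant segment: per-allele effects on a proliferation trait.
effects = [0.1, -0.2]
hap_a = [1, 0]   # carries the trait-increasing allele at variant 1
hap_b = [0, 1]   # carries the trait-decreasing allele at variant 2

gain_in_drive = cnloh_score_change(effects, duplicated_hap=hap_a, lost_hap=hap_b)
loss_in_drive = cnloh_score_change(effects, duplicated_hap=hap_b, lost_hap=hap_a)
```

The two complementary events give equal and opposite score changes, mirroring the two cellular outcomes drawn in panel (a).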
Figure 3: Associations of mCAs with incident cancers and cardiovascular disease.
(a) Clones with specific mCAs confer increased risk of incident blood cancers diagnosed >1 year after DNA collection in individuals with normal blood counts at assessment (Cochran-Mantel-Haenszel test adjusting for age and sex; error bars, 95% CIs). Seven of nine associations we previously reported [9] (all but 16p= and 20q–) replicate here; “=” is shorthand for CN-LOH. (b) Loss, CN-LOH, and gain events (on any autosome) do not broadly increase risk for incident myocardial infarction or stroke, but CN-LOH events on 9p (containing JAK2) do increase cardiovascular risk [28] (two-sided Fisher’s exact test on cases and controls matched for assessment year, age, sex, smoking, hypertension, BMI, and type 2 diabetes status; error bars, 95% CIs). Statistical tests are detailed in Methods. Numeric data and sample sizes are provided in Supplementary Tables 20 and 22.

References

    1. Jacobs KB et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nature Genetics 44, 651–658 (2012).
    2. Laurie CC et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nature Genetics 44, 642–650 (2012).
    3. Genovese G et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. New England Journal of Medicine 371, 2477–2487 (2014).
    4. Jaiswal S et al. Age-related clonal hematopoiesis associated with adverse outcomes. New England Journal of Medicine 371, 2488–2498 (2014).
    5. Machiela MJ et al. Characterization of large structural genetic mosaicism in human autosomes. American Journal of Human Genetics 96, 487–497 (2015).