Nat Commun. 2024 Nov 14;15(1):9864.
doi: 10.1038/s41467-024-54223-z.

Base-excision repair pathway shapes 5-methylcytosine deamination signatures in pan-cancer genomes

André Bortolini Silveira et al. Nat Commun. 2024.

Abstract

Transition of cytosine to thymine in CpG dinucleotides is the most frequent type of mutation in cancer. This increased mutability is commonly attributed to the spontaneous deamination of 5-methylcytosine (5mC), which is normally repaired by the base-excision repair (BER) pathway. However, the contribution of 5mC deamination in the increasing diversity of cancer mutational signatures remains poorly explored. We integrate mutational signatures analysis in a large series of tumor whole genomes with lineage-specific epigenomic data to draw a detailed view of 5mC deamination in cancer. We uncover tumor type-specific patterns of 5mC deamination signatures in CpG and non-CpG contexts. We demonstrate that the BER glycosylase MBD4 preferentially binds to active chromatin and early replicating DNA, which correlates with lower mutational burden in these domains. We validate our findings by modeling BER deficiencies in isogenic cell models. Here, we establish MBD4 as the main actor responsible for 5mC deamination repair in humans.

Conflict of interest statement

Competing interests: D. Rieke reports advisory agreement with BeiGene and Bayer, honoraria from Bristol Myers Squibb, Bayer and Roche, research support from Seagen, and personal fees from Bayer and Johnson & Johnson, all outside the submitted work. A. Picca reports personal fees from AstraZeneca and Servier, all outside the submitted work. F. Bielle reports funding of research from Abbvie, service agreement for research contracted between his institution and Treefrog Therapeutics as well as Owkin, personal fees from Bristol Myers Squibb and a next-of-kin employed by Bristol Myers Squibb, all outside the submitted work. M.L. Yaspo is COO/CSO and shareholder of Alacris Theranostics without conflict of interest with the submitted work. M. Rodrigues reports non-financial support from AstraZeneca and Merck Sharp and Dohme, grants from Daiichi Sankyo, personal fees from AstraZeneca, Immunocore, Merck Sharp and Dohme and GlaxoSmithKline, all outside the submitted work. M.-H. Stern reports grants from Immunocore and Bionano, and royalties from Myriad Genetics, all outside the submitted work. The remaining authors have no conflict of interest to declare.

Figures

Fig. 1. Refining the spectrum of CpG mutational signatures and their dependence on 5mC deamination.
a Substitution profiles by trinucleotide sequence context (96-channel) of SBS reference mutational signatures characterized by a high frequency of CpG>NpG substitutions. The most frequent substitutions per signature are indicated. b Scatter plot of exposures to predominant CpG>NpG mutational signature found per tumor sample, as absolute exposure versus percent exposure contribution. The dashed line indicates the 30% contribution cutoff used to select SBS1 samples. AML, acute myeloid leukemia; BRCA, breast invasive carcinoma; SARC, sarcoma; UVM, uveal melanoma; GBM, glioblastoma multiforme; HGG, high-grade glioma; LYMP, lymphoid neoplasm; DLBC, diffuse large B-cell lymphoma; EAC, esophageal adenocarcinoma; STAD, stomach adenocarcinoma; BLCA, bladder urothelial carcinoma. c Distributions of cosine similarity increase for signature fitting with rare SBS96 compared to common signatures only, per tumor sample. MBD4def, MBD4-deficient (n = 20); MBD4wt, MBD4 wild-type (n = 9). The dashed line indicates the standard cutoff of FitMS in cosine similarity increase multistep mode. Boxes indicate the median, 25th and 75th percentiles. Whiskers extend to the largest or lowest value up to 1.5 times the distance between the 25th and 75th percentiles. d Scatter plot of DNA methylation percentages in CpG > TpG mutated sites versus all CpGs (global), per signature and cell lineage. Methylation was interrogated in data from normal human cell types. The dashed line indicates the absence of over- or under-representation of methylation in mutated CpGs. e Scatter plots of CpG > TpG mutation rates per CpG of different tumor types and signatures in 2 kb genomic windows grouped by their mean CpG methylation levels. Mutation rates were normalized by the highest value in each tumor type. The lines indicate data fitting with linear regression models or smoothed conditional means models. Two-sided Pearson correlation statistics are shown. Shaded areas represent the 95% confidence intervals.
f Bar plots of CpG > TpG mutation rates per CpG in genic or intergenic regions (upper panel). Transcriptional strand asymmetry of CpG > TpG mutations in genic regions (lower panel). Genes were grouped based on expression level quartiles. Asterisks mark a significant difference in contribution between transcribed and untranscribed strands (see “Methods”). The dashed line indicates the cutoff used to assign significance. Source data are provided as a Source Data file.
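The 96-channel classification in panel a and the cosine similarity underlying panel c can be illustrated with a minimal sketch. It assumes the common convention of reporting every substitution from the pyrimidine strand; the function names `sbs96_channel` and `cosine_similarity` are illustrative and are not taken from the study's pipeline or from FitMS.

```python
from math import sqrt

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs96_channel(context: str, alt: str) -> str:
    """Label a substitution by its pyrimidine-centred trinucleotide channel.

    `context` is the reference trinucleotide (mutated base in the middle)
    and `alt` is the mutant base. Both names are hypothetical.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set(COMPLEMENT):
        raise ValueError("expected a 3-base context and a single A/C/G/T allele")
    if context[1] == alt:
        raise ValueError("alt allele equals the reference base")
    # Purine reference: switch to the opposite strand (reverse complement).
    if context[1] in "AG":
        context = "".join(COMPLEMENT[b] for b in reversed(context))
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def cosine_similarity(p, q):
    """Cosine similarity between two equally ordered channel vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / norm
```

Under this convention a G > A change in a CGT context is counted in the same channel as a C > T change in an ACG context, which is why CpG > TpG substitutions on either strand fall into the four N[C>T]G channels.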
Fig. 2. Features of SBS mutational signatures associated with distinct POLD1 mutations.
a Replication strand asymmetry of mutational signatures associated with POLD1 mutations. The asterisks mark a significant difference in contribution between leading and lagging strands (see “Methods”). POLD1exo*, POLD1 exonuclease domain mutated; POLD1mut, POLD1 mutated; MMRd, mismatch repair deficiency. b SBS105 CpG > NpG relative mutation rates in ENCODE chromatin states of normal cell types matched to each tumor type. BRCA, breast invasive carcinoma; BLCA, bladder urothelial carcinoma; EnhA/EnhG, active/genic enhancer; Tx/TxWk, strong/weak transcription; TssA, active TSS; TssFlnk, flanking TSS; TssBiv, bivalent/poised TSS; EnhBiv, bivalent enhancer; ReprPC/ReprPCWk, strong/weak repressed polycomb; Quies, quiescent/low; Het-Rpts, heterochromatin/ZNF genes and repeats. c Pol δ-PCNA holoenzyme structure. Mutated amino acids in the polymerase domain of Pol δ catalytic subunit (POLD1) are indicated in black. Source data are provided as a Source Data file.
Fig. 3. SBS96 recapitulates lineage-specific CpG and non-CpG methylation landscapes.
a Heatmaps of SBS96 relative CpG > TpG mutation rates in CpG blocks differentially methylated between normal cell type pairs. Hypermethylated blocks are indicated in the x-axis, and tumor types in the y-axis. Values are normalized per tumor type. b C > T substitution profiles by trinucleotide context of tumor type-specific SBS96. Bars represent means in UVM (n = 1) or other tumor types (n = 4; AML, BRCA, SARC, and HGG). c Scatter plots of CpA>TpA substitution percentages versus all C > T substitutions percentages or SBS96 percentage contribution in MBD4-deficient (MBD4def) tumors. Lines indicate data fitting with linear regression models. Two-sided Pearson correlation statistics are shown. d Distributions of the absolute number of CpA > TpA substitutions in UVM tumors MBD4def (n = 4) or MBD4 wild-type (MBD4wt; n = 12). Two-sided Wilcoxon test P-value is indicated. Boxes indicate the median, 25th and 75th percentiles. Whiskers extend to the largest or lowest value up to 1.5 times the distance between the 25th and 75th percentiles. e Sequence probability logos around CpA > TpA mutated sites in MBD4def UVM and of an equal number of randomly interrogated CpA sites (n = 2823). f Sequence probability logos around top non-CpG methylated sites in Dnmt triple-knockout mouse embryonic stem cells with ectopic reintroduction of Dnmt3a (n = 1000) or Dnmt3b (n = 189). g Sequence probability logos around methylated CpA sites in uveal melanocytes (UvMel) and uveal melanomas (UVM). Logos in panels e-g were generated with kpLogo and Bonferroni corrected P-values are shown. h CpA methylation percentages in CAC context in normal cell types. i Dot plot of single-nuclei RNAseq data of the posterior human eye. Dot size indicates the percentage of nuclei expressing each gene. j Distributions of gene expression in TCGA tumors, including AML (n = 151), BRCA (n = 1231), SARC (n = 265), HGG (n = 175) and UVM (n = 80). Values are expressed as transcripts per million (TPM). 
Two-sided Wilcoxon test P-values without multiple comparisons adjustment are indicated. Statistics of boxes and whiskers are described above. AML, acute myeloid leukemia; BRCA, breast invasive carcinoma; SARC, sarcoma; HGG, high-grade glioma; MyelProg, common myeloid progenitor; BreastEpi, breast luminal epithelium; MesenSC, mesenchymal stem cell; Oligod, oligodendrocyte; OPC, oligodendrocyte precursor cell. Source data are provided as a Source Data file.
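The box-and-whisker convention used throughout these captions (median, 25th and 75th percentiles, whiskers reaching the most extreme value within 1.5 times the interquartile range) can be sketched as follows. The quartile interpolation method is an assumption, since plotting packages differ on it, and `box_stats` is a hypothetical helper name.

```python
from statistics import quantiles


def box_stats(values):
    """Return median, quartiles, whisker ends and outliers for one box.

    Quartiles use linear interpolation between data points (method
    "inclusive"); this choice is an assumption, not stated in the captions.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if q1 - reach <= v <= q3 + reach]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < q1 - reach or v > q3 + reach],
    }
```

With the toy input `[1, 2, 3, 4, 5, 100]` (invented for illustration), the upper whisker ends at 5 and the value 100 is reported as an outlier rather than stretching the whisker.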
Fig. 4. SBS96 targets tumor-specific driver genes.
a Oncoplot of oncogenic mutations in uveal melanoma (UVM) cases by MBD4 status. MBD4def, MBD4-deficient; MBD4wt, MBD4 wild-type. Tumor samples from the same individual showing identical mutational patterns were combined. b Oncogenic mutations in GNA11 and BAP1 found in UVM tumors. Amino acid positions are derived from mutation positions in the transcript. The values in circles indicate the number of cases harboring each mutation. Protein domains are shown in blue. c Distribution of the number of methylated CpGs (mCpGs) in the coding sequence (CDS) per gene, grouped by the total number of nonsynonymous CpG>TpG mutations observed among MBD4def UVM. Key UVM drivers are shown as red dots and other genes are shown as boxplots. The number of genes per boxplot is shown. Boxes indicate the median, 25th and 75th percentiles. Whiskers extend to the largest or lowest value up to 1.5 times the distance between the 25th and 75th percentiles. d Scatter plot of gene expression values expressed as transcripts per million (TPM) versus the number of mCpGs in the CDS, per gene. Key UVM drivers are shown as red dots. e Distribution of CpG methylation percentages among normal cell types for CpG > TpG oncogenic mutations, separated by tumor type. Source data are provided as a Source Data file.
Fig. 5. MBD4 preferentially protects active chromatin and early replicating DNA.
a Western blotting on nuclear extracts of parental HAP1 cells or clones overexpressing C- or N-terminally FLAG-tagged MBD4. Exogenous and endogenous MBD4 bands and positions of molecular weight markers are indicated. No replication attempt was performed. b Histone mark enrichment over IgG in 2 kb genomic windows, ranked from lowest to highest enrichment (upper panel). FLAG enrichment over IgG in the corresponding genomic windows (lower panel). Means of every 400 or 8000 similarly-ranked windows are shown in colored shades or black, respectively. c Heatmap of Pearson correlation coefficients between histone marks and FLAG enrichment, obtained from the means of every 400 similarly-ranked windows. d Scatter plots of CpG > TpG mutation rates per CpG versus mean CpG methylation levels in chromatin states. Tumor mutations and normal epigenomic data are grouped by lineage. Mutation rates were normalized by the highest value per tumor, and the means of all tumors per lineage are shown. Lines indicate data fitting with linear regression models. Two-sided Pearson correlation statistics are shown. TssA, active TSS; TssFlnk/TssFlnkD/TssFlnkU, flanking TSS; EnhA/EnhG, active/genic enhancer; Tx/TxWk, strong/weak transcription; TssBiv, bivalent/poised TSS; EnhBiv, bivalent enhancer; ReprPC/ReprPCWk, strong/weak repressed polycomb; Quies, quiescent/low; Het-Rpts, heterochromatin/ZNF genes and repeats. e Distributions of CpG > TpG mutation rates per methylated CpG (mCpG) in SBS1 (n = 359) and SBS96 (n = 20) tumors in active or repressed/bivalent chromatin states. Observed relative to expected mutation rates are shown, considering an expected random distribution of CpG > TpG mutations among mCpGs. Two-sided Wilcoxon test P-values are indicated. Boxes indicate the median, 25th and 75th percentiles. Whiskers extend to the largest or lowest value up to 1.5 times the distance between the 25th and 75th percentiles. 
f Signal enrichment of FLAG-tagged MBD4 in replication timing annotations, relative to FLAG enrichment in parental HAP1 cells. Early, constitutive early; Dyn, dynamic; Late, constitutive late. g Distributions of CpG > TpG mutation rates per mCpG in SBS1 (n = 442) and SBS96 (n = 20) tumors in replication timing annotations. Observed relative to expected mutation rates are shown, considering an expected random distribution of CpG > TpG mutations among mCpGs. Statistics of boxes and whiskers are described above. Source data are provided as a Source Data file.
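One plausible reading of "observed relative to expected" in panels e and g is that, if CpG > TpG mutations were spread at random over methylated CpGs, each annotation would receive mutations in proportion to its share of mCpGs. The sketch below follows that reading; the exact computation used in the study is described in its Methods and may differ, and `observed_over_expected` and its inputs are hypothetical.

```python
def observed_over_expected(mutations, mcpgs):
    """Observed/expected CpG > TpG mutation ratio per annotation.

    mutations: dict mapping annotation -> observed CpG > TpG mutation count
    mcpgs:     dict mapping annotation -> number of methylated CpGs
    Expected counts assume mutations fall uniformly at random on mCpGs.
    """
    if set(mutations) != set(mcpgs):
        raise ValueError("both inputs must cover the same annotations")
    total_mut = sum(mutations.values())
    total_sites = sum(mcpgs.values())
    if total_mut == 0 or any(n <= 0 for n in mcpgs.values()):
        raise ValueError("need at least one mutation and mCpGs in every annotation")
    ratios = {}
    for name, sites in mcpgs.items():
        expected = total_mut * sites / total_sites
        ratios[name] = mutations[name] / expected
    return ratios
```

A ratio below 1 for an annotation then indicates fewer mutations per mCpG than the random model predicts, and a ratio above 1 indicates more.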
Fig. 6. 5mC deamination repair is primarily dependent on MBD4 in human cells.
a Western blotting on nuclear extracts of HAP1 cells wild-type or knock-out for MBD4, TDG, or both (dKO). Arrows indicate TDG-specific bands. Positions of molecular weight markers are indicated. No replication attempt was performed. b Workflow used to quantify CpG > TpG mutation rates in isogenic cell line models. Expanded clones of diploidized HAP1 cells of each genotype were analyzed by WGS and then maintained in vitro for 4 months before a second subcloning step. WGS data from the expanded subclones were then compared against the respective clone for variant calling. c Distributions of C > T substitution frequencies obtained by WGS in HAP1 subclones (n = 4 per genotype) after 120 days in culture, relative to the mean of wild-type subclones (shown as a dashed line). Two-sided unpaired equal variance t-test P-values without multiple comparisons adjustment are indicated. Boxes indicate the median, 25th and 75th percentiles. Whiskers extend to the largest or lowest value up to 1.5 times the distance between the 25th and 75th percentiles. d C > T substitution frequencies by trinucleotide context in HAP1 subclones relative to the mean of wild-type subclones. Bars represent means (n = 4 per genotype). e Distributions of CpG > TpG mutation rates per methylated CpG (mCpG) in SBS1 (n = 442) and SBS96 (n = 20) tumors or HAP1 subclones (n = 4 per genotype) in replication timing genomic annotations. Early, constitutive early; Dyn, dynamic; Late, constitutive late. For SBS1 and SBS96 data, two-sided Wilcoxon test P-values without multiple comparisons adjustment are indicated. For HAP1 subclones data, two-sided unpaired equal variance t-test P-values without multiple comparisons adjustment are indicated. Statistics of boxes and whiskers are described above. Source data are provided as a Source Data file.
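The normalization in panels c and d (frequencies expressed relative to the mean of wild-type subclones) and the statistic behind an unpaired equal-variance t-test can be sketched with the standard library alone. The helper names are hypothetical, and converting the statistic to a two-sided P-value would additionally need a Student t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def relative_to_wild_type(values, wild_type):
    """Divide each value by the mean of the wild-type group."""
    reference = mean(wild_type)
    if reference == 0:
        raise ValueError("wild-type mean is zero")
    return [v / reference for v in values]


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic and degrees of freedom, equal variances assumed."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    if pooled == 0:
        raise ValueError("pooled variance is zero")
    t = (mean(group_b) - mean(group_a)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df
```

With four subclones per genotype, as in the captions, each pairwise comparison has 6 degrees of freedom.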
