Genome Res. 2018 Sep;28(9):1264-1271.
doi: 10.1101/gr.231688.117. Epub 2018 Aug 13.

Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis

Ilias Georgakopoulos-Soares et al. Genome Res. 2018 Sep.

Abstract

Somatic mutations show variation in density across cancer genomes. Previous studies have shown that chromatin organization and replication time domains are correlated with, and thus predictive of, this variation. Here, we analyze 1809 whole-genome sequences from 10 cancer types to show that a subset of repetitive DNA sequences, called non-B motifs, that predict noncanonical secondary structure formation can independently account for variation in mutation density. Combined with epigenetic factors and replication timing, the variance explained can be improved to 43%-76%. Approximately twofold mutation enrichment is observed directly within non-B motifs, is focused on exposed structural components, and is dependent on physical properties that are optimal for secondary structure formation. Therefore, there is mounting evidence that secondary structures arising from non-B motifs are not simply associated with increased mutation density; they are possibly causally implicated. Our results suggest that they are determinants of mutagenesis and increase the likelihood of recurrent mutations in the genome. This analysis calls for caution in the interpretation of recurrent mutations and highlights the importance of taking non-B motifs, which can simply be inferred from the reference sequence, into consideration in background models of mutability henceforth.

Figures

Figure 1.
Noncanonical secondary structures arising from non-B DNA motifs in the human genome. (A) Normal configuration of human DNA. (B) Left-handed helical structure caused by Z-DNA. (C–F) Schematic representations of the primary sequence of various non-B motifs and their corresponding predicted secondary structures. (G) Length distribution of non-B DNA motifs. (H) Fraction of the human reference genome (hg19) covered by different non-B DNA motifs. (I) Enrichment of occurrences of non-B DNA motifs associated with various chromatin states (see Methods for calculation).
Figure 2.
Non-B DNA motifs predict somatic mutability in human cancers. (A) Correlations between the number of non-B DNA motifs, and epigenetic features and replication timing, with the number of substitutions (Spearman's rank correlation coefficient). Please note interpretation is directional, e.g., a positive correlation with replication time would indicate increased mutability with early replication time domains, while a negative correlation denotes increased mutability in late replication time domains. (B) The distribution of different non-B DNA motifs in a window of 2 kb centered on substitutions across all tumor types. (C) Fraction of variance explained for predicting the number of mutations in 500-kb bins with random forest regression using non-B DNA motifs and epigenetic features/replication timing as predictors for multiple tumor types. (BRCA) Breast cancer, (LIRI) liver cancer, (OVCA) ovarian cancer, (ESAD) esophageal adenocarcinoma, (GACA) gastric cancer, (PBCA) pediatric brain cancer, (PACA) pancreatic cancer, (RECA) renal cell carcinoma, (MALY) malignant lymphoma. Error bars represent standard error from 10-fold cross-validation. (D,E) Importance of the different predictors for the random forest regression. The y-axis shows the increase in mean square error (MSE) when the variable is excluded. (**) FDR < 0.01, as determined by a permutation test. (F) PCA. The first two principal components separate mutations (green), non-B DNA motifs (blue), and epigenetics and replication timing domains (red).
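As a rough illustration of the rank correlation named in panel A, the sketch below computes Spearman's coefficient between per-bin motif counts and per-bin substitution counts using only the Python standard library. The function names and the example counts are hypothetical and are not taken from the paper; the authors' actual pipeline is not described in this record.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-bin counts (made up for illustration only).
motif_counts = [12, 30, 7, 45, 22]
substitution_counts = [100, 260, 90, 400, 180]
rho = spearman(motif_counts, substitution_counts)
```

Because the coefficient depends only on ranks, it is insensitive to the very different scales of motif counts and mutation counts, which is one plausible reason to prefer it over a Pearson correlation for this kind of per-bin comparison.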
Figure 3.
Non-B DNA motifs are mechanistically linked to mutability through formation of secondary structures. (A) Enrichment of mutagenesis for non-B motifs within their genomic bins, thus correcting for genomic GC variation. Error bars represent the standard error. (B) Depiction of enrichment per genomic bin, for results in A, demonstrating how mutations are enriched for non-B motifs. Red and blue boxes represent non-B motifs. (C) Mutational density in spacers compared to arms for direct repeats, inverted repeats, and mirror repeats across 10 tumor types. Error bars representing standard error are too small to visualize. A Wilcoxon signed-rank test was performed (P-value < 0.001 across all tumors for IR, MR, DR). (D) Heat map showing relative ratio of mutational density of spacers over arms for breast cancer at inverted repeats. (E) Enrichment of mutation density in loops: G-runs across ten cancer types. Error bars represent standard error from bootstrapping with replacement (n = 10,000). (F) Enrichment of mutation density at G-quadruplexes for small loop sizes (≤3 nt) relative to large loop sizes (>3 nt) across 10 cancer types. Error bars represent standard error from bootstrapping with replacement (n = 10,000). A Mann-Whitney U test was performed for each cancer type (P-value < 0.001 across all tumor types). (G) Depiction of two very different secondary structures that both have loop domains which are more mutable than their other components. (H) Some non-B motifs have characteristics such as arm or spacer lengths that increase the likelihood of stable hairpin formation. These perhaps can occur stably more frequently, and thus, their exposed regions are more likely to be damaged and mutated.
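Panels E and F report standard errors obtained by bootstrapping with replacement (n = 10,000). A minimal sketch of that general technique is given below, assuming the statistic of interest is a simple mean over per-motif values. The function name, the fixed seed, and the example values are hypothetical; the exact statistic the authors resampled is not specified in this caption.

```python
import random
from statistics import mean, stdev

def bootstrap_se(values, n_resamples=10_000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled_means = [
        mean(rng.choices(values, k=n))  # one bootstrap sample, same size as input
        for _ in range(n_resamples)
    ]
    # The spread of the resampled means approximates the standard error.
    return stdev(resampled_means)

# Hypothetical per-motif enrichment values (made up for illustration only).
enrichment = [1.8, 2.1, 2.4, 1.6, 2.0, 2.3, 1.9, 2.2]
se = bootstrap_se(enrichment)
```

The same resampling loop could be applied to another statistic, such as a ratio of mutation densities, by replacing the call to `mean`.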
Figure 4.
Non-B motifs contribute to locally elevated mutation rates resulting in recurrent mutations in the human genome. (A) Distribution of the number of recurrent events for 3,476,890 somatic mutations from 560 breast cancers (Nik-Zainal et al. 2016). The values do not fit a truncated Poisson distribution (χ² test, P < 1 × 10⁻¹⁶) as there are more recurrent mutations than predicted by the null model. (B) Enrichment of nonrecurrent mutations overlapping non-B DNA motifs for indels (I) and substitutions (S). (C) Enrichment of recurrent mutations overlapping non-B DNA motifs for indels (I) and substitutions (S). Mann-Whitney U test for substitutions: P-value < 0.001 for all non-B DNA motifs. Mann-Whitney U test for indels: P-value < 0.001 for STR, H-DNA, Z-DNA, and MR, and P-value < 0.05 for DR and G4.
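The null model in panel A can be sketched as follows, assuming that "truncated" means zero-truncated (only sites mutated at least once are observed) and that the rate parameter is supplied by the caller. Both are assumptions, since the caption does not say how the distribution was truncated or fitted. The function names and the observed counts are hypothetical.

```python
import math

def zero_truncated_poisson_pmf(k, lam):
    """P(K = k | K >= 1) for a Poisson(lam) variable, k = 1, 2, ..."""
    if k < 1:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return poisson / (1.0 - math.exp(-lam))  # renormalise after dropping k = 0

def chi_square_statistic(observed, lam):
    """Pearson chi-square statistic of observed site counts vs. the null model.

    observed[i] is the number of sites mutated in exactly i + 1 samples; the
    last category pools every recurrence level at or above len(observed).
    """
    total = sum(observed)
    probs = [zero_truncated_poisson_pmf(k, lam) for k in range(1, len(observed))]
    probs.append(1.0 - sum(probs))  # tail probability for the pooled last category
    expected = [total * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts (made up): sites seen in 1, 2, and 3-or-more samples.
observed_sites = [9500, 400, 100]
stat = chi_square_statistic(observed_sites, lam=0.05)
```

A large statistic driven by the pooled tail category corresponds to the pattern described in the caption: more recurrent mutations than the null model predicts. Converting the statistic to a P-value would additionally require the chi-square distribution with the appropriate degrees of freedom, which is omitted here.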

