Nucleic Acids Res. 2021 Feb 22;49(3):1497-1516.
doi: 10.1093/nar/gkaa1269.

Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome


Wilfried M Guiblet et al. Nucleic Acids Res. 2021.

Abstract

Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.


Figures

Graphical Abstract
Guiblet et al. show that loci capable of forming non-canonical (non-B) DNA structures are a major driver of variation in nucleotide substitution levels across the genome. Image credit: Wilfried Guiblet.
Figure 1.
Schematic of different types of non-B DNA structures. (A) G-quadruplex, (B) H-DNA, (C) Z-DNA, (D) cruciform, (E) slipped strands and (F) A-tract bending.
Figure 2.
Genome-wide nucleotide substitution frequencies at G4 loci and their flanking sequences. The positions of nucleotide substitutions within motifs were scaled based on motif size (see Materials and Methods for details). Stems are runs of guanines and loops are unspecified nucleotides between stems. Flanking regions are the 2 kb up- and downstream from the loci. For clarity of visualization, only the first 100 bp are shown (the full 2 kb are shown in Supplementary Figure S8) and the Y-axes are displayed on a log scale. Gray areas indicate significantly different rates between groups (IWTomics adjusted P-value curve < 0.01). A comparison between all G4 loci and control sequences for (A) single-nucleotide polymorphism (SNP) frequencies and (B) fixed nucleotide substitution (FNS) frequencies. A comparison between stable and unstable G4 loci for (C) SNP and (D) FNS frequencies.
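
The motif-size scaling mentioned in this caption can be illustrated with a minimal Python sketch. It shows only the general idea and is an assumption, not the procedure from the paper's Materials and Methods: each position in a motif of arbitrary length is mapped to one of a fixed number of relative bins, so that motifs of different sizes can be pooled position by position. The function name `scaled_frequencies`, the bin count and the toy loci are hypothetical.

```python
def scaled_frequencies(loci, n_bins=10):
    """Pool substitution frequencies across motifs of different lengths.

    loci: list of (motif_length, substituted_positions), positions 0-based.
    Returns one frequency per relative bin (substituted bases / all bases).
    Illustrative only; the paper's exact scaling is not reproduced here.
    """
    subs = [0] * n_bins   # substituted bases falling in each bin
    bases = [0] * n_bins  # all bases falling in each bin
    for length, positions in loci:
        substituted = set(positions)
        for i in range(length):
            b = i * n_bins // length  # relative bin index, 0 .. n_bins - 1
            bases[b] += 1
            if i in substituted:
                subs[b] += 1
    return [s / t if t else 0.0 for s, t in zip(subs, bases)]


# Toy input (hypothetical): a 4-bp motif with a substitution at its first
# base and an 8-bp motif with a substitution at its last base.
toy_loci = [(4, [0]), (8, [7])]
print(scaled_frequencies(toy_loci, n_bins=4))
```

Integer binning keeps the sketch short. A motif shorter than `n_bins` simply contributes no bases to some bins.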
Figure 3.
Genome-wide single-nucleotide polymorphism (SNP) frequencies at non-G4 non-B DNA loci and their flanking sequences. The positions of SNPs within motifs were scaled based on motif size (see Materials and Methods for details). Inverted, direct, and mirror repeats are split into spacers and repeat arms, and A-phased repeats are split into A-tracts and spacers. For clarity of visualization, only the first 100 bp are shown (the full 2 kb are shown in Supplementary Figure S8) and the Y-axes are displayed on a log scale. Gray areas indicate significantly different SNP frequency in non-B DNA versus control sequences (IWTomics adjusted P-value curve < 0.01).
Figure 4.
Genome-wide fixed nucleotide substitution (FNS) frequencies at non-G4 non-B DNA loci and their flanking sequences. The positions of FNSs within motifs were scaled based on motif size (see Materials and Methods for details). Inverted, direct, and mirror repeats are split into spacers and repeat arms, and A-phased repeats are split into A-tracts and spacers. Flanking regions are the 2 kb up- and downstream from the loci. For clarity of visualization, only the first 100 bp are shown (the full 2 kb are shown in Supplementary Figure S8) and the Y-axes are displayed on a log scale. Gray areas indicate significantly different FNS frequency in non-B DNA versus controls (IWTomics adjusted P-value curve < 0.01).
Figure 5.
Frequencies of polymorphic substitutions at the immediate first 5′ and 3′ flanking positions of stable G4 loci annotated on the reference strand. Only the frequencies of trinucleotides present at the immediate flanking positions of stable G4 loci were compared with those present at control sequences (trinucleotides present only in control sequences were not considered). A correction for the trinucleotide context was applied (see Materials and Methods). A two-sided Fisher's exact test was used to evaluate significant differences, and P-values were adjusted for multiple testing using the Bonferroni correction. An asterisk (*) marks significant differences between G4 and control sequences (adjusted P-value < 0.05).
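
The testing scheme named in this caption (a two-sided Fisher's exact test per trinucleotide, followed by Bonferroni adjustment) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the counts are invented and are not data from the paper, the function names are hypothetical, and the trinucleotide-context correction mentioned in the caption is not reproduced.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts, NOT from the paper. For each trinucleotide:
# (substituted at G4 flank, unsubstituted at G4 flank,
#  substituted in control,  unsubstituted in control)
counts = {
    "ACG": (30, 970, 12, 988),
    "TCA": (8, 992, 9, 991),
    "GCT": (15, 985, 14, 986),
}
raw = {tri: fisher_exact_two_sided(*table) for tri, table in counts.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
for tri, p in adjusted.items():
    print(f"{tri}\tadjusted P = {p:.3g}\t{'*' if p < 0.05 else ''}")
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be preferred. The hand-written version is used here only to keep the sketch dependency-free.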
Figure 6.
Relationships between fixed nucleotide substitution (FNS) frequency and non-B DNA. (A) G-quadruplex coverage weighted by stability, (B) A-phased repeat coverage, (C) inverted repeat coverage, (D) direct repeat coverage, (E) Z-DNA motif coverage, and (F) mirror repeat coverage. Red curves represent loess (locally estimated scatterplot smoothing) fits superimposed on the scatterplots to visualize trends. See Supplementary Figure S11 for an analogous analysis performed using SNP data.
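
For readers unfamiliar with loess, the smoothing named in this caption can be sketched as a locally weighted linear regression with tricube weights. The Python below is a simplified variant (degree 1, no robustness iterations) and is an assumption for illustration only; the caption does not state the span, degree or software used for the published curves. The function name `loess_fit` and the toy data are hypothetical.

```python
def loess_fit(xs, ys, frac=0.5):
    """Simplified loess: local linear fit with tricube weights.

    frac is the share of points used in each local neighbourhood.
    Returns the fitted value at every x in xs.
    """
    n = len(xs)
    k = max(2, min(n, round(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Bandwidth = distance to the k-th nearest point (avoid zero width).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(1.0, abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Toy data (hypothetical): non-B DNA coverage per window vs. FNS frequency.
coverage = [0.00, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15]
fns_freq = [0.030, 0.031, 0.031, 0.033, 0.034, 0.036, 0.035, 0.039]
for x, y_hat in zip(coverage, loess_fit(coverage, fns_freq, frac=0.6)):
    print(f"coverage={x:.2f}\tsmoothed FNS frequency={y_hat:.4f}")
```

A smaller `frac` follows the points more closely, while a larger one gives a smoother trend line.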


