PLoS Comput Biol. 2008 Feb 29;4(2):e1000015.
doi: 10.1371/journal.pcbi.1000015.

Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation

Navin Elango et al. PLoS Comput Biol. 2008.

Abstract

Transitions at CpG dinucleotides, referred to as "CpG substitutions", are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which depends on DNA methylation. In comparison, other single nucleotide substitutions (for example, those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high-quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates. We show that CpG substitutions occur approximately 15 times more frequently than other single nucleotide substitutions in primate genomes, and that they exhibit substantial regional variation. Patterns of CpG rate variation are consistent with differences in methylation level and susceptibility to subsequent deamination. In particular, we propose a "distance-decaying" hypothesis: because of the molecular mechanism of a CpG substitution, rates are correlated with the stability of the double-stranded DNA surrounding each CpG dinucleotide, and the effect of local DNA stability may decrease with distance from the CpG dinucleotide. Consistent with this hypothesis, rates of CpG substitution are strongly negatively correlated with regional G+C content, and the influence of G+C content decays as the distance from the target CpG site increases. We estimate that the influence of local G+C content extends up to approximately 1,500 to 2,000 bps centered on each CpG site. The distance-decaying relationship persisted when we controlled for the effect of long-range homogeneity of nucleotide composition. GpC sites, in contrast, do not exhibit such a "distance-decaying" relationship. Our results highlight the distinctive properties of methylation-dependent substitutions compared with substitutions that mostly arise from errors during DNA replication. Furthermore, the negative relationship between G+C content and CpG rates may explain the observation that GC-rich SINEs show lower CpG rates than other repetitive elements.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Histogram of the rate of CG->TA substitutions in non-coding regions.
The rate of CG->TA transitions at CpG sites (A) and GpC sites (B) in 50 kb segments of non-coding regions having at least 10,000 aligned sites. Variation of the CpG substitution rate among non-coding regions is significantly greater than that expected under a uniform substitution rate model. A similar result was obtained for GpC sites (see text).
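One way to check whether segment-to-segment variation exceeds what a uniform rate would produce is a chi-square homogeneity statistic over per-segment counts. The sketch below is an illustration only: the caption does not say which test the authors used, and the function name, the input layout, and the binomial model are assumptions. The 10,000-site filter is taken from the caption.

```python
def dispersion_statistic(segments, min_sites=10_000):
    """Chi-square homogeneity statistic for per-segment substitution counts.

    segments: list of (substituted_sites, aligned_sites) tuples, one per
    segment (hypothetical input layout).
    Segments with fewer than min_sites aligned sites are dropped.
    Returns (chi2, degrees_of_freedom). Under a uniform-rate binomial model,
    chi2 is approximately chi-square distributed with that many degrees of
    freedom, so chi2 far above the degrees of freedom indicates excess
    variation. Assumes the pooled rate is strictly between 0 and 1.
    """
    kept = [(k, n) for k, n in segments if n >= min_sites]
    total_k = sum(k for k, _ in kept)
    total_n = sum(n for _, n in kept)
    p = total_k / total_n  # pooled rate under the uniform model
    chi2 = sum((k - n * p) ** 2 / (n * p * (1 - p)) for k, n in kept)
    return chi2, len(kept) - 1
```

For example, two segments of 10,000 sites with 50 and 150 substitutions give a statistic far above its single degree of freedom, whereas two segments sharing the same rate give a statistic of zero.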
Figure 2. Negative correlation between the rate of CG->TA substitution and G+C content of non-coding regions.
(A) Non-coding regions (intergenic regions and introns) were partitioned into six equal-sized bins based on their G+C contents. The rates of CG->TA substitutions at CpG sites in these bins are negatively correlated with their G+C contents. The negative relationship held when introns were analyzed separately. For intergenic regions the relationship was not significant, although the trend was negative. (B) GpC substitution rates in non-coding regions exhibited a negative relationship with G+C content. When the data were divided into intergenic regions and introns, however, the relationships were not significant, although there was a clear negative trend. Refer to the text for r² values and P-values.
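The binning analysis in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes "equal-sized" means an equal number of regions per bin, that each region is summarized as a hypothetical tuple of G+C fraction, substituted CpG sites, and total CpG sites, and that the correlation is a plain Pearson coefficient over the bin summaries.

```python
from statistics import mean


def bin_rates(regions, n_bins=6):
    """Sort regions by G+C content and split them into n_bins groups.

    regions: list of (gc_fraction, substituted_cpg_sites, total_cpg_sites).
    Returns one (mean_gc, pooled_rate) pair per bin. The last bin absorbs
    any remainder when len(regions) is not a multiple of n_bins.
    """
    if len(regions) < n_bins:
        raise ValueError("need at least one region per bin")
    ordered = sorted(regions, key=lambda r: r[0])
    size = len(ordered) // n_bins
    bins = []
    for i in range(n_bins):
        end = (i + 1) * size if i < n_bins - 1 else len(ordered)
        chunk = ordered[i * size:end]
        mean_gc = mean(r[0] for r in chunk)
        rate = sum(r[1] for r in chunk) / sum(r[2] for r in chunk)
        bins.append((mean_gc, rate))
    return bins


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Squaring the value returned by `pearson_r` for the six bin summaries gives an r² of the kind the caption refers to; a negative coefficient corresponds to the negative relationship described.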
Figure 3. Sliding window analysis of the relationship between CpG substitution rate and G+C content of windows.
(A) At each distance (along the X-axis), CpG sites were divided into four bins based on G+C content at that distance from the site, measured as the G+C content of the 200 bps window centered at that distance (G+C < 38%: red curve; 38% ≤ G+C < 45%: green curve; 45% ≤ G+C < 52%: blue curve; G+C > 52%: black curve). The proportion of CpG sites mutated in each of these bins is plotted as a function of distance from the site. At distances close to the CpG site, the rate of substitution in high local-GC bins (black curve) is clearly lower than that in low local-GC bins (red curve). This difference progressively declines farther from the site, suggesting a distance-decaying relationship between G+C content and CpG substitution rate. In the case of GpC sites, we do not observe a distance-decaying effect (see inset). (B) Results of the chi-square test for the independence of the rate of CpG substitution and the G+C content of the windows at each distance, in log scale. The blue line indicates the P-value cutoff of 0.05 [or log10 (P-value) = −1.30]. The P-values are very low at distances close to the CpG site and progressively become larger as the distance from the CpG site increases (distance-decaying effect). The rate of CpG substitution becomes independent of the G+C content [log10 (P-value) > −1.30] beyond ∼2,000 bps from the CpG site. (C) Results of the chi-square test for the independence of the rate of GpC substitution and the G+C content of the windows at each distance, in log scale. Again, the blue line indicates the P-value cutoff of 0.05 [or log10 (P-value) = −1.30]. The rate of GpC substitution becomes independent of the G+C content [log10 (P-value) > −1.30] at a distance very close to the GpC site, and no distance-decaying effect was observed.
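The per-distance test described in panels (A) to (C) can be sketched as below, under stated assumptions. The 200 bps window and the 38%, 45%, and 52% bin edges come from the caption; the function names, the treatment of a window at exactly 52% as belonging to the top bin, and the handling of sequence ends are illustrative choices. The P-value uses the closed-form chi-square survival function for 3 degrees of freedom, which is what a table of four G+C bins by two outcomes (mutated, not mutated) has.

```python
import math

GC_BIN_EDGES = (0.38, 0.45, 0.52)  # bin boundaries from the caption


def window_gc(seq, center, width=200):
    """G+C fraction of the window of `width` bps centered at index `center`.

    Returns None if the center lies outside the sequence or the window
    contains no unambiguous bases. Windows are truncated at the sequence start.
    """
    if center < 0 or center >= len(seq):
        return None
    start = max(0, center - width // 2)
    window = seq[start:center + width // 2].upper()
    acgt = sum(window.count(b) for b in "ACGT")
    if acgt == 0:
        return None
    return (window.count("G") + window.count("C")) / acgt


def gc_bin(gc):
    """Index 0..3 of the G+C bin (0 = lowest G+C, 3 = highest)."""
    return sum(gc >= edge for edge in GC_BIN_EDGES)


def chi2_independence(table):
    """Pearson chi-square statistic for a table of [mutated, unmutated] rows.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

At each distance, one would tabulate mutated and unmutated sites per bin, compute `math.log10(chi2_sf_df3(chi2_independence(table)))`, and compare it with the −1.30 line drawn in panels (B) and (C).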
Figure 4. Sliding window analysis of the relationship between CpG substitution rate and normalized G+C content.
The same experiment as in Figure 3, with the G+C content of each window normalized with respect to GCglobal (removing the global effect). (A) The distance-decaying effect of G+C content on the rate of CpG substitution persists even after removing the global effect. For GpC substitutions, there was no distance-decaying effect. (B) Results of the chi-square test for the independence of the rate of CpG substitution and the G+C content of the windows. The blue line indicates log10 (P-value) = −1.30. The distance-decaying effect subsided after ∼1,500 bps. (C) Results of the same experiment as in (B), but for GpC sites. There is no distance-decaying effect, as expected.
