Genome Res. 2012 Aug;22(8):1419-25.
doi: 10.1101/gr.140236.112. Epub 2012 Jun 11.

Genome-wide genetic variations are highly correlated with proximal DNA methylation patterns

Wei Qu et al. Genome Res. 2012 Aug.

Abstract

5-methyl-cytosines at CpG sites frequently mutate into thymines, accounting for a large proportion of spontaneous point mutations. The repair system would leave substantial numbers of errors in neighboring regions if the synthesis of erased gaps around deaminated 5-methyl-cytosines is error-prone. Indeed, we identified an unexpected genome-wide role of the CpG methylation state as a major determinant of proximal natural genetic variation. Specifically, 507 Mbp (∼18%) of the human genome was within 10 bp of a CpG site; in these regions, the single nucleotide polymorphism (SNP) rate significantly increased by ∼50% (P < 10^-566 by a two-proportion z-test) if the neighboring CpG sites are methylated. To reconfirm this finding in another vertebrate, we compared six single-base resolution methylomes in two inbred medaka (Oryzias latipes) strains with sufficient genetic divergence (3.4%). We found that the SNP rate also increased by ∼50% (P < 10^-2170), and the substitution rates in all dinucleotides increased simultaneously (P < 10^-441) around methylated CpG sites. In the hypomethylated regions, the "CGCG" motif was significantly enriched (P < 10^-680) and evolutionarily conserved (P ≈ 0.203%), and slow CpG deamination rather than fast CpG gain was seen, indicating a possible role of CGCG as a candidate cis-element for the hypomethylation state. In regions that were hypermethylated in germline-like tissues but were hypomethylated in somatic liver cells, the SNP rate was significantly smaller than that in hypomethylated regions in both tissues, suggesting a positive selective pressure during DNA methylation reprogramming. This is the first report of findings showing that the CpG methylation state is significantly correlated with the characteristics of evolutionary change in neighboring DNA.

Figures

Figure 1.
Methylation patterns and substitution rates in the inbred medaka strains, Hd-rR and HNI. (A) SNP (single-nucleotide polymorphism) rates in hyper- and hypomethylated CpG blocks in the reference human genome (hg19). The difference in SNP rates was significant in the entire genome (P < 10^-566 by a two-proportion z-test) (Supplemental Table S4), in intergenic regions (P < 10^-305), in exons (P < 10^-29), and in introns (P < 10^-151). (B) Methylation level and SNP distribution in the homologous regions of the human and medaka genomes where gene RPS13 is coded. (C,D) Comparisons of the methylation patterns in Hd-rR and HNI. The vertical and horizontal axes indicate methylation level. The heat map uses logarithmic coordinates and presents the number of corresponding CpG site blocks. Conserved hypermethylated and hypomethylated patterns between the two strains were dominant, except for a small number of hot spots observed in the differentially methylated regions (differences in methylation level ≥ 0.5). (E) Comparison of the methylation patterns in blastulae and testes in Hd-rR. (F) SNP rates in hypo-, hyper-, and strain-differentially methylated regions in medaka blastulae grouped by the entire genome, intergenic regions, exons, and introns. The differences between SNP rates of hypo- and hypermethylated regions were remarkable: P < 10^-2170 (genome), P < 10^-2170 (intergenic regions), P < 10^-113 (exons), and P < 10^-589 (introns) according to a two-proportion z-test (Supplemental Table S4). Furthermore, the differences between SNP rates of strain-differentially and hypermethylated regions were also significant (Supplemental Table S4). (G) Dinucleotide substitution rates in the whole medaka genome, intergenic regions, exons, and introns in CpG site blocks with various methylation states. Color key presents mutation rates: blue for hypermethylated (methylation level ≥ 0.8 in both strains); red for hypomethylated (methylation level ≤ 0.2 in both strains); and green for strain-differentially methylated (difference in methylation level between the two strains ≥ 0.5) in blastulae. The axes in each radar chart represent substitution rates of individual dinucleotides. Each dinucleotide shows the same substitution rate as its reverse complementary dinucleotide. Significant differences between substitution rates in hypo- and hypermethylated regions were observed for all dinucleotides, and the P-values, according to a two-proportion z-test, were P < 10^-441 (genome), P < 10^-263 (intergenic regions), P < 10^-15 (exons), and P < 10^-69 (introns) (Supplemental Table S5).
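The three methylation states named in the Figure 1 color key can be written as a small classifier. This is only an illustrative sketch: the thresholds (≥ 0.8, ≤ 0.2, difference ≥ 0.5) come from the caption, but the function name `classify_block`, the label strings, and the "unclassified" fallback are assumptions, and how the authors handled blocks that meet none of the criteria is not stated here.

```python
def classify_block(level_a: float, level_b: float) -> str:
    """Assign a CpG site block to a methylation state.

    level_a and level_b are methylation levels in [0, 1] for the same
    block in two strains (for example Hd-rR and HNI). The thresholds
    follow the Figure 1 caption; the labels are hypothetical.
    """
    if level_a >= 0.8 and level_b >= 0.8:
        return "hypermethylated"
    if level_a <= 0.2 and level_b <= 0.2:
        return "hypomethylated"
    if abs(level_a - level_b) >= 0.5:
        return "strain-differential"
    # Blocks matching none of the caption's criteria (assumed fallback).
    return "unclassified"


if __name__ == "__main__":
    for pair in [(0.9, 0.85), (0.1, 0.05), (0.9, 0.3), (0.5, 0.6)]:
        print(pair, classify_block(*pair))
```

The three criteria cannot overlap: two levels that are both ≥ 0.8 (or both ≤ 0.2) differ by at most 0.2, so the order of the checks does not change the result.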
Figure 2.
(A) Representative mutation patterns in evolutionarily conserved hypo- and hypermethylated regions and in regions that are differentially methylated in different strains, where substitution rates ascend from top to bottom. The CGCG motif is significantly conserved in hypomethylated regions (P < 10^-680 by a two-proportion z-test). (B) Number of strain-differentially methylated CpG site blocks with either gain or loss of the CGCG motif. Of 1656 CGCG motif occurrences in multiple alignments of the genomes of the three strains, 52 (24, respectively) were conserved in hypomethylated (hypermethylated) regions of one of Hd-rR or HNI but were mutated in hypermethylated (hypomethylated) regions of the other strain. We then evaluated the significance of the difference between the means in the two groups, 52/1656 vs. 24/1656, to obtain a P-value of 0.203% according to a two-proportion z-test. (C) The rates of dinucleotide gain in CpG site blocks. Gain rates in the hypo- or hypermethylated CpG site blocks and each mate in strain-differentially methylated blocks are shown.
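For readers unfamiliar with the test used throughout these captions, the following is a minimal sketch of a textbook two-proportion z-test, applied to the 52/1656 vs. 24/1656 comparison from panel B. It assumes a pooled standard error and a two-sided normal P-value; the caption does not say which variant the authors used, so the printed value should not be expected to match the reported 0.203%. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided P) for H0: the two proportions are equal.

    Uses the pooled-variance form of the test (an assumption; the
    source does not specify pooled vs. unpooled or one- vs. two-sided).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


if __name__ == "__main__":
    z, p = two_proportion_z_test(52, 1656, 24, 1656)
    print(f"z = {z:.3f}, two-sided P = {p:.3g}")
```

P-values as small as those quoted for the genome-wide comparisons (for example 10^-2170) fall below the smallest positive double-precision number, so this direct computation would return 0.0 for them; reproducing such values would require working with the logarithm of the tail probability instead.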
Figure 3.
Methylation patterns and substitution rates in different tissue types. (A) Comparison of the methylation patterns in Hd-rR: blastulae vs. liver, and testes vs. liver. The vertical and horizontal axes show methylation level. The methylation patterns in HNI are presented in Supplemental Fig. S5 and are similar to those in these figures. (B) SNP rates in CpG site blocks with three methylation states: hypomethylated in both of the two tissue types, hypermethylated in both, and differentially methylated between the two tissue types. Significant differences in SNP rates were seen between tissue-differentially and hypomethylated regions (P < 10^-82 by a two-proportion z-test) (Supplemental Table S6), and between hypo- and hypermethylated CpG site blocks (P < 10^-2170) (Supplemental Table S6). (C) Dinucleotide substitution rates in CpG site blocks with the three methylation states. The substitution rates of all dinucleotides, except for CC/GG/CG, in tissue-differentially methylated regions were significantly lower than those in hypomethylated regions (P < 10^-3 by a two-proportion z-test) (Supplemental Table S7A). (D) Representative mutation patterns in hypomethylated, hypermethylated, and somatic cell-specific hypomethylated (germline-like-specific hypermethylated) regions. Somatic cell-specific hypomethylated regions exhibited the lowest mutation rates.
Figure 4.
A working model for a higher mutation rate in hypermethylated regions. Deaminated cytosine (U:G mismatch) is repaired by base excision repair (BER), but deaminated 5-methyl-cytosine (T:G mismatch) is corrected by more complicated repair pathways. An alternative mismatch repair system (MMR) might involve low-fidelity DNA polymerase, resulting in the error-prone synthesis of erased gaps.

