BMC Genomics. 2020 Mar 30;21(1):270.
doi: 10.1186/s12864-020-6674-1.

The genome-wide landscape of C:G > T:A polymorphism at the CpG contexts in the human population

Jeonghwan Youk et al. BMC Genomics.

Abstract

Background: The C:G > T:A substitution at CpG dinucleotide contexts is the most frequent substitution type in genome evolution. This mutational process is ongoing in the human germline; however, its impact on common and rare genomic polymorphisms has not yet been comprehensively investigated. Here we examined the landscape and dynamics of C:G > T:A substitutions in population-scale human genome sequencing datasets, including ~4300 whole genomes from the 1000 Genomes Project and the Pan-Cancer Analysis of Whole Genomes (PCAWG) Project and ~60,000 whole exomes from the Exome Aggregation Consortium (ExAC) database.

Results: Of the 28,084,558 CpG sites in the human reference genome, 26.0% show a C:G > T:A substitution in the dataset. Remarkably, CpGs in CpG islands (CGIs) have a much lower frequency of such mutations (5.6%). The mutation frequency is not uniform across CGIs: intragenic CGIs show a significantly higher C:G > T:A substitution rate than other CGI types. For non-CGI CpGs, the mutation rate was positively correlated with the distance from the nearest CGI up to 2 kb. Finally, we found evidence of negative selection against coding CpG mutations that result in an amino acid change.
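As a rough illustration of the kind of tally behind the percentages above, the following minimal Python sketch counts CpG dinucleotides in a sequence and reports the fraction carrying an observed C:G > T:A variant, with a normal-approximation 95% confidence interval. It is not the authors' pipeline: the function names, the variant tuple format `(position, ref, alt)`, the rule that a CpG counts as mutated if either strand shows the transition, and the toy sequence are all assumptions made for illustration.

```python
import math


def cpg_sites(seq):
    """Return the 0-based start index of every CpG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def mutated_cpg_fraction(seq, variants):
    """Fraction of CpG sites with an observed C:G > T:A substitution.

    variants: iterable of (pos, ref, alt) with 0-based positions (assumed format).
    A CpG starting at index i is counted as mutated if C>T is seen at i
    (forward strand) or G>A is seen at i + 1 (C>T on the reverse strand).
    Returns (mutated, total, fraction, (ci_low, ci_high)).
    """
    sites = cpg_sites(seq)
    seen = {(pos, ref.upper(), alt.upper()) for pos, ref, alt in variants}
    mutated = sum(
        1 for i in sites
        if (i, "C", "T") in seen or (i + 1, "G", "A") in seen
    )
    total = len(sites)
    if total == 0:
        nan = float("nan")
        return 0, 0, nan, (nan, nan)
    frac = mutated / total
    # Normal-approximation 95% interval; crude for small counts.
    half_width = 1.96 * math.sqrt(frac * (1 - frac) / total)
    return mutated, total, frac, (max(0.0, frac - half_width), min(1.0, frac + half_width))


# Toy input, not real data: CpGs start at indices 1, 5 and 8.
toy_seq = "ACGTTCGACGGA"
toy_variants = [(1, "C", "T"), (6, "G", "A"), (3, "T", "C")]
mutated, total, frac, ci = mutated_cpg_fraction(toy_seq, toy_variants)
```

On genome-scale data the same counting would be done per CGI class or per distance bin rather than over one string, and an exact binomial interval may be preferable when counts are small.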

Conclusions: This study provides the first unbiased rate of C:G > T:A substitution at the CpG dinucleotide contexts, using population-scale human genome sequencing data. Our findings provide insights into the dynamics of the mutation acquisition in the human genome.

Keywords: CpG; CpG island; Methylation; Single nucleotide polymorphism; Transition.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
C > T polymorphism rate in the human population and classification of CpG islands (CGIs). a Schematic illustration of the classification of CGIs. b The distribution of CGI sizes by CGI type. CGIs related to a transcriptional start site (TSS) are significantly longer than the others. c The statistics of CpG dinucleotides in the human reference genome. Among the C or G bases in a CpG dinucleotide context, approximately 7% are located in CGIs. Approximately half of the CpGs in CGIs are located in the TSS-coding CGIs. d Mutational spectrum accumulated during human genome evolution. Decomposition of the mutational spectrum revealed that C > T transitions at the CpG contexts (Signature 1) were one of three major signatures during human genome evolution. e Mutation rate of CpGs based on CGI classifications (error bars indicate 95% confidence intervals). Interestingly, intragenic coding CGIs have the highest mutation rate among the five CGI types. f The distribution of allele frequencies of the C > T transitions according to the CGI types. The higher the mutation rate of the CpGs in e, the higher the allele frequencies tend to be. A logarithmic scale is applied to the y-axis.
Fig. 2
C > T polymorphism rate and methylation proportion around each type of CGI. a Normalized incidence of C > T mutations according to the distance from the border of the CGIs. Within the CGI shores, the C > T mutation rate decreases with proximity to the CGI border. Beyond 2 kb, the mutation rate of C:G > T:A plateaus. As a control, the normalized incidence of C > T mutations at non-CpG contexts is depicted as grey dots. b The normalized incidence of C > T mutations for each CGI type. In the CGI shores of the intragenic-coding CGIs, the incidence of C > T mutations is higher than in the other CGIs. The normalized incidence of C > T mutations in the TSS-coding CGIs and TSS-noncoding CGIs tended to be lower than that of the non-TSS CGIs. c The pattern of the mean methylation percentage in each CGI type according to the distance from the border of the CGIs. On the whole, the methylation percentage in the CGI shores correlates well with the order of the normalized incidence of C > T mutations shown in b. d Violin plots of the distribution of the mean methylation proportion for each CGI type. Only the intragenic-coding CGIs show a bimodal methylation distribution. e Intragenic-coding CGIs with a mean methylation of > 67% and < 33% are classified as Group A and Group B, respectively. Interestingly, the CGIs in Group A are significantly shorter than those in Group B. f Mean mutation rate of the CGIs in Group A and Group B.
Fig. 3
The difference in the C > T polymorphism rate according to the resulting amino acid changes. a Among the cytosines at CpG dinucleotide contexts in the coding sequences with methylation ≥ 67%, the proportions of reported C > T substitutions in the ExAC database (inner circle) and in the 1000 Genomes and PCAWG datasets (outer ring) are illustrated. Nonsense-primed C > T mutations are negatively selected compared to missense and synonymous substitutions. b The distribution of the allele counts of C > T substitutions at the CpG contexts in the coding sequences in the ExAC database. Nonsense-primed C > T genomic loci have a singleton proportion of more than 40%. The smaller the effect of the amino acid change, the more individuals carry a C > T substitution at a given locus, regardless of where the cytosine or guanine is located.
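
The singleton comparison described in Fig. 3b can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the record format `(consequence, allele_count)`, the function name, and the toy values are hypothetical, and the toy numbers are not taken from ExAC.

```python
from collections import defaultdict


def singleton_proportion(records):
    """Fraction of variant loci seen exactly once, per consequence class.

    records: iterable of (consequence, allele_count) pairs (assumed format).
    Loci with allele_count < 1 are not variant in the cohort and are skipped.
    """
    totals = defaultdict(int)
    singletons = defaultdict(int)
    for consequence, allele_count in records:
        if allele_count < 1:
            continue
        totals[consequence] += 1
        if allele_count == 1:
            singletons[consequence] += 1
    return {c: singletons[c] / totals[c] for c in totals}


# Toy records, invented for illustration only.
toy_records = [
    ("nonsense", 1), ("nonsense", 1), ("nonsense", 3),
    ("missense", 1), ("missense", 5), ("missense", 12), ("missense", 2),
    ("synonymous", 1), ("synonymous", 8), ("synonymous", 40),
    ("synonymous", 3), ("synonymous", 7),
]
proportions = singleton_proportion(toy_records)
```

A higher singleton proportion in one class than another is the pattern the caption reads as a sign of stronger negative selection, since deleterious alleles tend to stay rare.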
