Sequence context at human single nucleotide polymorphisms: overrepresentation of CpG dinucleotide at polymorphic sites and suppression of variation in CpG islands
- PMID: 12628237
- DOI: 10.1016/s0022-2836(03)00120-7
Sequence context at human single nucleotide polymorphisms: overrepresentation of CpG dinucleotide at polymorphic sites and suppression of variation in CpG islands
Abstract
Human polymorphisms originate as mutations, and the influence of context on mutagenesis should be reflected in the distribution of sequences surrounding single nucleotide polymorphisms (SNPs). We have performed a computational survey of nearly two million human SNPs to determine if sequence-dependent hotspots for polymorphism exist in the human genome. Here we show that sequences containing CpG dinucleotides, which occur at low frequencies in the human genome, are 6.7-fold more abundant at polymorphic sites than expected. In contrast, polymorphisms in CpG sequences located within CpG islands, important regulatory regions that modulate gene expression, are 6.8-fold less prevalent than expected. The distribution of polymorphic alleles at CpGs in CpG islands is also significantly different from that in non-island regions. These data strongly support a role for 5-methylcytosine deamination in the generation of human variation, and suggest that variation at CpGs in islands is suppressed.
MeSH terms
Substances
LinkOut - more resources
Full Text Sources
