Genome Biol Evol. 2009 Aug 3;1:189-97.
doi: 10.1093/gbe/evp024.

Long-range bidirectional strand asymmetries originate at CpG islands in the human genome


Paz Polak et al. Genome Biol Evol. 2009.

Abstract

In the human genome, CpG islands (CGIs), which are GC- and CpG-rich sequences, are associated with transcription start sites (TSSs); in addition, there is evidence that CGIs harbor origins of bidirectional replication (OBRs) and are preferred sites for heteroduplex formation during recombination. Transcription, replication, and recombination are known to induce specific mutational patterns in various genomes; these patterns are therefore expected to be found around CGIs. We use triple alignments of human, chimp, and macaque to compute nucleotide substitution rates in intergenic regions of up to 1 Mbp on both sides of CGIs. Our analysis revealed that around a CGI there is an asymmetry between complementary substitution rates similar to the one found around the OBR in bacteria. We hypothesize that these asymmetries are induced by differences in the replication of the leading and lagging strands and that a significant number of CGIs overlap OBRs. Within CGIs, we observed a mutational signature of GC-biased gene conversion, a process associated with recombination. We suggest that recombination has played a major role in the creation of CGIs.
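The abstract states that substitution rates are computed from human, chimp, and macaque triple alignments but does not describe the inference step here. The sketch below is a minimal illustration under a simple parsimony assumption, not the authors' method: where chimp and macaque agree, their shared base is taken as ancestral, and a differing human base counts as one substitution on the human lineage. The function names `human_substitutions` and `rate` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def human_substitutions(human: str, chimp: str, macaque: str):
    """Count substitutions on the human lineage in one gap-aligned triple alignment.

    Parsimony assumption: where chimp and macaque agree, their base is ancestral;
    if human differs, one ancestral -> human substitution is counted. Columns with
    gaps, Ns, or chimp/macaque disagreement are skipped as uninformative.
    """
    if not (len(human) == len(chimp) == len(macaque)):
        raise ValueError("alignment rows must have equal length")
    subs = Counter()       # (ancestral, derived) -> number of substitutions
    ancestral = Counter()  # ancestral base -> number of informative sites
    for h, c, m in zip(human.upper(), chimp.upper(), macaque.upper()):
        if h not in BASES or c not in BASES or m not in BASES or c != m:
            continue
        ancestral[c] += 1
        if h != c:
            subs[(c, h)] += 1
    return subs, ancestral


def rate(subs, ancestral, x: str, y: str) -> float:
    """Per-site rate of x -> y: substitutions divided by ancestral x sites."""
    return subs[(x, y)] / ancestral[x] if ancestral[x] else float("nan")
```

For example, `human_substitutions("AGGT", "AAGT", "AAGT")` infers one A → G change among two ancestral A sites, so `rate(..., "A", "G")` is 0.5. Normalizing by the number of ancestral sites, rather than using raw counts, keeps rates comparable between regions with different base composition.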

Keywords: CpG islands; biased gene conversion; origin of bi-directional replication; recombination; strand asymmetries.


Figures

FIG. 1.—
Sketch of the analyzed regions around and within two classes of CGIs: dCGIs (A) and tCGIs (B). CGIs are denoted by striped boxes. The bold strand is the reference strand used for the substitution analysis and for defining the 5′ → 3′ direction relative to the CGI. Substitution rates are estimated relative to the 5′ end (3′ end) of the CGI using a sliding-window analysis. The left (right) coordinate system describes the distances of the windows from the 5′ end (3′ end), which is denoted by the left (right) origin (0 k). The left (right) coordinate system starts (ends) midway to the next CGI upstream (downstream) of the 5′ (3′) end of the CGI and ends (starts) at the dashed line. However, the analyzed region on each side of the CGI is limited to at most 1 Mbp. (A) dCGIs are intergenic CGIs located at least 10 kbp away from any TSS. The reference strand is the NCBI forward strand, and the dashed line is at the middle of the dCGI. (B) A tCGI is a CGI that harbors the TSS of a transcript (exons are denoted by shaded areas). The reference strand is the nontemplate (coding) strand of this gene. The dashed line coincides with the TSS. The colored bars indicate the regions analyzed in figure 2: dCGI intergenic regions (blue), tCGI intergenic regions (green), and introns (red) of genes whose TSS lies inside the tCGI.
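Figure 1 orients every measurement to a reference strand: the NCBI forward strand for dCGIs and the nontemplate strand of the gene for tCGIs. This orientation matters because a forward-strand A → G change on a minus-strand gene is a T → C change on that gene's nontemplate strand. The sketch below illustrates the bookkeeping under stated assumptions: positions are 0-based on the forward strand, a CGI spans the half-open interval `[cgi_start, cgi_end)`, and the helper names are hypothetical.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_reference_strand(anc: str, der: str, forward: bool) -> Tuple[str, str]:
    """Re-express a substitution read on the NCBI forward strand on the
    reference strand; if the reference strand is the reverse strand
    (e.g. the nontemplate strand of a minus-strand gene), complement both bases."""
    if forward:
        return anc, der
    return anc.translate(COMPLEMENT), der.translate(COMPLEMENT)


def side_and_distance(pos: int, cgi_start: int, cgi_end: int,
                      forward: bool) -> Optional[Tuple[str, int]]:
    """Locate a forward-strand position relative to a CGI spanning [cgi_start, cgi_end).

    Returns ("5p" or "3p", distance in bp from the nearest CGI boundary,
    where 1 means the adjacent base), in reference-strand orientation,
    or None if the position lies inside the CGI.
    """
    if cgi_start <= pos < cgi_end:
        return None
    if pos < cgi_start:
        side, dist = "5p", cgi_start - pos
    else:
        side, dist = "3p", pos - cgi_end + 1
    if not forward:
        # On a reverse-strand reference, genomic upstream is the 3' side.
        side = "3p" if side == "5p" else "5p"
    return side, dist
```

This sketch omits the truncation described in the caption (midway to the neighboring CGI, and at most 1 Mbp); a fuller version would discard positions beyond those limits before windowing.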
FIG. 2.—
Ratios between complementary substitution rates in intergenic (blue, green) and intronic (red) regions. The ratios, calculated in 10-kbp windows, are plotted against the distance from the 5′ end (left 0 k) and the 3′ end (right 0 k) of the CGIs. For dCGIs and tCGIs, the analyzed sequences are intergenic (see the corresponding blue and green bars in fig. 1) and are taken from the reference strand as described in figure 1. The analyzed intronic sequences come from genes whose TSS is located within a tCGI (see red bars in fig. 1). The ratios in these regions are computed on the nontemplate strand of the gene (see fig. 1); intronic sequences are available only for the 3′-side analysis (left of the gap). The ratios at 0 k are calculated within the CGIs; for details, see supplementary figure 6 (Supplementary Material online). The bottom panel shows a shaded histogram of gene lengths in the human genome, demonstrating that the strand asymmetry between A → G and T → C extends over distances larger than the typical length of a gene.
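Each curve in figure 2 is a ratio of complementary substitution rates, such as rate(A → G)/rate(T → C), computed per 10-kbp window; a ratio of 1 means no strand asymmetry. The sketch below shows one way to build such a profile, assuming each informative site has already been oriented to the reference strand and assigned a distance from the CGI end. The input format `(distance_bp, ancestral, derived)` and the function name are hypothetical.

```python
from collections import defaultdict

WINDOW = 10_000  # 10-kbp windows, as stated in the Fig. 2 caption
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_ratio_profile(sites, x="A", y="G", window=WINDOW):
    """Return {window_index: rate(x->y) / rate(comp(x)->comp(y))}.

    sites: iterable of (distance_bp, ancestral, derived) for informative
    positions on one side of the CGIs; derived == ancestral means no change.
    Windows lacking ancestral sites or complementary substitutions are skipped.
    """
    cx, cy = COMP[x], COMP[y]
    n_sub = defaultdict(lambda: defaultdict(int))  # window -> (anc, der) -> count
    n_anc = defaultdict(lambda: defaultdict(int))  # window -> anc -> count
    for dist, anc, der in sites:
        w = dist // window
        n_anc[w][anc] += 1
        if der != anc:
            n_sub[w][(anc, der)] += 1
    profile = {}
    for w in sorted(n_anc):
        if n_anc[w][x] == 0 or n_anc[w][cx] == 0:
            continue
        r_fwd = n_sub[w][(x, y)] / n_anc[w][x]
        r_rev = n_sub[w][(cx, cy)] / n_anc[w][cx]
        if r_rev > 0:
            profile[w] = r_fwd / r_rev
    return profile
```

With four ancestral A sites carrying two A → G changes and four ancestral T sites carrying one T → C change in the first window, the profile is `{0: 2.0}`. Pooling counts across all CGIs before dividing, rather than averaging per-CGI ratios, avoids dividing by zero in sparse windows.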
FIG. 3.—
Dependence of the W → S/S → W ratio, CpG deamination frequencies, and the stationary GC content on the recombination rate for four classes of CGIs (see Materials and Methods). The CGIs in each of the four classes (t-, p-, g-, and d-CGIs) are subdivided into four recombination rate ranges (0 < r < 0.2, 0.2 < r < 0.8, 0.8 < r, and hot spots), which are indicated on the horizontal axis.
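The Materials and Methods section that defines the quantities in figure 3 is not part of this excerpt, so the sketch below uses common textbook definitions as an assumption. W (weak) bases are A/T and S (strong) bases are G/C; the W → S/S → W ratio is taken here as a ratio of per-site rates, and the stationary GC content is the equilibrium value GC* = r_WS / (r_WS + r_SW) that a sequence would approach if these rates stayed constant. The binning follows the caption's ranges; the handling of values exactly at 0.2 or 0.8 is an assumption, and the units of r are not stated here. All names are hypothetical.

```python
WEAK, STRONG = set("AT"), set("GC")


def ws_sw_stats(subs, ancestral):
    """Return (W->S / S->W per-site rate ratio, stationary GC content).

    subs: {(ancestral, derived): count}; ancestral: {base: informative sites}.
    """
    n_w = sum(ancestral.get(b, 0) for b in WEAK)
    n_s = sum(ancestral.get(b, 0) for b in STRONG)
    if n_w == 0 or n_s == 0:
        raise ValueError("need both weak and strong ancestral sites")
    ws = sum(n for (a, d), n in subs.items() if a in WEAK and d in STRONG)
    sw = sum(n for (a, d), n in subs.items() if a in STRONG and d in WEAK)
    r_ws, r_sw = ws / n_w, sw / n_s
    ratio = r_ws / r_sw if r_sw else float("inf")
    total = r_ws + r_sw
    gc_star = r_ws / total if total else float("nan")  # equilibrium GC content
    return ratio, gc_star


def recombination_class(r, is_hotspot=False):
    """Assign a CGI to one of the caption's four recombination classes.

    Assumption: boundary values go to the upper bin; r <= 0 is outside
    the caption's ranges and returns None.
    """
    if is_hotspot:
        return "hot spots"
    if r <= 0:
        return None
    if r < 0.2:
        return "0 < r < 0.2"
    if r < 0.8:
        return "0.2 < r < 0.8"
    return "0.8 < r"
```

A W → S/S → W ratio above 1, together with a GC* above the current GC content, is the kind of signal one would expect from GC-biased gene conversion pushing a region toward higher GC.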
