BMC Genomics. 2010 Jan 19;11:48.
doi: 10.1186/1471-2164-11-48.

Intergenic, gene terminal, and intragenic CpG islands in the human genome



Yulia A Medvedeva et al. BMC Genomics.

Abstract

Background: The human genome has recently been found to contain many transcription start sites for non-coding RNA. The regulatory regions involved in transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from the transcription start sites of any protein-coding gene. The human genome contains many such CpG islands; however, their properties have not yet been systematically studied.

Results: We studied CpG islands located in different regions of the human genome using bioinformatic and comparative genomic methods. CpG islands preferentially overlap exons, including exons located far from the transcription start site, but usually extend well into introns. The synonymous substitution rate of CpG-containing codons is substantially reduced where CpG islands overlap protein-coding exons, even when the islands lie far downstream of the transcription start site. CAGE tag analysis showed frequent transcription start sites in all CpG islands, including those located far from the transcription start sites of protein-coding genes. Computational prediction and analysis of published ChIP-chip data revealed that CpG islands contain an increased number of Sp1 binding sites. CpG islands with more CAGE tags usually also contain more Sp1 binding sites; this relationship is especially pronounced for CpG islands located in 3' gene regions. In CAGE-enriched CpG islands located far from the transcription start site of any known protein-coding gene, we found various examples of transcription supported by mRNAs or ESTs but with no evidence of a protein-coding gene.

Conclusions: CpG islands located far from the transcription start sites of protein-coding genes show transcription initiation activity and Sp1 binding. In exons overlapping these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that these CpG islands are involved in transcription initiation, possibly of non-coding RNAs.
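The abstract does not say how CpG islands were delimited. As a rough illustration only, the sketch below applies the classic Gardiner-Garden and Frommer thresholds (length of at least 200 bp, G+C content above 50%, observed/expected CpG ratio above 0.6) from their J Mol Biol paper listed in the references. Whether this study used exactly these thresholds is an assumption, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return s.count("CG") * len(s) / (c * g)


def passes_ggf_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Gardiner-Garden and Frommer style thresholds for a candidate CpG island."""
    return (
        len(seq) >= min_len
        and gc_fraction(seq) > min_gc
        and cpg_obs_exp(seq) > min_obs_exp
    )
```

A CpG-depleted sequence of the same length and GC content would fail on the observed/expected ratio alone, which is the property that distinguishes CpG islands from merely GC-rich DNA.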


Figures

Figure 1
Ratio of the overlap between bona fide CGIs and exons (introns) to the overlap between randomly positioned intervals with the lengths of the exon (intron) and CGI sets. (1) The exon set is fixed and the CGI set is sampled. (2) The CGI set is fixed and the exon set is sampled. 10,000 runs of Monte Carlo simulation. Length distributions are computed independently for each chromosome.
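The Monte Carlo test in Figure 1 can be sketched as follows: keep one interval set fixed, place intervals with the other set's lengths at random positions on the same chromosome, and compare the observed overlap with the mean simulated overlap. This is a minimal single-chromosome sketch under stated assumptions, not the authors' code: overlap is counted in base pairs of the merged intervals, random intervals are placed uniformly and may overlap each other, assembly gaps are ignored, and all function names are hypothetical.

```python
import random


def merge(intervals):
    """Merge half-open [start, end) intervals into sorted, non-overlapping ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(a, b):
    """Number of base pairs covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def random_placement(lengths, chrom_len, rng):
    """Place intervals of the given lengths uniformly on [0, chrom_len)."""
    placed = []
    for n in lengths:
        start = rng.randrange(chrom_len - n + 1)
        placed.append((start, start + n))
    return placed


def overlap_ratio(fixed, sampled, chrom_len, runs=10_000, seed=0):
    """Observed overlap divided by the mean overlap of randomly placed intervals."""
    rng = random.Random(seed)
    observed = overlap_bp(fixed, sampled)
    lengths = [end - start for start, end in sampled]
    expected = sum(
        overlap_bp(fixed, random_placement(lengths, chrom_len, rng))
        for _ in range(runs)
    ) / runs
    return observed / expected if expected > 0 else float("inf")
```

Swapping the `fixed` and `sampled` arguments gives the two variants in the caption, and calling the function once per chromosome mirrors the per-chromosome length distributions; a ratio above 1 means the real sets overlap more than chance placement predicts.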
Figure 2
dN. Non-synonymous substitution rates calculated for various classes of codons that do or do not overlap CGIs, in different gene regions.
Figure 3
dS. Synonymous substitution rates calculated for various classes of codons that do or do not overlap CGIs, in different gene regions.
Figure 4
dN/dS. Ratio of non-synonymous to synonymous substitution rates calculated for various classes of codons that do or do not overlap CGIs, in different gene regions.
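The dN, dS and dN/dS values in Figures 2 to 4 are proportions of differing sites corrected for multiple hits. The paper's exact estimator is not stated here; the sketch below assumes a Nei-Gojobori-style calculation with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and takes already-counted synonymous and non-synonymous sites and differences as input (the per-codon site counting is omitted). Function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_sites, syn_sites, nonsyn_diffs, syn_diffs):
    """Return (dN, dS, dN/dS) from site counts and observed differences."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    ratio = dn / ds if ds > 0 else float("inf")
    return dn, ds, ratio
```

A dN/dS value below 1 is conventionally read as purifying selection on the protein; a reduced dS for CpG-containing codons inside CGIs, as reported in the abstract, would raise dN/dS for those codons without any change in dN.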
Figure 5
Statistical significance of the relative occurrence of Sp1 binding sites within different CGI classes and GC-rich shuffled sequences. X-axis: theoretical statistical significance (P-value); Y-axis: overall fraction of sequences with a statistical significance less than or equal to the X-axis value. A higher statistical significance value reflects more Sp1 sites scoring above the PWM threshold within the selected CGI. CGI classes and GC-rich shuffled sequences are defined in Methods.
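Figure 5 counts Sp1 sites that score above a position weight matrix (PWM) threshold. A minimal sketch of such a scan, assuming a log2-odds PWM with a uniform background and a pseudocount, scored on both strands. The toy count matrix is a hypothetical GC-box-like motif built only for illustration, not the Sp1 matrix used in the study, and the threshold in the usage note is arbitrary.

```python
import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """Convert per-position base counts (list of dicts) into a log2-odds PWM."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (assumption)
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def score_window(pwm, window):
    """Sum of per-position log-odds scores for one window of len(pwm) bases."""
    return sum(column[base] for column, base in zip(pwm, window))


def count_hits(pwm, seq, threshold):
    """Count windows on either strand scoring >= threshold; skips non-ACGT windows."""
    seq = seq.upper()
    width = len(pwm)
    hits = 0
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            if set(window) <= set(BASES) and score_window(pwm, window) >= threshold:
                hits += 1
    return hits


# Hypothetical toy motif: 10 observations of the consensus base at each position.
TOY_COUNTS = [{b: (10 if b == c else 0) for b in BASES} for c in "GGGGCGGGG"]
TOY_PWM = log_odds_pwm(TOY_COUNTS)
```

For example, `count_hits(TOY_PWM, "AAAAGGGGCGGGGAAAA", 12.0)` finds the single consensus site. Scanning both strands means a palindromic motif would be counted once per strand, which a real pipeline may want to deduplicate.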
Figure 6
ChIP-chip assessment of Sp1 binding in CGIs in different genome segments. Mean and median intensities for Sp1 and input DNA signal for PM tags located in CGIs from different genome segments.
Figure 7
ChIP-chip S/N ratio for Sp1 binding in CGIs in different genome segments. Input/Sp1 signal ratio for PM tags located in CGIs from different genome segments.
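The per-segment summaries in Figures 6 and 7 reduce to simple arithmetic over probe intensities. A sketch, assuming each PM probe inside a CGI is supplied as a (segment, Sp1 signal, input signal) tuple. The ratio is taken between mean intensities (an assumption; a mean of per-probe ratios is another option) and is expressed as Sp1 over input, so the Input/Sp1 value named in the Figure 7 caption is its reciprocal. Function and key names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_probes(probes):
    """Per-segment mean/median of Sp1 and input signals plus a mean-based ratio.

    probes: iterable of (segment, sp1_signal, input_signal) for PM probes in CGIs.
    """
    by_segment = defaultdict(lambda: ([], []))
    for segment, sp1, inp in probes:
        by_segment[segment][0].append(sp1)
        by_segment[segment][1].append(inp)

    summary = {}
    for segment, (sp1_vals, input_vals) in by_segment.items():
        summary[segment] = {
            "sp1_mean": mean(sp1_vals),
            "sp1_median": median(sp1_vals),
            "input_mean": mean(input_vals),
            "input_median": median(input_vals),
            "sp1_over_input": mean(sp1_vals) / mean(input_vals),
        }
    return summary
```

Reporting medians alongside means, as Figure 6 does, guards against a few very bright probes dominating a segment's summary.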
Figure 8
Interaction between the mutation process and selection pressure in exons that do or do not overlap CGIs. In coding exons, the substitution rate at synonymous sites is approximately 10-fold greater than at non-synonymous sites. The mCpG → TG transition rate is about 10-fold greater than the AG → GG transition rate. CpG islands protect CpG dinucleotides from methylation, decreasing the CG → TG transition rate. CpG dinucleotides in CGIs may be under stronger selection than CpG dinucleotides outside CGIs.
Figure 9
Sequence logo of the identified Sp1 binding site, built with WebLogo [59].
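The stack heights in a sequence logo such as Figure 9 are per-position information content, 2 - H bits for DNA, where H is the Shannon entropy of the base frequencies; each letter's height is its frequency times that value. The sketch below computes this from aligned sites, assuming equal-length A/C/G/T strings. It omits the small-sample correction that WebLogo can apply, so for small alignments its values will be somewhat higher than WebLogo's. Function names are hypothetical.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information content in bits (2 - H) for aligned DNA sites.

    No small-sample correction is applied.
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    values = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in sites)
        entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
        values.append(2.0 - entropy)
    return values


def letter_heights(sites):
    """Height of each base at each position: base frequency * information content."""
    n = len(sites)
    heights = []
    for i, bits in enumerate(information_content(sites)):
        counts = Counter(s[i].upper() for s in sites)
        heights.append({b: counts[b] / n * bits for b in "ACGT"})
    return heights
```

A fully conserved position reaches the 2-bit maximum, while a position where all four bases are equally frequent contributes nothing to the logo.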

References

    1. Bird AP. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980;8(7):1499–1504. doi: 10.1093/nar/8.7.1499.
    2. Ahuja N, Li Q, Mohan AL, Baylin SB, Issa JP. Aging and DNA methylation in colorectal mucosa and cancer. Cancer Res. 1998;58(23):5489–5494.
    3. Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987;196(2):261–282. doi: 10.1016/0022-2836(87)90689-9.
    4. Han L, Su B, Li WH, Zhao Z. CpG island density and its correlations with genomic features in mammalian genomes. Genome Biol. 2008;9(5):R79. doi: 10.1186/gb-2008-9-5-r79.
    5. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. doi: 10.1038/35057062.

