Proc Natl Acad Sci U S A. 2002 Mar 19;99(6):3740-5.
doi: 10.1073/pnas.052410099. Epub 2002 Mar 12.

Comprehensive analysis of CpG islands in human chromosomes 21 and 22

Daiya Takai et al. Proc Natl Acad Sci U S A.

Abstract

CpG islands are useful markers for genes in organisms containing 5-methylcytosine in their genomes. In addition, CpG islands located in the promoter regions of genes can play important roles in gene silencing during processes such as X-chromosome inactivation, imprinting, and silencing of intragenomic parasites. The generally accepted definition of what constitutes a CpG island was proposed in 1987 by Gardiner-Garden and Frommer [Gardiner-Garden, M. & Frommer, M. (1987) J. Mol. Biol. 196, 261-282] as being a 200-bp stretch of DNA with a C+G content of 50% and an observed CpG/expected CpG in excess of 0.6. Any definition of a CpG island is somewhat arbitrary, and this one, which was derived before the sequencing of mammalian genomes, will include many sequences that are not necessarily associated with controlling regions of genes but rather are associated with intragenomic parasites. We have therefore used the complete genomic sequences of human chromosomes 21 and 22 to examine the properties of CpG islands in different sequence classes by using a search algorithm that we have developed. Regions of DNA of greater than 500 bp with a G+C equal to or greater than 55% and observed CpG/expected CpG of 0.65 were more likely to be associated with the 5' regions of genes and this definition excluded most Alu-repetitive elements. We also used genome sequences to show strong CpG suppression in the human genome and slight suppression in Drosophila melanogaster and Saccharomyces cerevisiae. This finding is compatible with the recent detection of 5-methylcytosine in Drosophila, and might suggest that S. cerevisiae has, or once had, CpG methylation.
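The two quantities that both definitions rely on can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program. It assumes the commonly used form of the ratio, ObsCpG/ExpCpG = (number of CpG × N) / (number of C × number of G) for a sequence of length N, which the abstract does not spell out. The function names, the default thresholds (taken from the modified criteria above), and the treatment of the boundary cases as inclusive are choices made for this example.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def obs_exp_cpg(seq):
    """Observed/expected CpG ratio, assuming Obs/Exp = CpG * N / (C * G)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; treated as 0 here
    return seq.count("CG") * len(seq) / (c * g)


def meets_criteria(seq, min_len=500, min_gc=0.55, min_oe=0.65):
    """Check a region against length, G+C, and Obs/Exp thresholds.

    Defaults follow the modified criteria in the abstract; inclusive
    comparisons are an assumption of this sketch.
    """
    return (
        len(seq) >= min_len
        and gc_fraction(seq) >= min_gc
        and obs_exp_cpg(seq) >= min_oe
    )
```

Passing `min_len=200, min_gc=0.50, min_oe=0.60` approximates the 1987 definition for comparison, again with inclusive boundaries as an assumption.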

Figures

Figure 1
Schematics for the algorithms for CpG island extraction from human genome sequences. (A) Set a 200-base window in the beginning of a contig, compute %GC and ObsCpG/ExpCpG. Shift the window 1 bp after evaluation until the window meets the criteria of a CpG island. (B) If the window meets the criteria, shift the window 200 bp and then evaluate again. (C and D) Repeat these 200-bp shifts until the window does not meet the criteria. (E) Shift the last window 1 bp toward the 5′ end until it meets the criteria. (G) Evaluate total %GC and ObsCpG/ExpCpG. (H) If this large CpG island does not meet the criteria, trim 1 bp from each side until it meets the criteria. (I) Two individual CpG islands were connected if they were separated by less than 100 bp. (J) Values for ObsCpG/ExpCpG and %GC were recalculated to remain within the criteria.
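The steps in this caption can be sketched roughly as follows. This is an interpretation of the caption only, not the authors' code. The thresholds used for the window test, the minimum island length, the handling of a contig end, and the rule that a merge is kept only if the joined region still passes are all assumptions of this sketch, and the function names are hypothetical. The caption lists no step (F), so none is modeled. The scan is deliberately naive and recomputes each window from scratch.

```python
def region_ok(seq, min_gc, min_oe):
    """True if seq passes the %GC and ObsCpG/ExpCpG thresholds."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return False
    gc = (c + g) / len(seq)
    oe = seq.count("CG") * len(seq) / (c * g)  # assumed Obs/Exp formula
    return gc >= min_gc and oe >= min_oe


def find_islands(seq, window=200, min_gc=0.55, min_oe=0.65,
                 min_len=500, max_gap=100):
    """Return (start, end) half-open intervals of candidate CpG islands."""
    seq = seq.upper()
    n = len(seq)

    def ok(a, b):
        return region_ok(seq[a:b], min_gc, min_oe)

    islands = []
    i = 0
    while i + window <= n:
        # (A) slide 1 bp at a time until a window passes
        if not ok(i, i + window):
            i += 1
            continue
        start, end = i, i + window
        # (B-D) jump a full window at a time while windows keep passing
        while end + window <= n and ok(end, end + window):
            end += window
        # (E) walk the failing window back toward the 5' end until it passes;
        # at j == end - window it coincides with the last passing window
        j = min(end, n - window)
        while j > end - window and not ok(j, j + window):
            j -= 1
        end = j + window
        resume = end
        # (G, H) test the whole region; trim 1 bp per side until it passes
        while end - start >= min_len and not ok(start, end):
            start += 1
            end -= 1
        if end - start >= min_len:
            islands.append((start, end))
        i = resume

    # (I, J) join neighbours closer than max_gap if the joined region passes
    merged = []
    for s, e in islands:
        if merged and s - merged[-1][1] < max_gap and ok(merged[-1][0], e):
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

For example, `find_islands("AT" * 300 + "CG" * 400 + "AT" * 300)` reports a single interval that covers the CG-rich core, extended on each side by the flanking bases that still fit inside a passing 200-bp window.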
Figure 2
Distributions of %GC, ObsCpG/ExpCpG and length of CpG islands in human chromosomes 21 and 22. Mean value and SD are also indicated in each histogram. (A–C) Distribution of %GC, ObsCpG/ExpCpG, and length of all categories. In these histograms, CpG islands containing both the 5′ of gene and an Alu are included in the 5′ region category, and CpG islands containing both exons and Alus are categorized as exon. (D–F) Distribution of %GC, ObsCpG/ExpCpG, and length of CpG islands containing the 5′ region. The occurrence of Alus within sequences defined as 5′ regions is also indicated by horizontal hatching. (G–I) Distribution of %GC, ObsCpG/ExpCpG, and length of CpG islands containing exons. In these three histograms, CpG islands containing both an exon and Alu are represented as Alus. The occurrence of Alus within sequences defined as exon is also indicated by horizontal hatching. (J–L) Distribution of %GC, ObsCpG/ExpCpG, and length of CpG islands containing Alu. (M–O) Distribution of %GC, ObsCpG/ExpCpG, and length of CpG islands containing unknown sequences. Both the exponential and Gaussian curves are shown in D, F, G, and I. In E and H, both the exponential curve and the minus second-order curves are shown.
Figure 3
The modified criteria also helped remove Alu sequences previously identified as part of 5′ region CpG islands. In this example, a 1,233-bp fragment originally extracted by the algorithm included two Alu sequences with some CpG suppression associated with the nonhistone chromosome protein 2 like 1 (NHP2L1). The modified stringent criteria reduced the size of the island to 620 bp and excluded the Alu sequences.
Figure 4
(A–F) %GC vs. ObsCpG/ExpCpG plot of a randomly selected set of 5,000 500-bp-long sequences. Mean value and SD are presented on the plot, and new criteria (%GC ≥ 55%, ObsCpG/ExpCpG ≥ 0.65) are shown as dashed lines. (G) Nearest-neighbor sequence analysis of human chromosomes 21 and 22 and other model organisms.

References

    1. Gardiner-Garden M, Frommer M. J Mol Biol. 1987;196:261–282.
    2. Larsen F, Gundersen G, Lopez R, Prydz H. Genomics. 1992;13:1095–1107.
    3. Coulondre C, Miller J H, Farabaugh P J, Gilbert W. Nature (London). 1978;274:775–780.
    4. Bird A. Genes Dev. 2002;16:6–21.
    5. Feil R, Khosla S. Trends Genet. 1999;15:431–435.