Bioinformatics. 2009 Aug 1;25(15):1952-8.
doi: 10.1093/bioinformatics/btp340. Epub 2009 Jun 8.

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data

Chongzhi Zang et al. Bioinformatics.

Abstract

Motivation: Chromatin states are the key to gene regulation and cell identity. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) is increasingly being used to map epigenetic states across genomes of diverse species. Chromatin modification profiles are frequently noisy and diffuse, spanning regions ranging from several nucleosomes to large domains of multiple genes. Much of the early work on the identification of ChIP-enriched regions for ChIP-Seq data has focused on identifying localized regions, such as transcription factor binding sites. Bioinformatic tools to identify diffuse domains of ChIP-enriched regions have been lacking.

Results: Based on the biological observation that histone modifications tend to cluster to form domains, we present a method that identifies spatial clusters of signals unlikely to appear by chance. This method pools together enrichment information from neighboring nucleosomes to increase sensitivity and specificity. By using genomic-scale analysis, as well as the examination of loci with validated epigenetic states, we demonstrate that this method outperforms existing methods in the identification of ChIP-enriched signals for histone modification profiles. We demonstrate the application of this unbiased method in important issues in ChIP-Seq data analysis, such as data normalization for quantitative comparison of levels of epigenetic modifications across cell types and growth conditions.

Availability: http://home.gwu.edu/~wpeng/Software.htm.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
(a) Schematic illustration of definition of islands. Shown is a segment of a genomic landscape of ChIP-Seq reads. The x-axis denotes the genome coordinates, where each interval represents a window. The y-axis denotes the read count. Each black vertical bar represents the read count in the respective window. The regions underlined by the green horizontal bars are the two identified islands under g=1 and l0=2. The two windows underlined by brown boxes are gaps in the first island. (b) Schematic illustration of the recursion relation in Equation (6).
Fig. 2.
Aggregate score of all significant islands versus gap size for H2A.Z (black) and H4K20me1 (red). The gap size is measured in units of windows. Here, l0=2 and E-value is 0.1.
Fig. 3.
Comparison of SICER with other methods. (a) Schematic illustration of the scaling FDR determination. The dark (light) gray circle represents the ChIP-enriched regions identified in the full (half-size) library. The non-overlapping area of the light gray circle represents the false positives. (b) FDR versus the island read count coverage. (c) ROC analysis using the epigenetic states at genes encoding signature cytokines in mouse CD4+ cell Th1, Th2 and Th17 lineages. (d) ROC analysis of H3K4me3 and H3K27me3 in mouse ES cells.
Fig. 4.
Composite histone modification profiles across genic regions in human CD133+ (red) and CD36+ (green) cells. The figures in the left (right) panel are made with all reads in the library (only reads on islands, which are identified using ‘input’ library as control).
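
The island definition illustrated in Fig. 1(a) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SICER implementation. It assumes that l0 is the minimum read count for a window to count as eligible and that g is the largest number of consecutive ineligible windows allowed inside an island; the caption names both parameters but does not define them. The function name `find_islands` and the read counts are hypothetical, and the sketch leaves out island scoring and significance testing.

```python
from typing import List, Tuple


def find_islands(counts: List[int], l0: int, g: int) -> List[Tuple[int, int]]:
    """Return (start, end) window indices, inclusive, of each island.

    Assumption: a window is eligible when its read count is >= l0, and
    eligible windows belong to the same island when at most g ineligible
    windows (gaps) lie between them.
    """
    islands: List[Tuple[int, int]] = []
    start = None  # first eligible window of the island being built
    last = None   # most recent eligible window of that island
    for i, count in enumerate(counts):
        if count < l0:
            continue  # ineligible window: a potential gap
        if start is None:
            start = last = i
        elif i - last - 1 <= g:
            last = i  # gap is short enough, extend the island
        else:
            islands.append((start, last))
            start = last = i
    if start is not None:
        islands.append((start, last))
    return islands


# Hypothetical per-window read counts, chosen to resemble Fig. 1(a):
# two islands, with two single-window gaps inside the first one.
counts = [0, 3, 1, 4, 0, 2, 0, 0, 5, 2, 0]
print(find_islands(counts, l0=2, g=1))  # [(1, 5), (8, 9)]
```

With g=1, windows 2 and 4 are bridged as gaps inside the first island, while the two consecutive empty windows 6 and 7 exceed the gap size and separate the two islands.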

