2016 Oct 25:10:209-224.
doi: 10.4137/BBI.S40628. eCollection 2016.

Iterative Fragmentation Improves the Detection of ChIP-seq Peaks for Inactive Histone Marks

Miklós Laczik et al. Bioinform Biol Insights.

Abstract

As chromatin immunoprecipitation (ChIP) sequencing becomes the dominant technique for studying chromatin modifications, new protocols surface to improve the method. Bioinformatics is also essential to analyze and understand the results, and precise analysis helps us to identify the effects of protocol optimizations. We applied iterative sonication (sending the fragmented DNA after ChIP through one or more additional rounds of shearing) to a number of samples and tested the effects on different histone marks, aiming specifically to uncover potential benefits for inactive histone marks. We developed an analysis pipeline that uses our enrichment-type-specific approach to peak calling. With the help of this pipeline, we describe the advantages and disadvantages of the iterative refragmentation technique and identify possible fields of application where it greatly enhances the results. In addition to the resonication protocol description, we provide guidelines for peak calling optimization and a freely implementable pipeline for data analysis.

Keywords: ChIP; ChIP-seq; bioinformatics; chromatin; heterochromatin; histone marks; peak calling; sonication.

Conflict of interest statement

Authors disclose no potential conflicts of interest.

Figures

Figure 1
Overview and effect of the reshearing method. (A) The relevant steps of ChIP-seq sample preparation using the traditional method. After fragmentation of the chromatin, we have a mixture of fragments of different sizes. The ones that carry our protein of interest can be bound by immunoprecipitation. After the decrosslinking and purification step, we get the DNA fragments; for easier visual interpretation, those above the optimal size range are shown in red and those below it in green. During the size selection before library preparation and sequencing, these fragments are discarded; thus, a significant amount of the sample is lost. (B) The reshearing method preserves the fragments that are outside the ideal size range. Additional rounds of sonication on the eluted DNA break the long fragments into shorter ones (see the fragments in red), which enables them to proceed to library preparation and sequencing. Sample loss is reduced significantly. (C–E) The effect of reshearing on an actual H3K27me3 sample. The fragment size distribution was measured with a 2100 Bioanalyzer, and the images were generated by the accompanying Agilent software; the control markers are at 35 bp and 10380 bp. (C) The original fragment distribution, before the reshearing step. (D) The size distribution after two rounds of five cycles of reshearing. A reduction of the peak in the large size range and a slight shift toward smaller sizes are already visible. (E) The distribution after the third round of reshearing. Here the shift is complete: the large fragments have disappeared and the short-to-middle part of the size range is enlarged, showing that we have reached the desired size distribution.
Figure 2
Comparison of the performance of various peak callers developed for broad peak detection, and the consistency of the peak type across cell types. (A) This image is taken from the IGV genome viewer, where various peak calling results are displayed on a HeLa-S3 in-house control data set. The upper coverage track in blue shows a longer stretch of enrichment from a ChIP-seq experiment targeting H3K27me3 histone marks. Below that, the boxes show the peaks determined by selected peak callers. In top-down order, the tracks show the peaks of the following software tools: Sicer (with the optimized settings we established for the studied histone mark) and Sicer with default settings in red, HiddenDomains in green, MACS and MACS2 in purple, Zinba in cyan, BroadPeak in orange, and Rseg in yellow. The image illustrates the difficulties of detecting broad enrichments: Rseg and BroadPeak detect the whole visible region (and more) as a huge, single peak, while MACS2 and Zinba fail to recognize the enriched regions. HiddenDomains and MACS segment the enrichments into several narrow peaks. Sicer is oversensitive with the default settings, though it calls the enriched regions in the correct way, and with our optimized settings, it is able to properly differentiate between the enrichments and the background. (B) This image was taken from the UCSC Genome Browser, featuring 18 different cell types (or different treatments in some cases), submitted by the Broad Institute to the ENCODE project. The samples are (as they appear on the UCSC website): GM12878, H1-hESC, K562, A549 DEX, A549 EtOH, HeLa-S3, HepG2, HUVEC, CD14+, Dnd41, HMEC, HSMM, HSMMtube, NH-A, NHDF-Ad, NHEK, NHLF, and Osteobl. The cell type does not appear to influence what type of enrichment is generated by a certain histone mark; the enrichment types are so consistent that we can reason that the same peak caller settings should be optimal for all of them.
Figure 3
Flowchart of our ChIP-seq analysis pipeline. For each step, we either used carefully selected public software or wrote our own proprietary scripts.
Figure 4
Various effects of reshearing on histone marks, showcased by coverage graphs and peak detection marks displayed in the IGV genome browser. The control samples are shown in blue (upper tracks), and the corresponding resheared samples are in red (lower tracks); the scales are identical for the corresponding pairs. (A) Reshearing effect on H3K4me1. The enrichments are visibly higher and wider in the resheared sample, but the peaks show a merging effect. (B) Reshearing effect on H3K4me3. Reshearing makes the peaks higher and wider, enabling the detection of smaller, otherwise insignificant peaks. (C–F) Reshearing effect on H3K27me3, which marks inactive regions and usually forms long stretches of relatively low enrichment. Note how all the pictures show a great increase in signal-to-noise ratio after reshearing. (C) A clear display of the separation effect. (D) The opposite of the separation effect: the peaks are dissected in the control sample but correctly recognized in the resheared sample, due to the filling-up effect. (E) Reshearing enables correct peak detection by significantly increasing the signal-to-noise ratio. In the control samples, the peaks are only partially detected or not detected at all. (F) Several characteristic inactive mark effects are visible in this image: the control sample exhibits dissection of the peaks and partial detection or nondetection; reshearing eliminates these problems through significantly better contrast to background and the filling-up effect.
Figure 5
Average peak profiles and correlations between the resheared and control samples. The average peak coverages were calculated by dividing every peak into 100 bins, then calculating the mean coverage for each bin rank. The scatterplots show the correlation between the genome-wide coverages, examined in 100 bp windows. (A–C) Average peak coverage for the control samples. The histone mark-specific differences in enrichment and characteristic peak shapes can be observed. (D–F) Average peak coverages for the resheared samples. Note that all histone marks exhibit a generally higher coverage and a more extended shoulder area. (G–I) Scatterplots show the linear correlation between the control and resheared sample coverage profiles. The distribution of markers reveals a strong linear correlation and also exposes some differential coverage (preferentially higher in resheared samples). The r value in brackets is the Pearson correlation coefficient. To improve visibility, extremely high coverage values have been removed and alpha blending was used to indicate the density of markers. This analysis provides valuable insight into correlation, covariation, and reproducibility beyond the limits of peak calling, as not every enrichment can be called as a peak and compared between samples. When we compare the ChIP-seq results of two different methods, it is essential to also check the read accumulation and depletion in undetected regions.
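The two calculations described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline code: it assumes per-base coverage is available as a NumPy array for a single chromosome, the function names (average_peak_profile, windowed_pearson) are hypothetical, and skipping peaks shorter than the number of bins is an assumption made here to avoid empty bins.

```python
import numpy as np

def average_peak_profile(coverage, peaks, n_bins=100):
    """Mean coverage per bin rank across peaks (hypothetical helper).

    coverage: 1-D array of per-base coverage for one chromosome.
    peaks: iterable of (start, end) half-open intervals.
    """
    profiles = []
    for start, end in peaks:
        if end - start < n_bins:
            # Assumption: peaks too short to fill every bin are skipped.
            continue
        # Split the peak into n_bins nearly equal pieces; average each piece.
        pieces = np.array_split(coverage[start:end], n_bins)
        profiles.append([piece.mean() for piece in pieces])
    # Average over peaks, bin rank by bin rank.
    return np.mean(profiles, axis=0)

def windowed_pearson(cov_a, cov_b, window=100):
    """Pearson r between two coverage tracks, compared in fixed windows."""
    # Trim both tracks to a common length that is a multiple of the window.
    n = (min(len(cov_a), len(cov_b)) // window) * window
    a = np.asarray(cov_a[:n], dtype=float).reshape(-1, window).mean(axis=1)
    b = np.asarray(cov_b[:n], dtype=float).reshape(-1, window).mean(axis=1)
    return np.corrcoef(a, b)[0, 1]

# Toy data standing in for control and resheared coverage tracks.
control = np.arange(1000, dtype=float)
resheared = 2.0 * control + 3.0
profile = average_peak_profile(control, [(0, 200), (400, 800)])
r = windowed_pearson(control, resheared)
```

In this toy example the second track is an exact linear function of the first, so r is 1 up to floating-point error. Real tracks would also need the filtering of extreme coverage values that the caption mentions.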
Figure 6
Schematic summary of the effects of ChIP-seq enhancement techniques. We compared the reshearing technique that we use to the ChIP-exo technique. The blue circle represents the protein, the red line represents the DNA fragment, the purple lightning refers to sonication, and the yellow symbol is the exonuclease. On the right, example coverage graphs are displayed with a likely peak detection pattern (detected peaks are shown as green boxes below the coverage graphs). In contrast with the standard protocol, the reshearing technique uses additional rounds of sonication to incorporate longer fragments that would otherwise be discarded, while ChIP-exo decreases the size of the fragments by digesting the parts of the DNA not bound to a protein with lambda exonuclease. For profiles consisting of narrow peaks, the reshearing technique increases sensitivity because more fragments are involved; thus, even smaller enrichments become detectable, but the peaks also become wider, to the point of being merged. ChIP-exo, on the other hand, decreases the enrichments, and some smaller peaks can disappear altogether, but it increases specificity and enables the accurate detection of binding sites. With broad peak profiles, however, we can observe that the standard technique often hampers proper peak detection, as the enrichments are only partial and difficult to distinguish from the background, due to the sample loss. Therefore, broad enrichments, with their typically variable height, are often detected only partially, with the enrichment dissected into several smaller parts that reflect locally higher coverage within it. Alternatively, the peak caller is unable to differentiate the enrichment from the background properly, and consequently, either several enrichments are detected as one, or the enrichment is not detected at all. Reshearing improves peak calling by filling up the valleys within an enrichment and causing better peak separation. ChIP-exo, however, promotes partial, dissecting peak detection by deepening the valleys within an enrichment. In turn, it can be used to determine the locations of nucleosomes with precision.
