BMC Bioinformatics. 2015 May 9;16:150.

doi: 10.1186/s12859-015-0579-z.

Is this the right normalization? A diagnostic tool for ChIP-seq normalization

Claudia Angelini¹, Ruth Heller², Rita Volkinshtein³, Daniel Yekutieli⁴

Affiliations

¹ Istituto per le Applicazioni del Calcolo "Mauro Picone", Via Pietro Castellino, 111, Naples, 80131, Italy. c.angelini@iac.cnr.it.
² Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv, Tel Aviv, 69978, Israel. ruheller@post.tau.ac.il.
³ Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv, Tel Aviv, 69978, Israel. spirita123@gmail.com.
⁴ Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv, Tel Aviv, 69978, Israel. yekutiel@post.tau.ac.il.

PMID: 25957089
PMCID: PMC4448883
DOI: 10.1186/s12859-015-0579-z

Claudia Angelini et al. BMC Bioinformatics. 2015.

. 2015 May 9:16:150.

doi: 10.1186/s12859-015-0579-z.

Authors

Claudia Angelini¹, Ruth Heller², Rita Volkinshtein³, Daniel Yekutieli⁴

Affiliations

¹ Istituto per le Applicazioni del Calcolo "Mauro Picone", Via Pietro Castellino, 111, Naples, 80131, Italy. c.angelini@iac.cnr.it.
² Department of Statistics and Operations Research Tel Aviv University, Ramat Aviv, Tel Aviv, 69978, Israel. ruheller@post.tau.ac.il.
³ Department of Statistics and Operations Research Tel Aviv University, Ramat Aviv, Tel Aviv, 69978, Israel. spirita123@gmail.com.
⁴ Department of Statistics and Operations Research Tel Aviv University, Ramat Aviv, Tel Aviv, 69978, Israel. yekutiel@post.tau.ac.il.

PMID: 25957089
PMCID: PMC4448883
DOI: 10.1186/s12859-015-0579-z

Abstract

Background: ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample with a control sample, such as Input DNA, normalization procedures must be applied to remove experimental sources of bias. Despite the substantial impact that the choice of normalization method can have on the results of a ChIP-seq data analysis, the assessment of these methods has not been fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed.

Results: In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization constant. By plotting the empirical densities of log relative risks in bins of equal read count, together with the logarithm of the estimated normalization constant, the researcher can assess whether that estimate is appropriate. We use the diagnostic plot to evaluate the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard peak-calling tools such as MACS and SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure uses the estimated normalization constant to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total numbers of reads in the ChIP and Input samples.
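The quantities behind the diagnostic plot can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: reads are given as start positions on a single chromosome, consecutive bins are cut so that each holds exactly K reads (ChIP plus Input), a 0.5 pseudocount is added before taking logs, and bins are grouped by quartile of bin length as in the figure captions. All function names (`equal_count_bins`, `log_relative_risk`, `lrr_by_length_quartile`, `naive_log_r`) are hypothetical.

```python
import math
from statistics import quantiles


def equal_count_bins(chip_pos, input_pos, k):
    """Split one chromosome into consecutive bins that each hold k reads.

    chip_pos, input_pos: read start positions (ints) for ChIP and Input.
    Returns (n_chip, n_input, length) per bin; a trailing partial bin is dropped.
    """
    # Tag each read with its sample (1 = ChIP, 0 = Input) and sweep in genomic order.
    reads = sorted([(p, 1) for p in chip_pos] + [(p, 0) for p in input_pos])
    bins = []
    for start in range(0, len(reads) - k + 1, k):
        chunk = reads[start:start + k]
        n_chip = sum(tag for _, tag in chunk)
        length = chunk[-1][0] - chunk[0][0] + 1  # genomic span covered by the k reads
        bins.append((n_chip, k - n_chip, length))
    return bins


def log_relative_risk(n_chip, n_input, pseudo=0.5):
    # The 0.5 pseudocount is an assumption that keeps the log finite at zero counts.
    return math.log((n_chip + pseudo) / (n_input + pseudo))


def lrr_by_length_quartile(bins):
    """Group log relative risks by quartile of bin length (1 = shortest bins)."""
    q1, q2, q3 = quantiles([length for _, _, length in bins], n=4)
    groups = {1: [], 2: [], 3: [], 4: []}
    for n_chip, n_input, length in bins:
        q = 1 if length <= q1 else 2 if length <= q2 else 3 if length <= q3 else 4
        groups[q].append(log_relative_risk(n_chip, n_input))
    return groups


def naive_log_r(chip_pos, input_pos):
    # Log of the naive constant: total ChIP reads divided by total Input reads.
    return math.log(len(chip_pos) / len(input_pos))
```

Each group's values would then be passed to a density estimator and plotted, with vertical lines at the log of each candidate constant. Because every bin holds the same number of reads, short bins correspond to high read density, so one would expect enriched regions to concentrate in the shortest bins and background to dominate the longest ones.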

Conclusions: Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated into many peak-calling algorithms to improve the accuracy of peak identification. The diagnostic plot proposed in this paper can be used to assess how adequate a ChIP/Input normalization constant is, and thus allows the user to choose the most appropriate estimate for the analysis.
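To illustrate how a scale factor r can enter a bin-level test, here is a hedged sketch of one standard construction, not necessarily what the authors or any particular peak caller does. It assumes background counts N_chip ~ Poisson(r·λ) and N_input ~ Poisson(λ), independent; conditional on N_chip + N_input = k, the ChIP count is then Binomial(k, r/(1+r)). The function names and the read totals in the usage lines are hypothetical.

```python
import math


def naive_scale_factor(total_chip, total_input):
    # Naive constant: ratio of total read counts (sequencing-depth ratio).
    return total_chip / total_input


def binomial_upper_tail(n_chip, k, r):
    """P(X >= n_chip) for X ~ Binomial(k, r / (1 + r)).

    Under the assumed Poisson background with ChIP/Input rate ratio r,
    this is the p-value of observing n_chip or more ChIP reads in a bin of k reads.
    """
    p = r / (1.0 + r)
    return sum(math.comb(k, x) * p**x * (1 - p) ** (k - x) for x in range(n_chip, k + 1))


# Hypothetical usage: the same bin tested under the naive constant and under a
# smaller (hypothetical) estimated constant.
r_naive = naive_scale_factor(12_000_000, 10_000_000)
print(binomial_upper_tail(14, 20, r_naive), binomial_upper_tail(14, 20, 0.9))
```

The upper-tail probability increases with r, so a smaller constant yields smaller p-values. One common argument for why an estimated constant can gain power is that the ChIP total also includes reads from enriched regions, which tends to push the naive ratio above the background ratio.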


Figures

**Figure 1**
Diagnostic plots for mouse data. Diagnostic plots for six datasets, representing three different modifications, from the mouse embryonic fibroblast cells in the study of [38]. Panel **(a)** refers to H3K4me3, panel **(b)** to H3K27me3, panels **(c-e)** to the three replicates of H3K36me3, and panel **(f)** to the pooled version of H3K36me3. The five densities are: the density of $\log \frac{\tilde{N}_{ch}(i)}{\tilde{N}_{in}(i)}$ in all bins (solid black curve), and the densities of the subsets of bins in the fourth quartile of bin length (two-dash pink), the third quartile (dashed blue), the second quartile (dot-dashed green) and the first quartile (dotted red). The vertical lines show the estimated $\log r$ using CisGenome (brown line), CCAT (deep pink line) and NCIS (navy line). The plot was produced with $K=200$.

**Figure 2**
Distributions of estimated log relative risks. The empirical density of the log relative risks for background read counts distributed as Poisson (solid black), or as over-dispersed Poisson with $p=0.5$ (dot-dashed pink) and with $p=0.25$ (dashed blue). The solid vertical line is $\log(r)$, with $r=0.7$. All three densities peak around $\log r$.
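The Poisson case of this figure can be reproduced in spirit with a small simulation. This is a sketch under assumptions not stated in the caption: each background bin holds K = 200 reads, ChIP and Input background counts are independent Poisson with rate ratio r = 0.7, so that conditional on the bin total the ChIP count is Binomial(K, r/(1+r)), and a 0.5 pseudocount is added before the log. The over-dispersed cases are not modeled, and the median is used as a simple location summary rather than the density peak. `simulate_background_log_rr` is a hypothetical name.

```python
import math
import random
from statistics import median


def simulate_background_log_rr(r, k=200, n_bins=2000, seed=1):
    """Log relative risks for pure-background bins that each hold k reads.

    Assumes N_chip ~ Poisson(r*lam) and N_input ~ Poisson(lam); conditional on
    N_chip + N_input = k, N_chip is Binomial(k, r / (1 + r)), which is drawn here.
    """
    rng = random.Random(seed)
    p = r / (1.0 + r)
    out = []
    for _ in range(n_bins):
        n_chip = sum(rng.random() < p for _ in range(k))  # one binomial draw
        n_input = k - n_chip
        # 0.5 pseudocount (an assumption) guards against log(0).
        out.append(math.log((n_chip + 0.5) / (n_input + 0.5)))
    return out


lrr = simulate_background_log_rr(r=0.7)
print(round(median(lrr), 3), round(math.log(0.7), 3))
```

The simulated log relative risks should center close to $\log(0.7)$, which is the property the diagnostic plot relies on when background bins are compared against a candidate constant.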

**Figure 3**
Diagnostic plots for simulated data. Diagnostic plots for four simulated datasets, generated from the control sample of the ChIP-seq study by [37]. Panels **(a)** and **(b)** are the results from the Read-add simulation, with down-sampling by 2 and 50, respectively. Panels **(c)** and **(d)** are the results from the By-Genes simulation, with down-sampling by 2 and 20, respectively. The five densities are: the density of $\log \frac{\tilde{N}_{ch}(i)}{\tilde{N}_{in}(i)}$ in all bins (solid black curve), and the densities of the subsets of bins in the fourth quartile of bin length (two-dash pink), the third quartile (dashed blue), the second quartile (dot-dashed green) and the first quartile (dotted red). The vertical lines show the estimated $\log r$ using CisGenome (brown line), CCAT (deep pink line) and NCIS (navy line), as well as the true normalization factor (gray line). The plot was produced with $K=500$.

**Figure 4**
Diagnostic plots for modENCODE data. Diagnostic plots for the three datasets from modENCODE. The datasets refer to the H3K27me3 modification in *D. melanogaster*. Panel **(a)** refers to ChIP ID 1820 and Input ID 1815, panel **(b)** to ChIP ID 1957 and Input ID 1961, and panel **(c)** to the pooled version of the modENCODE samples. The five densities are: the density of $\log \frac{\tilde{N}_{ch}(i)}{\tilde{N}_{in}(i)}$ in all bins (solid black curve), and the densities of the subsets of bins in the fourth quartile of bin length (two-dash pink), the third quartile (dashed blue), the second quartile (dot-dashed green) and the first quartile (dotted red). The vertical lines show the estimated $\log r$ using CisGenome (brown line), CCAT (deep pink line) and NCIS (navy line). The plot was produced with $K=200$.


Cited by

metagene Profiles Analyses Reveal Regulatory Element's Factor-Specific Recruitment Patterns.
Joly Beauparlant C, Lamaze FC, Deschênes A, Samb R, Lemaçon A, Belleau P, Bilodeau S, Droit A. PLoS Comput Biol. 2016 Aug 18;12(8):e1004751. doi: 10.1371/journal.pcbi.1004751. PMID: 27538250.
The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles.
Schreiber JM, Boix CA, Wook Lee J, Li H, Guan Y, Chang CC, Chang JC, Hawkins-Hooker A, Schölkopf B, Schweikert G, Carulla MR, Canakoglu A, Guzzo F, Nanni L, Masseroli M, Carman MJ, Pinoli P, Hong C, Yip KY, Spence JP, Batra SS, Song YS, Mahony S, Zhang Z, Tan W, Shen Y, Sun Y, Shi M, Adrian J, Sandstrom RS, Farrell NP, Halow JM, Lee K, Jiang L, Yang X, Epstein CB, Strattan JS, Bernstein BE, Snyder MP, Kellis M, Noble WS, Kundaje AB; ENCODE Imputation Challenge Participants. Genome Biol. 2023 Apr 18;24(1):79. doi: 10.1186/s13059-023-02915-y. PMID: 37072822.
NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data.
Vainshtein Y, Rippe K, Teif VB. BMC Genomics. 2017 Feb 14;18(1):158. doi: 10.1186/s12864-017-3580-2. PMID: 28196481.
Quantitative analysis of ChIP-seq data uncovers dynamic and sustained H3K4me3 and H3K27me3 modulation in cancer cells under hypoxia.
Adriaens ME, Prickaerts P, Chan-Seng-Yue M, van den Beucken T, Dahlmans VEH, Eijssen LM, Beck T, Wouters BG, Voncken JW, Evelo CTA. Epigenetics Chromatin. 2016 Nov 1;9:48. doi: 10.1186/s13072-016-0090-4. PMID: 27822313.
T3E: a tool for characterising the epigenetic profile of transposable elements using ChIP-seq data.
Almeida da Paz M, Taher L. Mob DNA. 2022 Nov 30;13(1):29. doi: 10.1186/s13100-022-00285-z. PMID: 36451223.


References

1. Espada J, Esteller M. Epigenetic control of nuclear architecture. Cell Mol Life Sci. 2007;64:449–57. doi: 10.1007/s00018-007-6358-x.
2. Portela A, Esteller M. Epigenetic modifications and human disease. Nat Biotech. 2010;28:1057–68. doi: 10.1038/nbt.1685.
3. Martens J, Stunnenberg H, Logie C. The decade of the epigenomes? Genes Cancer. 2011;6:680–7. doi: 10.1177/1947601911417860.
4. Barski A, Cuddapah S, Cui K, Roh T, Schones D, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–37. doi: 10.1016/j.cell.2007.05.009.
5. Johnson D, Mortazavi A, Myers R, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316:1497–502. doi: 10.1126/science.1141319.
