Bioinformatics. 2015 Jun 15;31(12):1889-96.
doi: 10.1093/bioinformatics/btv094. Epub 2015 Feb 13.

A novel statistical method for quantitative comparison of multiple ChIP-seq datasets

Li Chen et al. Bioinformatics.

Abstract

Motivation: ChIP-seq is a powerful technology for measuring protein binding or histone modification strength at the whole-genome scale. Although a number of methods are available for analyzing a single ChIP-seq dataset (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets that account for data from control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs are underdeveloped.

Results: We develop a statistical method for quantitative comparison of multiple ChIP-seq datasets that detects genomic regions showing differential protein binding or histone modification. We first detect peaks in every dataset and take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution, with the underlying Poisson rates modeled as an experiment-specific function of artifacts and biological signals. We then estimate the biological signals and compare them by hypothesis testing in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results than existing methods.
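The first steps described above (forming a union of peaks, then treating IP counts as Poisson with a rate that depends on background and signal) can be illustrated with a minimal Python sketch. This is not the ChIPComp implementation: the function names are hypothetical, the peaks are assumed to lie on a single chromosome, and plain background subtraction stands in for the experiment-specific function of artifacts, whose exact form the abstract does not give.

```python
import math


def union_peaks(peak_sets):
    """Merge (start, end) peaks from several datasets into non-overlapping regions.

    Assumes all peaks are on one chromosome; touching or overlapping
    intervals are combined into a single candidate region.
    """
    intervals = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def estimated_log_signal(ip_count, background, floor=1.0):
    """Toy signal estimate: log of the background-subtracted IP count.

    Assumption for illustration only; the floor keeps the log defined
    when the background exceeds the IP count.
    """
    return math.log(max(ip_count - background, floor))


def poisson_log_pmf(count, rate):
    """Log-probability of observing `count` reads under a Poisson rate."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


# Two hypothetical datasets with partly overlapping peaks.
candidates = union_peaks([[(100, 200), (500, 600)], [(150, 250), (700, 800)]])
```

In a fuller analysis, the per-region log signals from each experiment would be the response in a linear model whose design matrix encodes the experimental factors, and differential regions would be found by testing the relevant coefficients.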

Availability and implementation: An R software package, ChIPComp, is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html.

Figures

Fig. 1. Scatterplots of IP counts versus estimated background signals from the peak regions, on a logarithmic scale. The red dashed line is the result of cubic smoothing spline fitting. (Color version of this figure is available at Bioinformatics online.)
Fig. 2. Comparison of differential peak detection accuracies from simulations. The proportion of true discoveries among top-ranked candidate regions is plotted against the number of top-ranked regions. In (a) and (b), data are generated from the proposed model. In (c) and (d), data are generated from the additive model. (a) and (c) are based on H3K27ac; (b) and (d) are based on PolII. (Color version of this figure is available at Bioinformatics online.)
Fig. 3. Histograms of P-values reported by different methods under the null model in which there are no differential regions. The data are generated from the proposed model.
Fig. 4. Comparison of FDR estimates from different methods, based on simulation. The x-axis shows the FDR reported by each method, and the y-axis shows the observed FDR. (Color version of this figure is available at Bioinformatics online.)
Fig. 5. Comparison of differential peak detection accuracies on real datasets. All results are for two-condition comparisons of different histone modifications or protein binding, as marked in the figure titles. (Color version of this figure is available at Bioinformatics online.)
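
The accuracy measure plotted in Fig. 2, the proportion of true discoveries among the top-ranked candidate regions, can be computed along the following lines. This is a minimal sketch with a hypothetical function name and made-up inputs; it assumes each region has a score where larger means stronger evidence of differential binding, and that the true status of each region is known, as it is in a simulation.

```python
def true_discovery_curve(scores, is_true):
    """Return, for k = 1..n, the fraction of truly differential regions
    among the k regions with the highest scores."""
    # Rank regions from strongest to weakest evidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = []
    hits = 0
    for k, idx in enumerate(order, start=1):
        if is_true[idx]:
            hits += 1
        curve.append(hits / k)
    return curve


# Four hypothetical regions; the first and last are truly differential.
curve = true_discovery_curve([0.9, 0.1, 0.8, 0.4], [True, False, False, True])
```

One minus each value on this curve is the observed false discovery proportion among the top k regions, which is the kind of quantity compared against the reported FDR in Fig. 4.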
