Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Jan 25:13:42.
doi: 10.1186/1471-2164-13-42.

An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies

Affiliations

An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies

Michiel E Adriaens et al. BMC Genomics. .

Abstract

Background: The combination of chromatin immunoprecipitation with two-channel microarray technology enables genome-wide mapping of binding sites of DNA-interacting proteins (ChIP-on-chip) or sites with methylated CpG di-nucleotides (DNA methylation microarray). These powerful tools are the gateway to understanding gene transcription regulation. Since the goals of such studies, the sample preparation procedures, the microarray content and study design are all different from transcriptomics microarrays, the data pre-processing strategies traditionally applied to transcriptomics microarrays may not be appropriate. Particularly, the main challenge of the normalization of "regulation microarrays" is (i) to make the data of individual microarrays quantitatively comparable and (ii) to keep the signals of the enriched probes, representing DNA sequences from the precipitate, as distinguishable as possible from the signals of the un-enriched probes, representing DNA sequences largely absent from the precipitate.

Results: We compare several widely used normalization approaches (VSN, LOWESS, quantile, T-quantile, Tukey's biweight scaling, Peng's method) applied to a selection of regulation microarray datasets, ranging from DNA methylation to transcription factor binding and histone modification studies. Through comparison of the data distributions of control probes and gene promoter probes before and after normalization, and assessment of the power to identify known enriched genomic regions after normalization, we demonstrate that there are clear differences in performance between normalization procedures.

Conclusion: T-quantile normalization applied separately on the channels and Tukey's biweight scaling outperform other methods in terms of the conservation of enriched and un-enriched signal separation, as well as in identification of genomic regions known to be enriched. T-quantile normalization is preferable as it additionally improves comparability between microarrays. In contrast, popular normalization approaches like quantile, LOWESS, Peng's method and VSN normalization alter the data distributions of regulation microarrays to such an extent that using these approaches will impact the reliability of the downstream analysis substantially.

PubMed Disclaimer

Figures

Figure 1
Figure 1
The birth of an enrichment signal around a binding site (ChIP-on-chip). Since DNA fragmentation through sonication can be modeled as a Poisson process [1], the DNA fragment length distribution follows a Poisson distribution and adjacent probes on the genome have a correlated log-ratio, resulting in the hybridization pattern shown here. Each blue column represents a probe hybridization site. Black-outlined bars represent their log-ratio. Green lines are sonicated immuno-precipitated DNA fragments corresponding to the binding site.
Figure 2
Figure 2
An example of a two-component distribution fitted on ChIP-on-chip data of dataset #1 (see Methods section for dataset description and numbering).
Figure 3
Figure 3
Density distributions of the control probes and gene promoter probes of the raw log-ratio data of all individual microarrays and corresponding ROC curves for dataset #1 (a), dataset #2 (b), dataset #3 (c), dataset #4 (d) and dataset #5 (e). AUC values of each ROC curve are reported in the legend.
Figure 4
Figure 4
ROC curves of the control probe and gene promoter distributions of the combined log-ratio data, for each normalization approach of dataset #1 (a), dataset #2 (b), dataset #3 (c), dataset #4 (d) and dataset #5 (e). AUC values are reported in the legend. TBW = Tukey's biweight scaling, Q = quantile normalization, TQ = T-quantile normalization.
Figure 5
Figure 5
Density distributions of the control probes and gene promoter probes of the normalized combined log-ratio data of dataset #2 (ChIP-on-chip). Results are shows for (from left to right, top to bottom) VSN, LOWESS, quantile (Q), T-quantile (TQ), Tukey's biweight scaling (TBW), Peng's method.
Figure 6
Figure 6
Density distributions of the control probes and gene promoter probes of the normalized log-ratio data of each individual microarray and corresponding ROC curves of dataset #2 (ChIP-on-chip). Top: Results for T-quantile (TQ) normalized data. Bottom: Results for Tukey's biweight scaling (TBW) normalized data. AUC values of each ROC curve are reported in the legend.
Figure 7
Figure 7
Density distributions for the control probes and gene promoter probes of the normalized log-ratio data of each individual microarray and corresponding ROC curves of dataset #4 and #5. Top left: Results for T-quantile (TQ) normalized data of dataset #4. Top right: Results for Tukey's biweight scaling (TBW) normalized data of dataset #4. Bottom left: Results for T-quantile (TQ) normalized data of dataset #5. Bottom right: Results for Tukey's biweight scaling (TBW) normalized data of dataset #5. AUC values of each ROC curve are reported in the legend.
Figure 8
Figure 8
Genome plots of negative 10log-transformed enrichment p-values, for the HOXA cluster on human chromosome 7 (top) and the Dlk1-Gtl2 cluster on mouse chromosome 12 (bottom). Red vertical lines are given at values corresponding to p-values of 0.05 (top line) and 0.20 (bottom line). Regions with values above the top line are highly enriched, while values between the lines are a sign of moderate enrichment. The total number of identified enriched regions are reported in the legend. TBW = Tukey's biweight scaling, Q = quantile normalization, TQ = T-quantile normalization.

Similar articles

Cited by

References

    1. Zheng M, Barrera LO, Ren B, Wu YN. ChIP-chip: Data, Model, and Analysis. Biometrics. 2007;63(3):787–796. doi: 10.1111/j.1541-0420.2007.00768.x. - DOI - PubMed
    1. Mohn F, Weber M, Schübeler D, Roloff T-C. Methylated DNA Immunoprecipitation (MeDIP) Methods Mol Biol. 2009;507:55–64. doi: 10.1007/978-1-59745-522-0_5. - DOI - PubMed
    1. Ordway JM, Bedell JA, Citek RW, Nunberg A, Garrido A, Kendall R, Stevens JR, Cao D, Doerge RW, Korshunova Y. et al.Comprehensive DNA methylation profiling in a human cancer genome identifies novel epigenetic targets. Carcinogenesis. 2006;27(12):2409–2423. doi: 10.1093/carcin/bgl161. - DOI - PubMed
    1. Ballestar E, Paz MF, Valle L, Wei S, Fraga MF, Espada J, Cigudosa JC, Huang TH-M, Esteller M. Methyl-CpG binding proteins identify novel sites of epigenetic inactivation in human cancer. The EMBO journal. 2003;22(23):6335–6345. doi: 10.1093/emboj/cdg604. - DOI - PMC - PubMed
    1. Esteller M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet. 2007;8(4):286–298. doi: 10.1038/nrg2005. - DOI - PubMed

Publication types

LinkOut - more resources