Supervised normalization of microarrays

Brigham H Mecham et al.
Bioinformatics. 2010 May 15;26(10):1308-15.
doi: 10.1093/bioinformatics/btq118. Epub 2010 Mar 31.

Abstract

Motivation: A major challenge in utilizing microarray technologies to measure nucleic acid abundances is 'normalization', the goal of which is to separate biologically meaningful signal from other confounding sources of signal, often due to unavoidable technical factors. It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date.

Results: We show here that failing to include all study-specific biological and technical variables when performing normalization leads to biased downstream analyses. We propose a general normalization framework that fits a study-specific model employing every known variable that is relevant to the expression study. The proposed method is applicable to the full range of existing probe designs, as well as to both single-channel and dual-channel arrays. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most widely used normalization methods.
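The claim that omitting known variables biases downstream analyses can be illustrated with a much simpler model than the one the paper fits. The sketch below is an assumption-laden toy, not the snm algorithm: snm also models probe-specific and intensity-dependent effects (Fig. 1B) and uses its own fitting procedure. Here a single linear model per probe contains an intercept, the biological variable and a technical variable (batch), and only the fitted batch component is subtracted. The function `supervised_adjust`, the helper `group_gap` and the simulated design are all hypothetical.

```python
import numpy as np


def supervised_adjust(Y, bio, adj):
    """Remove fitted adjustment-variable effects from probe-level data.

    Y   : (n_probes, n_arrays) log-scale intensities
    bio : (n_arrays, p) design columns for biological variables (p may be 0)
    adj : (n_arrays, q) design columns for known technical variables (q >= 1)

    Simplified, hypothetical sketch: one linear model per probe containing an
    intercept, the biological variables and the adjustment variables; only the
    fitted adjustment component is subtracted.
    """
    intercept = np.ones((Y.shape[1], 1))
    X = np.hstack([intercept, bio, adj])          # study-specific design
    # Least-squares coefficients for every probe at once: shape (k, n_probes)
    B, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    adj_coef = B[-adj.shape[1]:, :]               # rows for the adjustment terms
    return Y - (adj @ adj_coef).T                 # subtract only the technical part


# Simulated study: processing batch is partly confounded with condition.
rng = np.random.default_rng(0)
n_probes, n_de = 1000, 300
group = np.array([0, 0, 0, 1, 1, 1], dtype=float)   # biological condition
batch = np.array([0, 0, 1, 1, 1, 0], dtype=float)   # technical factor
Y = rng.normal(8.0, 1.0, size=(n_probes, 1)) + rng.normal(0.0, 0.2, size=(n_probes, 6))
Y[:n_de] += 2.0 * group    # true effect of size 2 in the first 300 probes
Y += 1.5 * batch           # batch shift on every probe


def group_gap(Z):
    """Mean condition-1 minus condition-0 difference over the affected probes."""
    return Z[:n_de, group == 1].mean() - Z[:n_de, group == 0].mean()


with_bio = supervised_adjust(Y, group[:, None], batch[:, None])
batch_only = supervised_adjust(Y, np.empty((group.size, 0)), batch[:, None])
print(f"biology + batch in model: {group_gap(with_bio):.2f}")
print(f"batch only in model:      {group_gap(batch_only):.2f}")
```

With this design the first line should print a value close to the true effect of 2.0, while the batch-only fit falls noticeably short (about 1.78 in expectation), because part of the condition effect is absorbed into the batch coefficient when the biological variable is left out of the model.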

Availability: An R package called snm implementing the methodology will be made available from Bioconductor (http://bioconductor.org).

Contact: jstorey@princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
A demonstration of the main ideas behind supervised normalization of microarrays. (A) A hypothetical example demonstrating the differences between supervised and unsupervised normalization strategies. The three boxes across the top display different types of potential effects, each of which may influence the unnormalized observed intensities, shown as densities in the middle panel. The blue and red lines denote the two biological conditions, while the dashed and dotted lines denote the two processing dates. The differences among the four densities arise from either the biology or the study design. After normalization with a supervised approach that takes all three effects into account, the differences between the blue and red lines remain, while the differences between the dashed and dotted lines have been removed. By contrast, unsupervised approaches such as quantile normalization transform the data so that all arrays have the same distribution, a result that clearly violates the biological relationship of interest. (B) An example of the model fitted to the probe-level data from a microarray study. The model has probe-specific terms and intensity-dependent terms, and may include other terms such as probe composition effects or surface-level spatial effects.
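To make concrete why forcing every array onto the same distribution can erase real biology, the following sketch implements the standard quantile normalization procedure (sort each array, average the sorted values across arrays, write the averages back by rank). It is a minimal illustration: ties are broken by sort order, and the simulated data, `quantile_normalize` and `de_gap` are hypothetical, not taken from the paper or from any Bioconductor package.

```python
import numpy as np


def quantile_normalize(Y):
    """Quantile-normalize the columns (arrays) of Y (probes x arrays).

    Each value is replaced by the cross-array mean of the values holding the
    same rank, so every column ends up with an identical set of values.
    Ties are broken by sort order, a simplification.
    """
    order = np.argsort(Y, axis=0)                          # rank order per array
    reference = np.take_along_axis(Y, order, axis=0).mean(axis=1)
    out = np.empty(Y.shape, dtype=float)
    # write the reference quantiles back into each array's rank positions
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


# Two conditions, two arrays each; condition B truly raises half the probes.
rng = np.random.default_rng(1)
base = rng.normal(8.0, 1.0, size=2000)
Y = base[:, None] + rng.normal(0.0, 0.1, size=(2000, 4))
Y[:1000, 2:] += 1.0

Yq = quantile_normalize(Y)


def de_gap(Z):
    """Mean B-minus-A difference over the 1000 truly changed probes."""
    return Z[:1000, 2:].mean() - Z[:1000, :2].mean()


print("array means before:", Y.mean(axis=0).round(2))
print("array means after: ", Yq.mean(axis=0).round(2))
print(f"true-effect gap before {de_gap(Y):.2f}, after {de_gap(Yq):.2f}")
```

After normalization all four array means coincide and the gap for the truly changed probes shrinks well below its true value of 1.0, which is the behavior panel (A) attributes to unsupervised approaches when the biological conditions genuinely shift the intensity distribution.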
Fig. 2.
Results from simulated data with differential expression and array effects. The true proportion of null probes is π0 = 0.70. (A) P-value histogram of null probes after SNM. (B) P-value histogram of all probes after SNM. (C) P-value histogram of null probes after quantile normalization (QN). (D) P-value histogram of all probes after QN. (E) P-value histogram of null probes after ISN. (F) P-value histogram of all probes after ISN.
Fig. 3.
Summary of null P-values from simulated data with differential expression, batch and array effects. The true proportion of null probes is π0 = 0.70. (A) P-value histogram of null probes after SNM. (B) P-value histogram of null probes after QN. (C) P-value histogram of null probes after QN, using a model that includes a term for the batch effects.
Fig. 4.
Results from the Vascular Development Study obtained with QN and SNM. The relationships between samples after normalization are presented as a clustering dendrogram. The label at each node denotes the age of the sample hybridized to that array, and the colored boxes indicate the batch. Note that the SNM results correctly position biological replicate samples on adjacent nodes (A) and predict a robust effect of age on gene expression [π̂0 = 0.51 (C)]. Conversely, the first bifurcation in the QN data separates the samples by batch (B), and these data suggest there is no effect of age on gene expression [π̂0 = 1 (D)].
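Figs. 2 and 3 are stated in terms of the proportion of null probes π0, and Fig. 4 summarizes the age effect by an estimated proportion of null probes, where a value of 1 means no detectable effect. The captions do not say which estimator was used, so the sketch below assumes the common λ-threshold estimator: null p-values are uniform, so the fraction above λ, divided by (1 − λ), estimates π0. The fixed λ = 0.5, the function `estimate_pi0` and the simulated p-values are illustrative assumptions.

```python
import numpy as np


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of null tests from a vector of p-values.

    Null p-values are Uniform(0, 1), so the count above `lam` is roughly
    pi0 * m * (1 - lam); dividing by m * (1 - lam) gives the estimate,
    capped at 1. The fixed lam = 0.5 is an illustrative choice.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)


rng = np.random.default_rng(2)
m = 10_000
null_p = rng.uniform(0.0, 1.0, size=7000)             # 70% null probes
alt_p = rng.uniform(0.0, 0.01, size=m - null_p.size)  # 30% strong signals
print(f"{estimate_pi0(np.concatenate([null_p, alt_p])):.2f}")  # near 0.70
print(f"{estimate_pi0(rng.uniform(size=m)):.2f}")              # near 1
```

Because the simulated alternative p-values all fall below λ, only null probes contribute to the count, which is why the first estimate lands close to the true 0.70; in real data some signal above λ biases the estimate slightly upward, making it conservative.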

