Comparative Study
BMC Bioinformatics. 2006 Dec 15;7:533.
doi: 10.1186/1471-2105-7-533.

Effect of various normalization methods on Applied Biosystems expression array system data

Catalin C Barbacioru et al. BMC Bioinformatics. 2006.

Abstract

Background: DNA microarray technology provides a powerful tool for characterizing gene expression on a genome scale. While the technology has been widely used in discovery-based medical and basic biological research, its direct application in clinical practice and regulatory decision-making has been questioned. A few key issues, including the reproducibility, reliability, compatibility and standardization of microarray analysis and results, must be critically addressed before microarrays can be used routinely in clinical laboratories and regulated areas. In this study we investigate some of these issues for the Applied Biosystems Human Genome Survey Microarrays.

Results: We analyzed the gene expression profiles of two samples, brain and universal human reference (UHR, a mixture of RNAs from 10 cancer cell lines), using the Applied Biosystems (AB) Human Genome Survey Microarrays. Five technical replicates were run at each of three sites on the same total RNA samples, according to the manufacturer's standard protocols. Five methods (quantile, median, scale, VSN and cyclic loess) were used to normalize the AB microarray data within each site. A set of 1,000 genes spanning a wide dynamic range of expression levels was selected for real-time PCR validation. Using the TaqMan assay data set as the reference, we evaluated the performance of the five normalization methods on the following criteria: (1) sensitivity and reproducibility in detection of expression; (2) fold change correlation with real-time PCR data; (3) sensitivity and specificity in detection of differential expression; (4) reproducibility of differentially expressed gene lists.
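To make two of the five methods concrete, here is a minimal sketch of quantile and median normalization on log2 signals. It is an illustration under simplifying assumptions, not the authors' implementation: the function names and the toy values are hypothetical, ties are broken by position rather than averaged, and scale, VSN and cyclic loess are not shown.

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Force every array (list of log2 signals) onto the same distribution.

    arrays: list of equal-length lists, one list per microarray.
    Ties are broken by position; production tools usually average tied ranks.
    """
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[i] for s in sorted_arrays) for i in range(n_probes)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest signal.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def median_normalize(arrays):
    """Shift each array of log2 signals so that all arrays share one median."""
    medians = [median(a) for a in arrays]
    target = mean(medians)
    return [[x - m + target for x in a] for a, m in zip(arrays, medians)]


# Hypothetical log2 signals: 3 replicate arrays, 4 probes each.
replicates = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
qn = quantile_normalize(replicates)
mn = median_normalize(replicates)
```

After quantile normalization every array contains exactly the same set of values, only assigned to different probes; median normalization aligns only the centers and leaves each array's spread untouched.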

Conclusion: Our results showed a high level of concordance between these normalization methods, regardless of whether signal, detection, variation, fold change or reproducibility was interrogated. Furthermore, we used TaqMan assays as a reference to generate true positive rate (TPR) and false discovery rate (FDR) plots for the various normalization methods across the assay range. The choice of normalization method had little impact on the true positive (TP) and false positive (FP) rates in detection of differentially expressed genes. It also had little effect on the statistical approaches analyzed, which indicates a certain robustness of the analysis methods currently in use in the field, particularly when used in conjunction with the Applied Biosystems Gene Expression System.

Figures

Figure 1
Gene selection: 1,000 gene targets were selected for TaqMan® assay validation to span a wide dynamic range of expression levels and fold changes. Scatter plots between two technical replicates of the UHR (A) and brain (B) samples are shown for the 29,098 genes represented on AB microarrays. The 1,000 gene targets, shown in red, cover a wide dynamic range of expression levels and fold changes.
Figure 2
Detection concordance: 803 genes for sample A and 744 genes for sample B are detected as present (CT < 35) in at least two of the TaqMan® assay replicates. For each of these gene sets, a sliding window of 100 consecutive genes was constructed and moved one gene at a time to cover the whole range of CT values. Within each window, the percentage of genes detected as present by the AB microarray platform in at least half of the replicates of the individual sample was computed and plotted as a function of the mean CT value of the 100 genes in that window.
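The sliding-window calculation in this caption can be sketched as follows. This is an illustrative sketch only: the function name and the toy data are hypothetical, and each gene is assumed to arrive with a precomputed flag saying whether the microarray called it present in at least half of the replicates.

```python
from statistics import mean


def sliding_detection_rate(genes, window=100):
    """genes: list of (mean_ct, detected_on_array) pairs, one per gene.

    Returns one (mean CT of window, percent detected) point per window,
    moving the window one gene at a time across the CT-sorted genes.
    """
    ranked = sorted(genes, key=lambda g: g[0])
    points = []
    for start in range(len(ranked) - window + 1):
        chunk = ranked[start:start + window]
        window_ct = mean(ct for ct, _ in chunk)
        pct_detected = 100.0 * sum(1 for _, hit in chunk if hit) / window
        points.append((window_ct, pct_detected))
    return points


# Toy data (hypothetical): 6 genes and a window of 3 instead of 100.
toy = [(20.0, True), (22.0, True), (25.0, True),
       (30.0, False), (33.0, True), (34.0, False)]
curve = sliding_detection_rate(toy, window=3)
```

With n genes and a window of w, the curve has n - w + 1 points, so neighbouring points share all but one gene and the resulting curve is smooth by construction.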
Figure 3
Signal concordance: genes detected as present by TaqMan® assays are used to show the relationship between expression levels measured by AB microarrays and by TaqMan® assays. For each of the five normalization methods, the average log2(signal) of the 5 replicates from site 1 is plotted as a function of the gene expression level measured by TaqMan® assays. Lines are lowess smoothing curves fitted to the data points of each normalization method.
Figure 4
Reproducibility within sites: coefficients of variation (CVs) are used to evaluate the impact of the 5 normalization methods on data reproducibility. (A) CVs of log2(signal) within site 1 for all 29,098 genes, as a function of expression level in the quantile-normalized data. (B) CVs within site 1 for genes with TaqMan® assays, as a function of TaqMan® CT values. Lines are lowess smoothing curves fitted to all data points from each normalization method.
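As a reminder of what is being plotted, a coefficient of variation is the standard deviation divided by the mean. The sketch below computes it for one gene across replicates; the use of the sample standard deviation, the percent scale and the toy values are assumptions, since the caption does not specify them.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Percent CV: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical log2 signals for one gene across 5 technical replicates.
replicate_signals = [10.0, 10.2, 9.8, 10.1, 9.9]
cv = coefficient_of_variation(replicate_signals)
```

Repeating this per gene and per normalization method gives the point clouds through which the smoothing curves are drawn.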
Figure 5
Variability between sites: coefficients of variation (CVs) are used to evaluate the impact of the 5 normalization methods on data reproducibility. One-way ANOVA with site as the factor is used to estimate variability within and between sites. CVs within sites (red dotted lines) and between sites (green solid lines) are plotted against quantile-normalized data.
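The within-site and between-site split from a one-way ANOVA can be sketched for a single gene as below. The function name and toy signals are hypothetical, and how the authors turned these variance estimates into CVs is not stated in the caption, so the sketch stops at the mean squares.

```python
from statistics import mean


def one_way_anova(groups):
    """groups: dict mapping site name -> list of replicate log2 signals.

    Returns (between-site mean square, within-site mean square, F ratio).
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Spread of the site means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2
                     for v in groups.values())
    # Spread of the replicates around their own site mean.
    ss_within = sum((x - mean(v)) ** 2
                    for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical signals for one gene: 3 sites, 3 replicates each.
signals = {
    "site1": [9.0, 10.0, 11.0],
    "site2": [11.0, 12.0, 13.0],
    "site3": [13.0, 14.0, 15.0],
}
ms_between, ms_within, f_ratio = one_way_anova(signals)
```

In this toy case the site means differ far more than the replicates within a site, so the between-site mean square (12.0) dominates the within-site one (1.0).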
Figure 6
Fold change concordance: fold changes between UHR and brain determined by each normalization method applied to AB microarray data (y-axis) are plotted against those determined by TaqMan® assays (x-axis). Genes were filtered on real-time PCR detection thresholds (detectable in at least 3 out of 4 technical replicates in both samples). (A) Linear regression lines (red solid lines) are shown in each plot. (B) Lines are lowess smoothing curves fitted to the 2550 data points (data from all three sites) of each normalization method.
Figure 7
Fold change compression: genes were binned into low/medium/high according to TaqMan® assay CT measurements (cut-offs set to 24:29:35). Only genes with expression levels in the same bin in both samples A and B are included. Boxplots of fold changes are shown for each normalization method and for TaqMan® assays.
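The binning step might look like the sketch below. The caption gives only the cut-offs, so several things here are assumptions: bins are labelled by expression level (a lower CT means more transcript), boundaries are half-open, a CT of 35 or more is treated as not detected, and the gene ids and CT values are made up.

```python
def expression_bin(ct):
    """Map a TaqMan CT value to an expression bin (lower CT = more transcript)."""
    if ct < 24:
        return "high"
    if ct < 29:
        return "medium"
    if ct < 35:
        return "low"
    return None  # CT >= 35 is treated as not detected (assumption)


def same_bin_genes(ct_a, ct_b):
    """Keep genes whose CT falls in the same bin in both samples.

    ct_a, ct_b: dicts mapping gene id -> mean CT in sample A / sample B.
    Returns a dict mapping each kept gene id to its shared bin.
    """
    kept = {}
    for gene in ct_a.keys() & ct_b.keys():
        bin_a, bin_b = expression_bin(ct_a[gene]), expression_bin(ct_b[gene])
        if bin_a is not None and bin_a == bin_b:
            kept[gene] = bin_a
    return kept


# Hypothetical gene ids and CT values.
ct_sample_a = {"g1": 22.0, "g2": 27.5, "g3": 31.0, "g4": 23.5}
ct_sample_b = {"g1": 23.0, "g2": 30.0, "g3": 34.0, "g4": 36.0}
bins = same_bin_genes(ct_sample_a, ct_sample_b)
```

Restricting to genes that stay in one bin keeps each boxplot tied to a single expression range, so compression can be compared between low, medium and high expressers.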
Figure 8
Concordance of significantly differentially expressed genes: genes detected in both samples by TaqMan® assays were first ranked according to their average CT value in UHR and brain. We use a t-test to detect significantly differentially expressed genes, controlling FDR at the 5% level. For each bin of 50 consecutive genes (according to the ranking), we compare the results from each normalization method with those from TaqMan® assays, keeping track of the direction of regulation (up/down) on each platform. TPR is the percentage of genes detected as differentially expressed by TaqMan® assays that are also detected in the microarray data. FDR is defined as FP/(TP + FP), where FP (false positives) are genes detected as differentially expressed only by microarray; it is therefore the percentage of all genes differentially expressed in the microarray data that are detected only by microarray.
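The two rates defined in this caption can be computed for one bin of genes as in the sketch below. The function name and toy calls are hypothetical, and one detail is an assumption: a gene called in opposite directions on the two platforms is counted as a false positive.

```python
def tpr_fdr(array_calls, taqman_calls):
    """array_calls, taqman_calls: dicts mapping gene -> "up", "down" or None.

    A true positive must be called in the same direction on both platforms.
    Returns (TPR, FDR) with FDR = FP / (TP + FP).
    """
    taqman_pos = {g for g, d in taqman_calls.items() if d}
    array_pos = {g for g, d in array_calls.items() if d}
    tp = sum(1 for g in array_pos if taqman_calls.get(g) == array_calls[g])
    fp = len(array_pos) - tp
    tpr = tp / len(taqman_pos) if taqman_pos else float("nan")
    fdr = fp / (tp + fp) if array_pos else float("nan")
    return tpr, fdr


# Hypothetical calls for five genes in one bin.
taqman = {"a": "up", "b": "down", "c": "up", "d": None, "e": "up"}
array = {"a": "up", "b": "down", "c": None, "d": "up", "e": "down"}
tpr, fdr = tpr_fdr(array, taqman)
```

Here genes a and b agree, c is missed by the array, d is called only by the array, and e disagrees in direction, giving TPR = 2/4 and FDR = 2/4.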
Figure 9
Differential expression by t-test, t-test + FDR, t-test + FC, t-test + FDR + FC cut-off, and SAM, applied to quantile normalization: we use different methods to detect significantly differentially expressed genes for different normalization methods: (1) t-test (p-value < 0.05), (2) t-test controlling FDR at the 5% level, (3) t-test (p-value < 0.05) and FC > 1.5, (4) t-test controlling FDR at the 5% level and FC > 1.5, or (5) SAM q < 0.05. We compare the results for data generated by site 1, from each normalization method, with those from TaqMan® assays, for which differential expression is detected using a t-test controlling FDR at the 5% level. We keep track of the direction of regulation (up/down) on each platform.
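"Controlling FDR at 5%" is commonly done with the Benjamini-Hochberg step-up procedure. The caption does not say which procedure the authors used, so the sketch below is an assumption chosen for illustration, with hypothetical p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene t-test p-values.
calls = benjamini_hochberg([0.039, 0.001, 0.216, 0.008, 0.06])
step_up = benjamini_hochberg([0.01, 0.04, 0.03, 0.035])
```

The second example shows the step-up behaviour: 0.03 fails its own threshold at rank 2, but because 0.04 passes at rank 4, all four hypotheses are rejected. A plain p < 0.05 cut, as in method (1), would also accept 0.039 in the first example, which the procedure does not.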
Figure 10
Reproducibility of differentially expressed gene lists: within each site, we use a t-test controlling FDR at the 5% level to detect significantly differentially expressed genes in the normalized data. We compare the results from each normalization method across the 3 sites.

