BMC Bioinformatics. 2007 Oct 25:8:412.
doi: 10.1186/1471-2105-8-412.

Reproducibility of microarray data: a further analysis of microarray quality control (MAQC) data

James J Chen et al. BMC Bioinformatics.

Abstract

Background: Many researchers are concerned with the comparability and reliability of microarray gene expression data. The recently completed MicroArray Quality Control (MAQC) project provides a unique opportunity to assess reproducibility across multiple sites and comparability across multiple platforms. The analysis that MAQC presented in support of its conclusion of inter- and intra-platform comparability/reproducibility of microarray gene expression measurements is inadequate. We evaluate the reproducibility/comparability of the MAQC data for 12901 common genes in four titration samples generated from five high-density one-color microarray platforms and the TaqMan technology. We discuss some of the problems with using the correlation coefficient as a metric to evaluate inter- and intra-platform reproducibility, and with MAQC's use of the percent of overlapping genes (POG) as a measure for evaluating a gene selection procedure.

Results: A total of 293 arrays were used in the intra- and inter-platform analysis. A hierarchical cluster analysis shows distinct differences in the measured intensities among the five platforms. A number of genes show a small fold-change in one platform and a large fold-change in another, even though the correlations between platforms are high. An analysis of variance shows that, for thirty percent of the genes, the expression patterns of the samples are inconsistent across the five platforms. We illustrate that POG does not reflect the accuracy of a selected gene list: a non-overlapping gene can be truly differentially expressed under a stringent cutoff, and an overlapping gene can be non-differentially expressed under a non-stringent cutoff. In addition, POG is unusable as a selection criterion: it can increase or decrease irregularly as the cutoff changes, and there is no criterion for choosing a cutoff that optimizes POG.
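The irregular behavior of POG can be illustrated with a minimal sketch. The gene names and log2 fold-change values here are hypothetical, not MAQC data, and the definition of POG used (shared genes divided by the size of the smaller list, for two top-k lists ranked by absolute log fold-change) is an assumption made for illustration only.

```python
def pog(list_a, list_b):
    """Percent of overlapping genes between two gene lists.

    Assumption: the overlap is divided by the size of the smaller list.
    """
    set_a, set_b = set(list_a), set(list_b)
    if not set_a or not set_b:
        return 0.0
    return 100.0 * len(set_a & set_b) / min(len(set_a), len(set_b))


def top_k_by_abs_log_fc(log_fc, k):
    """Return the k gene IDs with the largest absolute log fold-change."""
    ranked = sorted(log_fc, key=lambda gene: abs(log_fc[gene]), reverse=True)
    return ranked[:k]


# Hypothetical log2 fold-changes for six genes on two platforms.
platform_1 = {"g1": 3.0, "g2": 2.5, "g3": 0.2, "g4": 1.8, "g5": -0.1, "g6": 1.2}
platform_2 = {"g1": 2.8, "g2": 0.3, "g3": 2.6, "g4": 1.7, "g5": 0.1, "g6": 1.5}

# POG as a function of the list-size cutoff k.
pog_by_k = {
    k: pog(top_k_by_abs_log_fc(platform_1, k), top_k_by_abs_log_fc(platform_2, k))
    for k in range(1, 7)
}

for k, value in pog_by_k.items():
    print(f"k={k}: POG={value:.1f}%")
```

In this toy example POG falls from 100% at k=1 to 50% at k=2, then climbs back to 100% once the lists contain every gene, including g5, whose fold-change is near zero on both platforms. A high POG therefore says nothing by itself about whether the overlapping genes are truly differentially expressed.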

Conclusion: Using various statistical methods, we demonstrate that there are differences in the intensities measured by different platforms and by different sites within a platform. Within each platform, the patterns of expression are generally consistent, but there is site-by-site variability. Evaluation of data analysis methods for use in regulatory decisions should take the case of no treatment effect into consideration: when there is no treatment effect, "a fold-change cutoff with a non-stringent p-value cutoff" could result in 100% false positive error selection.
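The no-treatment-effect case can be sketched with a small simulation. Everything here is an illustrative assumption rather than the authors' procedure: the number of genes and replicates, the cutoffs, and in particular the use of a z-test with a known noise standard deviation in place of a t-test (chosen only to keep the sketch within the Python standard library).

```python
import math
import random
from statistics import NormalDist, mean

rng = random.Random(0)          # fixed seed so the run is repeatable
N_GENES, N_REPS = 2000, 5       # hypothetical study size
SIGMA = 1.0                     # assumed known noise SD on the log2 scale
P_CUTOFF = 0.05                 # a non-stringent p-value cutoff
LOG2_FC_CUTOFF = math.log2(1.5) # a 1.5-fold change cutoff


def log2_fc_and_pvalue(group_a, group_b, sigma):
    """Difference in mean log2 expression and its two-sided z-test p-value."""
    diff = mean(group_b) - mean(group_a)
    se = sigma * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(diff / se)))
    return diff, p_value


selected = []
for gene in range(N_GENES):
    # Both groups come from the same distribution: no treatment effect.
    a = [rng.gauss(0.0, SIGMA) for _ in range(N_REPS)]
    b = [rng.gauss(0.0, SIGMA) for _ in range(N_REPS)]
    log2_fc, p_value = log2_fc_and_pvalue(a, b, SIGMA)
    if p_value < P_CUTOFF and abs(log2_fc) >= LOG2_FC_CUTOFF:
        selected.append(gene)

truly_changed = set()  # empty by construction
false_positives = [g for g in selected if g not in truly_changed]
false_positive_fraction = (
    len(false_positives) / len(selected) if selected else 0.0
)
print(len(selected), false_positive_fraction)
```

Because no gene is truly changed, the selection rule still returns a non-empty list and every gene on it is a false positive, which is the situation the conclusion warns about.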

Figures

Figure 1
Hierarchical clustering. Hierarchical clustering of 293 arrays from the five microarray platforms, four samples, and three sites, with, in most cases, five technical replicates for each sample. The five platforms are colored: Affymetrix (AFX), Applied Biosystems (ABI), Agilent Technologies (AG1), GE Healthcare (GEH), and Illumina (ILM). The four samples are colored: A, B, C, and D. The three sites are colored: site 1, site 2, and site 3. The correlation coefficients of the standardized intensity measurements over the 12091 genes were calculated for all pairwise combinations of the 293 arrays. One minus the correlation is used as the distance metric.
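The one-minus-correlation distance described in this caption can be sketched as follows. The array names and intensity values are hypothetical, and the use of the Pearson correlation is an assumption, since the caption does not name the correlation type.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_minus_correlation(arrays):
    """Pairwise distance 1 - r between every pair of named arrays."""
    names = list(arrays)
    return {
        (p, q): 1.0 - pearson(arrays[p], arrays[q])
        for p in names
        for q in names
    }


# Hypothetical standardized intensities for four genes on three arrays.
arrays = {
    "AFX_site1_A": [1.0, 2.0, 3.0, 4.0],
    "AFX_site2_A": [1.1, 2.1, 2.9, 4.2],
    "ILM_site1_A": [4.0, 1.0, 3.5, 2.0],
}

dist = one_minus_correlation(arrays)
for (p, q), d in dist.items():
    if p < q:
        print(f"{p} vs {q}: {d:.3f}")
```

A distance matrix of this kind is what a hierarchical clustering routine would take as input (for example, `scipy.cluster.hierarchy.linkage` accepts a condensed distance matrix); the linkage method used for the figure is not stated in the caption.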
Figure 2
Matrix scatter plot of the logarithms of the fold-change estimates. B/A (upper triangle) and D/C (lower triangle), for the five platforms. The diagonal line from lower left to upper right is for reference.
Figure 3
Scatter plots of the fold-change estimates between two platforms and the correlation coefficients. The fold-change estimate in each plot is the larger of A/B or B/A. The two lines represent a 2-fold change. The points in the lower right or upper left region have a 2-fold change in one platform and less than a 2-fold change in the other platform. The six plots shown represent the 10 possible plots and include the largest and smallest correlations.
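The off-diagonal regions described in this caption can be counted with a short sketch. The gene names and mean intensities are hypothetical; the only elements taken from the caption are the fold-change definition (the larger of A/B or B/A) and the 2-fold cutoff.

```python
def symmetric_fold_change(mean_a, mean_b):
    """Fold-change as the larger of A/B or B/A, so it is always >= 1."""
    return max(mean_a / mean_b, mean_b / mean_a)


def discordant_genes(fc_x, fc_y, cutoff=2.0):
    """Genes that reach the cutoff on exactly one of the two platforms."""
    return [
        gene
        for gene in fc_x
        if gene in fc_y and (fc_x[gene] >= cutoff) != (fc_y[gene] >= cutoff)
    ]


# Hypothetical mean intensities (sample A, sample B) on two platforms.
platform_x = {"g1": (100, 400), "g2": (200, 210), "g3": (150, 330), "g4": (300, 120)}
platform_y = {"g1": (90, 380), "g2": (180, 200), "g3": (160, 280), "g4": (310, 170)}

fc_x = {g: symmetric_fold_change(a, b) for g, (a, b) in platform_x.items()}
fc_y = {g: symmetric_fold_change(a, b) for g, (a, b) in platform_y.items()}

discordant = discordant_genes(fc_x, fc_y)
print(discordant)
```

Here g3 and g4 change by more than 2-fold on the first platform but by less than 2-fold on the second, even though the two platforms rank the four genes similarly. This is the kind of disagreement that a high correlation coefficient does not reveal.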
Figure 4
TaqMan and microarray platform comparability – correlation coefficients. (a) Boxplot of the correlation coefficients of TAQ vs. microarray for sample A and sample B. (b) Boxplot of the fold-change (B/A) correlation coefficients of TAQ vs. microarray.
Figure 5
TaqMan and microarray platform comparability. (a) Plots of the mean expressions over the 3 sites (after standardization) of the five microarray platforms for gene NM_000168. Using a two-factor ANOVA model with interaction, the interaction term for this gene is not significant (p = 0.74). This indicates that the expression patterns of the four samples for this gene are consistent across all five platforms. (b) Plots of the mean expression of each of the five microarray platforms versus TaqMan for this gene. The interaction terms from the ANOVA for the data from each platform and TaqMan have p-values of 10^-17, 10^-16, 10^-7, 10^-17, and 10^-12 for AFX, ABI, AG1, ILM, and GEH, respectively. The IDs for this gene are 205201_at, 100093, A_23_P111531, GE57983, GI_13518031-S, and Hs00609233_m1 for AFX, ABI, AG1, GEH, ILM, and TAQ, respectively.
Figure 6
Scatter plots of the fold-change estimates between two sites for the five platforms. The fold-change estimate in each plot is the larger of A/B or B/A. The two lines represent a 2-fold change. The points in the lower right or upper left region have a 2-fold change at one site and less than a 2-fold change at the other site. The six plots shown represent the 15 possible plots and include the largest and smallest correlations.
