Comparative Study
BMC Genomics. 2004 Sep 2;5:61.
doi: 10.1186/1471-2164-5-61.

Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations


Richard Shippy et al. BMC Genomics. 2004.

Abstract

Background: Despite the widespread use of microarrays, much ambiguity remains regarding data analysis, interpretation, and correlation of the different technologies. There is considerable interest in correlating results obtained on different microarray platforms. To date, only a few cross-platform evaluations have been published and, unfortunately, no guidelines have been established on the best methods of making such correlations. To address this issue, we conducted a thorough evaluation of two commercial microarray platforms to determine an appropriate methodology for making cross-platform correlations.

Results: In this study, expression measurements for 10,763 genes uniquely represented on Affymetrix U133A/B GeneChips and Amersham CodeLink UniSet Human 20 K microarrays were compared. For each microarray platform, five technical replicates, derived from the same total RNA samples, were labeled, hybridized, and quantified according to each manufacturer's standard protocols. The correlation coefficient (r) of differential expression ratios for the entire set of 10,763 overlapping genes was 0.62 between platforms. However, the correlation improved significantly (r = 0.79) when genes within noise were excluded. In addition to levels of inter-platform correlation, we evaluated precision, statistical-significance profiles, power, and noise levels for each microarray platform. Accuracy of differential expression was measured against real-time PCR for 25 genes, and both platforms correlated well, with r values of 0.92 and 0.79 for CodeLink and GeneChip, respectively.

Conclusions: As a result of this study, we recommend using only genes called 'present' in cross-platform correlations. However, as in this study, a large number of genes may be lost from the correlation due to differing levels of noise between platforms. This is an important consideration given the apparent difference in sensitivity of the two platforms. Data from microarray analysis need to be interpreted cautiously, and we therefore provide guidelines for making cross-platform correlations. In all, this study represents the most comprehensive and specifically designed comparison of short-oligonucleotide microarray platforms to date, using the largest set of overlapping genes.


Figures

Figure 1
Pair-wise array precision of CodeLink and GeneChip with illustration of respective noise levels. The representative scatter plots show precision of normalized expression values relative to noise. All 10,763 overlapping gene probes are represented in these plots. Values highlighted in red were concordantly 'absent' (noise) calls on both arrays compared. Orange lines show two-fold limits, while the black line represents equality.
Figure 2
Coefficients of variation for each platform as a function of intensity, across all replicates. Genes which are concordantly 'absent' are shown in red. The black line represents the 100-probe moving average.
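The per-gene coefficient of variation and the moving average described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the gene names and intensity values are made up, and the assumption that CV is the sample standard deviation divided by the mean of the normalized replicate intensities is ours, since the caption does not spell out the formula.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean for one gene's replicates."""
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(replicates) / m


def moving_average(values, window):
    """Trailing moving average over values already sorted by intensity."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical normalized intensities: three genes, five technical replicates each.
intensities = {
    "gene_a": [1.0, 1.1, 0.9, 1.0, 1.0],
    "gene_b": [10.0, 10.5, 9.5, 10.2, 9.8],
    "gene_c": [0.05, 0.20, 0.01, 0.10, 0.14],  # low signal, close to noise
}

cv_by_gene = {gene: coefficient_of_variation(v) for gene, v in intensities.items()}

# Sort genes by mean intensity, then smooth the CVs (window of 2 here; the
# figure uses a 100-probe window on the full gene set).
ordered = sorted(intensities, key=lambda gene: mean(intensities[gene]))
smoothed_cv = moving_average([cv_by_gene[g] for g in ordered], window=2)
```

In this toy data the low-intensity gene has the largest CV, which is the qualitative pattern a CV-versus-intensity plot is meant to expose.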
Figure 3
Correlation of differential expression ratios between CodeLink and GeneChip. Pearson correlation coefficients (r) are shown for each comparison. (A) When all 10,763 overlapping genes are compared between platforms, the correlation is 0.62. (B) All values for genes concordantly 'absent' were removed prior to making the cross-platform correlation. In this case, 3,362 genes are called 'present' on at least 1 of the 5 replicates across both tissues and platforms. (C) When the set is restricted to the 2,569 genes called 'present' on at least 3 of the 5 replicates across both tissues and platforms, the correlation improves further to 0.74. (D) Genes called 'present' on all 5 replicates across both tissues and platforms. For these 1,760 genes the correlation is 0.79.
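The filtering-then-correlating procedure in this caption can be illustrated with a small sketch. Everything here is hypothetical: the gene records, the field names (`codelink`, `genechip`, `present_counts`), and the reading of "across both tissues and platforms" as a 'present' count that must meet the threshold in each of the four platform/tissue groups. The values are chosen only to show how discordant near-noise genes can depress the correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def filter_present(genes, min_present):
    """Keep genes called 'present' on at least min_present replicates in every group."""
    return [g for g in genes if all(c >= min_present for c in g["present_counts"])]


def cross_platform_r(genes):
    """Correlate log expression ratios between the two platforms."""
    return pearson_r([g["codelink"] for g in genes], [g["genechip"] for g in genes])


# Hypothetical log ratios per platform; present_counts holds the number of
# 'present' calls (out of 5) for the four platform/tissue groups.
genes = [
    {"id": "g1", "codelink": 1.0, "genechip": 0.9, "present_counts": (5, 5, 5, 5)},
    {"id": "g2", "codelink": -0.8, "genechip": -0.7, "present_counts": (5, 5, 5, 5)},
    {"id": "g3", "codelink": 0.3, "genechip": 0.35, "present_counts": (5, 4, 5, 5)},
    {"id": "g4", "codelink": 0.5, "genechip": -0.4, "present_counts": (0, 1, 0, 0)},
    {"id": "g5", "codelink": -0.6, "genechip": 0.5, "present_counts": (0, 0, 0, 0)},
    {"id": "g6", "codelink": 0.0, "genechip": 0.05, "present_counts": (3, 3, 4, 3)},
]

r_all = cross_platform_r(genes)
r_present = cross_platform_r(filter_present(genes, min_present=3))
```

Raising `min_present` trades gene count for agreement, which mirrors the trade-off the caption reports between panels A and D.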
Figure 4
'Volcano plots' for CodeLink and GeneChip. Each point represents a gene from the uniquely common set of 10,763 genes between platforms. Data points highlighted in blue represent genes which are concordantly 'present' in both tissues. The log10 ratio of expression (brain/pancreas) is shown on the x-axis and the p-value, from a two-tailed Student's t-test on normalized log-transformed intensities, is shown on the y-axis. The vertical dashed lines represent 2-fold change ratios and the horizontal dashed line represents the statistical-significance level where p = 0.01.
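The two quantities behind each volcano-plot point can be sketched as below, assuming the classic pooled-variance form of Student's t-test. To stay within the Python standard library, this sketch does not compute a p-value; instead it compares |t| with 3.355, the tabulated two-tailed critical value for p = 0.01 at 8 degrees of freedom (5 + 5 replicates). The intensities are invented for illustration.

```python
import math
from statistics import mean, variance

# Two-tailed critical t value for alpha = 0.01 with 8 degrees of freedom.
T_CRIT_DF8_ALPHA_01 = 3.355


def volcano_point(brain, pancreas):
    """Return (log10 ratio, t statistic, degrees of freedom) for one gene."""
    log_brain = [math.log10(v) for v in brain]
    log_pancreas = [math.log10(v) for v in pancreas]
    n1, n2 = len(log_brain), len(log_pancreas)
    df = n1 + n2 - 2

    # Difference of mean log intensities equals the log of the expression ratio.
    log_ratio = mean(log_brain) - mean(log_pancreas)

    # Pooled variance, as in the equal-variance Student's t-test.
    pooled = ((n1 - 1) * variance(log_brain) + (n2 - 1) * variance(log_pancreas)) / df
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return log_ratio, log_ratio / std_err, df


# Hypothetical normalized intensities for one gene, five replicates per tissue.
brain = [10.0, 11.0, 9.5, 10.5, 10.2]
pancreas = [2.0, 2.2, 1.9, 2.1, 2.05]

log_ratio, t_stat, df = volcano_point(brain, pancreas)
exceeds_two_fold = abs(log_ratio) > math.log10(2)
significant = df == 8 and abs(t_stat) > T_CRIT_DF8_ALPHA_01
```

A gene would fall in the upper outer region of the plot when both `exceeds_two_fold` and `significant` are true.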
Figure 5
Venn diagrams of differential expression calls and statistical significance across both microarray platforms. A two-sample two-tailed t-test on normalized log-transformed intensities was performed for each microarray platform. (A) The entire set of 10,763 uniquely common genes between platforms was used to determine the number of statistically significant (p < 0.01) expression ratios. Genes above and below noise were included in the analysis. (B) Statistical significance determined from the set of 2,569 genes which are 'present' on at least 3 arrays in both tissues. Expression values below noise ('absent') were not included in the analysis. (C) Statistical significance determined from the set of 1,760 genes which are 'present' on all 5 arrays in both tissues (i.e. concordantly 'present').
Figure 6
Power analysis estimating the number of technical array replicates needed to achieve a reasonable level of statistical power or confidence for CodeLink (blue) and GeneChip (red) when noise was included (solid diamonds) or excluded (open diamonds). For both graphs the alpha was set at 0.01. (A) Relationship between power and arrays necessary to statistically discriminate two-fold changes in expression. To achieve a power of 0.90 using all 10,763 genes, a minimum of 3 arrays is necessary for CodeLink, while 8 are required for GeneChip. However, when noise is excluded, both GeneChip and CodeLink require only 1 array to achieve this same level of power. In fact, when noise is excluded, 1 array for both GeneChip and CodeLink has a power of 0.99 to detect two-fold changes in expression. (B) To detect 1.5-fold changes in expression at a power of 0.90 when noise is excluded, CodeLink requires a minimum of 2 arrays, while GeneChip requires 3.
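The relationship between replicate count and power can be illustrated with a rough normal-approximation sketch. This is not the authors' power calculation, which the caption does not specify: it assumes a known standard deviation of log2 intensities (the hypothetical `sigma_log2` parameter), equal numbers of arrays per tissue, and a two-sided z-test at alpha = 0.01. The sigma values used below are invented and are not estimates for either platform.

```python
import math
from statistics import NormalDist


def approx_power(fold_change, sigma_log2, n_arrays, alpha=0.01):
    """Approximate power of a two-sided z-test to detect a given fold change."""
    z = NormalDist()
    delta = math.log2(fold_change)  # true difference in log2 units
    std_err = sigma_log2 * math.sqrt(2.0 / n_arrays)  # n arrays per tissue
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / std_err
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def min_arrays(fold_change, sigma_log2, target_power=0.90, alpha=0.01, max_arrays=50):
    """Smallest number of arrays per tissue reaching the target power, or None."""
    for n in range(1, max_arrays + 1):
        if approx_power(fold_change, sigma_log2, n, alpha) >= target_power:
            return n
    return None


# Hypothetical noise levels: a tight platform versus a noisier one.
arrays_low_noise = min_arrays(2.0, sigma_log2=0.1)
arrays_high_noise = min_arrays(2.0, sigma_log2=0.5)
```

Under these assumptions, the required number of arrays grows with the square of the noise level relative to the log fold change, which is one way to read the gap between the noise-included and noise-excluded curves.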
Figure 7
Accuracy of CodeLink and GeneChip differential-expression ratios relative to qrtPCR. Expression ratios for each microarray platform were measured against results from qrtPCR for a randomly selected subset of 25 genes. Pearson correlation coefficients (r) are shown for each comparison.

