Nucleic Acids Res. 2007;35(2):e8.
doi: 10.1093/nar/gkl871. Epub 2006 Dec 7.

Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool

Haiyuan Yu et al. Nucleic Acids Res. 2007.

Abstract

Microarray technology is currently one of the most widely used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we show that two types of positional artifacts in microarray data introduce spurious correlations between genes. First, we find that genes that are close together on the microarray chip tend to have more highly correlated expression profiles. We call this the 'chip artifact'. Our calculations suggest that carry-over during the printing process is one of the major sources of this type of artifact, which our experiments subsequently confirm. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully hybridized spots) to 93% (for un-hybridized ones) of noise resulting from this artifact. Second, we show for the first time that genes that are close together on the microtiter plates used in microarray experiments also tend to have higher correlations. We call this the 'plate artifact'. Both types of artifacts exist, with differing severity, in all cDNA microarray experiments that we analyzed. We therefore developed an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool ExpressYourself, which is available at http://bioinfo.mbb.yale.edu/ExpressYourself/. Together, the two can eliminate most of the common sources of noise in microarray data.
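The chip artifact described above can be illustrated with a minimal sketch. It is not the COP implementation: the function names, the one-dimensional print-order positions, and the simple carry-over model (each spot receives a fraction `alpha` of the previous spot's signal) are all assumptions made for illustration, and the data are synthetic. The sketch computes the average Pearson correlation between expression profiles as a function of the distance between spots.

```python
import numpy as np


def simulate_carry_over(n_genes, n_conditions, alpha, seed=0):
    """Synthetic data (assumed model): each spot picks up a fraction
    `alpha` of the true signal of the spot printed just before it."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=(n_genes, n_conditions))
    measured = signal.copy()
    measured[1:] += alpha * signal[:-1]
    return measured


def mean_correlation_by_distance(expr, positions, max_distance):
    """Average Pearson correlation of gene pairs, grouped by the
    absolute difference of their (hypothetical) 1-D spot positions.

    expr:      array of shape (n_genes, n_conditions)
    positions: integer array of shape (n_genes,)
    Returns an array of length max_distance + 1; index d holds the
    mean correlation at distance d (NaN where no pair exists).
    """
    corr = np.corrcoef(expr)                      # genes are rows
    i, j = np.triu_indices(expr.shape[0], k=1)    # each pair once
    dist = np.abs(positions[i] - positions[j])
    r = corr[i, j]
    means = np.full(max_distance + 1, np.nan)
    for d in range(1, max_distance + 1):
        mask = dist == d
        if mask.any():
            means[d] = r[mask].mean()
    return means


n_genes = 200
expr = simulate_carry_over(n_genes, n_conditions=30, alpha=0.5)
profile = mean_correlation_by_distance(expr, np.arange(n_genes), 5)
for d in range(1, 6):
    print(f"distance {d}: mean correlation {profile[d]:+.3f}")
```

Under this assumed model, only adjacent spots share signal, so the mean correlation should be clearly positive at distance 1 and close to zero beyond it. A real analysis would use the two-dimensional chip layout and exclude gene pairs that are close on the chromosome, as the figure captions describe.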

Figures

Figure 1
Illustration of experimental procedures to produce a microarray chip (A-C). Each color represents one printing tip. The spots in the same color are all transferred by the same tip of the corresponding color.
Figure 2
(A) Average correlation coefficient distribution as a function of the distance of gene pairs on the chip. All gene pairs on the chip are included for this analysis, except those that are close on the chromosome. (B) Average correlation coefficient distribution as a function of the distance of gene pairs on the chip in X- and Y-directions. P-values calculated by t-tests measure the statistical difference between the correlations of genes that are within the same printing block in X- and Y-directions. Please note that only gene pairs that are printed by the same tip (i.e. within the same printing block) are included for this analysis. The distance between two genes is measured in terms of the number of spots on the chip, i.e. the number of printed spots separating the two. All gene pairs that are close on the chromosome (within 10 ORFs) were excluded from the analysis. Error bars in all figures represent the standard errors of the data.
Figure 3
Illustration of the experimental design to uncover the role of carry-over in producing the chip artifact in microarray experiments. (A) Producing the test chip: All B's are printed first without probe A carry-over. (B) Producing the control chip: Probes A and B are printed alternately onto the chip. The numbers in both (A) and (B) indicate the order in which each spot is printed onto the chip. (C) Comparison of the intensities of B's with and without carry-over. All intensities were normalized against the test chip (see Methods and Materials section). The P-value is calculated using the t-test.
Figure 4
Illustration of the experimental design to uncover the role of other possible sources in producing the chip artifact in microarray experiments. (A) Producing microarray chips with different layouts. All B spots were printed first. (B) Comparison of the intensities of B's with and without possible contamination. All intensities were normalized against the chip with layout I (see Methods and Materials section).
Figure 5
Illustration of the experimental design to determine the effective distance of the chip artifact. (A) All spots are printed based on their chip order. (B) Comparison of the intensities of B's printed after the A spot. All intensities were normalized against the test chip in Figure 3A (see Methods and Materials section). The P-value of the regression is calculated by the significance test for linear regression. The P-value measuring the intensity difference between B7 spots and un-contaminated spots is calculated by the t-test.
Figure 6
Average correlation coefficient distribution as a function of the distance of gene pairs on the chip. All three curves show striking periodicities, corresponding to the size of the printing block in the three experiments. All gene pairs that are close on the chromosome (within 10 ORFs) were excluded from the analysis.
Figure 7
(A) Pair correlation function for the Spellman alpha-factor arrested cell cycle dataset (red), the Spellman cdc15 arrested cell cycle dataset (black), and the Zhu alpha-factor blocked cell cycle dataset (light blue); the inset highlights the staggered characteristics. The X-axis represents the distance between gene pairs. The Y-axis represents the percentage of highly correlated pairs that have a given distance. (B) Power spectrum of the pair correlation function of co-expressed gene pairs, determined by Fourier transformation. Two common frequencies are indicated by the arrows. Please note that the distributions are manually shifted 20 U along the Y-axis to separate them from each other and show the peaks clearly. (C) Chip distance map and (D) expression correlation coefficient map. Both maps are produced using the Spellman alpha-factor arrested cell cycle dataset; their x- and y-axes represent the first 100 ORFs on chromosome IV. In the distance map, the color of each spot represents the distance between the gene on the x-axis and the gene on the y-axis. In the expression correlation coefficient map, the color represents the correlation coefficient between the gene pair.
Figure 8
Screen shot of COP within ExpressYourself. The dataset used here is the human colon cancer dataset.
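
The Fourier analysis mentioned for Figure 7B can be illustrated with a minimal sketch. It is an assumption-laden example rather than the authors' procedure: the function name `dominant_period`, the synthetic profile, and the period of 20 are hypothetical choices. The sketch takes a pair correlation profile sampled at integer distances, computes its power spectrum with a real FFT, and reports the period of the strongest non-zero frequency.

```python
import numpy as np


def dominant_period(profile):
    """Return (period, power) for a 1-D profile sampled at unit spacing.

    The mean is removed first so the zero-frequency term does not
    dominate; the period is the reciprocal of the strongest remaining
    frequency in the power spectrum.
    """
    profile = np.asarray(profile, dtype=float)
    centered = profile - profile.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(centered.size, d=1.0)
    k = np.argmax(power[1:]) + 1          # skip the zero-frequency bin
    return 1.0 / freqs[k], power


# Synthetic profile (assumed): a weak oscillation with period 20
# on top of a constant background, plus a little random noise.
rng = np.random.default_rng(0)
distance = np.arange(400)
profile = (0.05
           + 0.02 * np.cos(2 * np.pi * distance / 20)
           + rng.normal(scale=0.005, size=distance.size))

period, power = dominant_period(profile)
print(f"dominant period: {period:.1f} distance units")
```

A peak in the power spectrum at a period that matches a known layout dimension, such as the printing block size, would be consistent with a positional artifact rather than a biological signal.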
