BMC Bioinformatics. 2011 May 7;12:137.
doi: 10.1186/1471-2105-12-137.

Assessing Affymetrix GeneChip microarray quality

Matthew N McCall et al. BMC Bioinformatics.

Abstract

Background: Microarray technology has become a widely used tool in the biological sciences. Over the past decade, the number of users has grown exponentially, and with the number of applications and secondary data analyses rapidly increasing, we expect this rate to continue. Various initiatives such as the External RNA Control Consortium (ERCC) and the MicroArray Quality Control (MAQC) project have explored ways to provide standards for the technology. For microarrays to become generally accepted as a reliable technology, statistical methods for assessing quality will be an indispensable component; however, there remains a lack of consensus in both defining and measuring microarray quality.

Results: We begin by providing a precise definition of microarray quality and reviewing existing Affymetrix GeneChip quality metrics in light of this definition. We show that the best-performing metrics require multiple arrays to be assessed simultaneously. While such multi-array quality metrics are adequate for bench science, as microarrays begin to be used in clinical settings, single-array quality metrics will be indispensable. To this end, we define a single-array version of one of the best multi-array quality metrics and show that this metric performs as well as the best multi-array metrics. We then use this new quality metric to assess the quality of microarray data available via the Gene Expression Omnibus (GEO) using more than 22,000 Affymetrix HGU133a and HGU133plus2 arrays from 809 studies.

Conclusions: We find that approximately 10 percent of these publicly available arrays are of poor quality. Moreover, the quality of microarray measurements varies greatly from hybridization to hybridization, study to study, and lab to lab, with some experiments producing unusable data. Many of the concepts described here are applicable to other high-throughput technologies.

Figures

Figure 1
Area under the ROC curve is increased by removing a poor quality array. ROC curves for detection of the 16 spiked-in transcripts using all 8 arrays (black line) and with each array removed (gray lines). The red ROC curve corresponds to removing array 4. Removing array 4 results in the largest increase in the area under the ROC curve suggesting that array 4 may be of poor quality.
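As an illustration of the quantity compared in Figure 1, the area under the ROC curve can be computed directly from detection scores and true labels. This is a minimal sketch, not the authors' code: the function name `auc` and its inputs are hypothetical, and it uses the pairwise (Mann-Whitney) formulation, in which the AUC is the fraction of positive/negative pairs ranked correctly.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one detection score per transcript (higher = more likely spiked-in)
    labels: 1 for a true spike-in, 0 otherwise
    Ties between a positive and a negative count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, recomputing `auc` after leaving each array out of the analysis and comparing the results mirrors the leave-one-out comparison the caption describes.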
Figure 2
Distances between replicate tissues. Histogram of the Euclidean distance between the gene expression values for paired tissue samples. We expect these distances to be fairly small because we assume that, for the most part, the same genes are expressed in samples from the same tissue. In fact, this does appear to be the case - the typical distance between replicate samples is 30. However, one pair of replicate tissues (Cardiac Myocytes) has a distance larger than 90, suggesting one of the samples may be of poor quality.
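The replicate check in Figure 2 can be sketched as follows. The function names, the input layout (a mapping from tissue name to a pair of expression vectors), and the cutoff are illustrative assumptions; the caption reports typical and extreme distances but does not prescribe a cutoff.

```python
import math

def replicate_distances(pairs):
    """Euclidean distance between the two expression vectors of each pair.

    pairs: dict mapping a tissue name to (vector_a, vector_b),
           two equal-length sequences of expression values.
    """
    return {name: math.dist(a, b) for name, (a, b) in pairs.items()}

def flag_distant(distances, cutoff):
    """Names of pairs whose distance exceeds the (assumed) cutoff."""
    return sorted(name for name, d in distances.items() if d > cutoff)
```

A flagged pair only indicates that at least one of the two samples may be problematic; it does not say which one.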
Figure 3
Removing poor quality arrays improves prediction. Plot of Matthews Correlation Coefficient for a clinical parameter, pathologic complete response, versus the number of lowest quality arrays removed for each quality metric. The prediction algorithm used was PAM. Prediction improved when removing the arrays with the poorest quality; however, some metrics did substantially better than others at detecting arrays that negatively affect prediction. RLE and Percent Present appeared to perform best, followed by NUSE and GNUSE. Average background showed no improvement when removing fewer than 30 arrays.
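For reference, the Matthews Correlation Coefficient plotted in Figure 3 is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the function name `mcc` is hypothetical, and returning 0 when the denominator is zero is a common convention assumed here rather than something stated in the source.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 if any marginal total is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```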
Figure 4
Distribution of GNUSE values. Histograms of GNUSE values from (A) 11299 HGU133a arrays from 338 studies and (B) 11029 HGU133plus2 arrays from 471 studies. Most GNUSE values indicate acceptable quality (close to one), but the long right tail suggests some very poor quality probesets.
Figure 5
Distribution of median GNUSE values. Histograms of median GNUSE values from (A) 11299 HGU133a arrays or (B) 11029 HGU133plus2 arrays. The red vertical line represents the threshold of 1.25 - arrays with a median GNUSE greater than this threshold are considered poor quality. In both cases, this threshold appears to separate the majority of good quality arrays from the long right tail of poor quality arrays.
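The thresholding rule in Figure 5 can be sketched as below. The 1.25 cutoff comes from the caption; the function name and the input layout (a mapping from array identifier to that array's per-probeset GNUSE values) are assumptions made for illustration, and the sketch does not compute GNUSE itself.

```python
from statistics import median

def flag_poor_arrays(gnuse_by_array, threshold=1.25):
    """Identifiers of arrays whose median GNUSE exceeds the threshold.

    gnuse_by_array: dict mapping an array identifier to a non-empty
                    sequence of per-probeset GNUSE values (assumed to be
                    computed elsewhere).
    """
    return sorted(
        array_id
        for array_id, values in gnuse_by_array.items()
        if median(values) > threshold
    )
```

Summarizing each array by its median keeps the rule single-array: no other arrays are needed to decide whether a given array is flagged.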
Figure 6
Lab accounts for more variability in quality than tissue type. Results from fitting Model 2. The individual random effects for lab and tissue are plotted and the estimated variance for each effect is reported. These estimates suggest that the lab in which an array was hybridized accounts for more of the variability in microarray data quality than the tissue that was hybridized to the array.

