BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S2.
doi: 10.1186/1471-2105-12-S10-S2.

High-throughput processing and normalization of one-color microarrays for transcriptional meta-analyses

Mikhail G Dozmorov et al. BMC Bioinformatics. 2011.

Abstract

Background: Microarray experiments are becoming increasingly common in biomedical research, as is their deposition in publicly accessible repositories such as Gene Expression Omnibus (GEO). As a result, there has been a surge of interest in using these data for meta-analyses, whether to increase sample size for a more powerful analysis of a specific disease (e.g. lung cancer) or to re-examine experiments for purposes other than those of the original study that generated them. For the average biomedical researcher, practical barriers to such meta-analyses include manually aggregating, filtering and formatting the data. Methods that automatically process large repositories of microarray data into a standardized, directly comparable format will make access to microarray data for meta-analyses easier and more reliable.

Methods: We present a simple, straightforward method, robust to potential outliers, for automatic quality control and pre-processing of tens of thousands of single-channel microarray data files. GEO GDS files are quality-checked by comparing parametric distributions and are then quantile normalized, enabling direct comparison of expression levels in subsequent meta-analyses.
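The quantile normalization step can be illustrated with a minimal Python sketch of the standard textbook procedure, assuming a genes-by-arrays matrix of intensities (one row per probe or gene, one column per array). The function name `quantile_normalize` and the simple tie handling are illustrative choices, not the authors' unreleased pre-processing code.

```python
import numpy as np


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (arrays) of a genes-by-arrays matrix.

    Each column is replaced by the mean of the sorted columns, placed back
    according to that column's original ranks, so every column ends up with
    an identical distribution. Ties are broken by order of appearance (a
    simplification; some implementations average tied ranks instead).
    """
    # order[k, j] is the row index of the k-th smallest value in column j.
    order = np.argsort(matrix, axis=0, kind="stable")
    # Reference distribution: mean across arrays at each rank.
    reference = np.sort(matrix, axis=0).mean(axis=1)
    normalized = np.empty_like(matrix, dtype=float)
    for j in range(matrix.shape[1]):
        # Give the k-th smallest value of column j the k-th reference value.
        normalized[order[:, j], j] = reference
    return normalized
```

Because every column is mapped onto the same reference distribution, arrays within a dataset become directly comparable, while each array's ordering of genes is preserved (tied values aside).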

Results: A total of 13,000 human one-color experiments were processed to create a single gene expression matrix from which subsets can be extracted for meta-analyses. Interestingly, when we conducted a global meta-analysis of gene-gene co-expression patterns across all 13,000 experiments to predict gene function, normalization yielded minimal improvement over the raw data.
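The gene-gene co-expression analysis can be pictured as a guilt-by-association lookup: correlate a query gene with every other gene across experiments and inspect its strongest partners. This is an assumed, simplified illustration using Pearson correlation; the abstract does not specify the correlation measure or the function-prediction scoring the authors used, and `coexpression_neighbors` is a hypothetical helper.

```python
import numpy as np


def coexpression_neighbors(
    expr: np.ndarray, genes: list[str], query: str, k: int = 5
) -> list[tuple[str, float]]:
    """Return the k genes with the highest Pearson correlation to `query`.

    expr: genes-by-experiments matrix (one row per gene). Rows must not be
    constant, otherwise the correlation is undefined.
    """
    # Centre each gene's profile and scale it to unit length, so that the
    # dot product of two rows equals their Pearson correlation.
    centered = expr - expr.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    i = genes.index(query)
    r = unit @ unit[i]      # correlation of every gene with the query
    r[i] = -np.inf          # never report the query as its own neighbour
    k = min(k, len(genes) - 1)
    top = np.argsort(r)[::-1][:k]
    return [(genes[j], float(r[j])) for j in top]
```

Computing one row of correlations per query avoids materializing a full 20,813 by 20,813 correlation matrix (roughly 433 million entries).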

Conclusions: Normalization of microarray data appears to be of minimal importance for analyses based on co-expression patterns when the sample size is on the order of thousands of microarray datasets. Smaller subsets, however, are more prone to aberrations and artefacts. Effective means of automating normalization not only empower meta-analytic approaches but also aid reproducibility by providing a standard way of approaching the problem.

Data availability: A matrix containing normalized expression of 20,813 genes across 13,000 experiments is available for download at . Source code for GDS file pre-processing is available from the authors upon request.


Figures

Figure 1
Frequency histogram of the mean/median (MM) ratios of the datasets considered for processing. Datasets with an MM ratio below 1.2 were excluded.
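A minimal sketch of the MM ratio filter follows, assuming the ratio is computed over all finite intensity values of a dataset pooled together and that intensities are positive. Only the 1.2 cut-off comes from the caption; the pooling choice and the helper names are assumptions. One plausible motivation, not stated in the caption, is that raw one-color intensities are strongly right-skewed (mean well above median), so a ratio near 1 can flag data that were already log-transformed or otherwise processed.

```python
import numpy as np

# Cut-off taken from the Figure 1 caption; everything else is an assumption.
MM_THRESHOLD = 1.2


def mean_median_ratio(values: np.ndarray) -> float:
    """Mean/median ratio of the finite values (assumes positive intensities)."""
    finite = values[np.isfinite(values)]
    return float(np.mean(finite) / np.median(finite))


def passes_mm_filter(dataset: np.ndarray, threshold: float = MM_THRESHOLD) -> bool:
    """Keep a dataset only if its pooled MM ratio is at least `threshold`.

    Assumption: all arrays of the dataset are pooled into one set of values.
    Raw one-color intensities are usually right-skewed (mean > median), so a
    ratio close to 1 may indicate data that were already log-transformed.
    """
    return mean_median_ratio(np.asarray(dataset, dtype=float).ravel()) >= threshold
```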
Figure 2
Platforms and number of datasets used in the current study. A total of 43 platforms were used, which comprised 577 datasets and 13,000 experiments.
Figure 3
Box-and-whisker plots of the data distribution in a sample dataset before (A) and after (B) quantile normalization. X axis: dataset names; Y axis: expression range. Only values in the 0-1,000 range are shown for clarity.
Figure 4
Data distribution before and after quantile normalization: an example of expression changes in a dataset, and a frequency histogram of an average distribution fitted to all datasets. A) Data from a sample dataset plotted before (X axis) vs. after (Y axis) quantile normalization. No major distortions were observed; quantile normalization introduced only a transitional rescaling of the data. B) An average distribution fitted to all datasets. This distribution allows a global noise threshold to be set and expression levels to be compared directly across datasets.
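Panel B's average distribution and global noise threshold could be approximated as below. This is a hypothetical sketch: it averages each dataset's empirical quantile function on a common probability grid (which accommodates platforms with different numbers of probes) and reads the threshold off that averaged curve at a chosen probability. The averaging scheme, the 101-point grid and the default `p=0.25` are illustrative assumptions; the caption does not say how the distribution was fitted or where the threshold was placed.

```python
import numpy as np


def average_distribution(
    datasets: list[np.ndarray], n_points: int = 101
) -> tuple[np.ndarray, np.ndarray]:
    """Average the empirical quantile functions of several datasets.

    Hypothetical scheme: each dataset is summarized by its quantiles on a
    shared probability grid, so datasets of different sizes can be averaged.
    """
    probs = np.linspace(0.0, 1.0, n_points)
    curves = [np.quantile(d[np.isfinite(d)], probs) for d in datasets]
    return probs, np.mean(curves, axis=0)


def noise_threshold(
    probs: np.ndarray, avg_quantiles: np.ndarray, p: float = 0.25
) -> float:
    """Expression value at probability `p` of the averaged distribution.

    The default p=0.25 is an arbitrary, illustrative choice.
    """
    return float(np.interp(p, probs, avg_quantiles))
```

Values below the returned threshold in any dataset would then be treated as indistinguishable from background.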
