High-throughput processing and normalization of one-color microarrays for transcriptional meta-analyses
- PMID: 22166002
- PMCID: PMC3236842
- DOI: 10.1186/1471-2105-12-S10-S2
Abstract
Background: Microarray experiments are becoming increasingly common in biomedical research, as is their deposition in publicly accessible repositories such as the Gene Expression Omnibus (GEO). Consequently, interest has surged in using these data for meta-analytic approaches, whether to increase sample size for a more powerful analysis of a specific disease (e.g. lung cancer) or to re-examine experiments for purposes other than those of the original study that generated them. For the average biomedical researcher, practical barriers to conducting such meta-analyses include manually aggregating, filtering and formatting the data. Methods that automatically process large repositories of microarray data into a standardized, directly comparable format will enable easier and more reliable access to these data for meta-analyses.
Methods: We present a simple method, robust against potential outliers, for automatic quality control and pre-processing of tens of thousands of single-channel microarray data files. GEO GDS files are quality-checked by comparing parametric distributions and then quantile-normalized to enable direct comparison of expression levels in subsequent meta-analyses.
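The quantile normalization step named above can be illustrated with a minimal sketch. This is not the authors' code (which the abstract says is available on request); it assumes a genes x samples NumPy array, and the function name `quantile_normalize` and the toy values are hypothetical. Ties are broken by order of appearance here, whereas some implementations average tied values instead.

```python
import numpy as np

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix."""
    matrix = np.asarray(matrix, dtype=float)
    # Rank of each value within its own column (stable sort: ties keep input order).
    order = np.argsort(matrix, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean across samples of the sorted values at each rank.
    reference = np.sort(matrix, axis=0).mean(axis=1)
    # Replace every value by the reference value at its rank.
    return reference[ranks]

# Toy data (illustrative only): 4 genes (rows) x 3 samples (columns).
expression = np.array([[5, 4, 3],
                       [2, 1, 4],
                       [3, 4, 6],
                       [4, 2, 8]], dtype=float)
normalized = quantile_normalize(expression)
```

After this step every sample column holds the same set of values, so expression levels can be compared directly across arrays.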
Results: 13,000 human 1-color experiments were processed to create a single gene expression matrix from which subsets can be extracted to conduct meta-analyses. Interestingly, when we conducted a global meta-analysis of gene-gene co-expression patterns across all 13,000 experiments to predict gene function, normalization yielded minimal improvement over using the raw data.
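As a rough illustration of what a gene-gene co-expression analysis computes, the sketch below takes a genes x samples matrix and returns pairwise correlations between genes. The abstract does not state which co-expression measure was used; Pearson correlation is an assumption here, and the helper names and synthetic data are hypothetical.

```python
import numpy as np

def coexpression(matrix):
    """Pearson correlation between every pair of genes (rows) across samples (columns)."""
    return np.corrcoef(np.asarray(matrix, dtype=float))

def upper_triangle(corr):
    """Return each unordered gene pair once, excluding the diagonal self-correlations."""
    i, j = np.triu_indices_from(corr, k=1)
    return corr[i, j]

# Synthetic toy data (not from the study): 4 genes x 6 samples.
rng = np.random.default_rng(0)
base = rng.normal(size=6)
toy = np.vstack([base,                 # gene 0
                 2 * base + 1,         # gene 1: perfectly co-expressed with gene 0
                 -base,                # gene 2: perfectly anti-correlated with gene 0
                 rng.normal(size=6)])  # gene 3: unrelated noise
corr = coexpression(toy)
pairs = upper_triangle(corr)
```

Computing `pairs` once on raw data and once on normalized data, then comparing the two vectors, is one simple way to gauge how much normalization changes the co-expression structure.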
Conclusions: Normalization of microarray data appears to be of minimal importance for analyses based on co-expression patterns when the sample size is on the order of thousands of microarray datasets. Smaller subsets, however, are more prone to aberrations and artefacts, and an effective means of automating normalization procedures not only empowers meta-analytic approaches but also aids reproducibility by providing a standard way of approaching the problem.
Data availability: A matrix containing normalized expression of 20,813 genes across 13,000 experiments is available for download at . Source code for GDS file pre-processing is available from the authors upon request.
Similar articles
- ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses. BMC Bioinformatics. 2008 May 28;9 Suppl 6(Suppl 6):S18. doi: 10.1186/1471-2105-9-S6-S18. PMID: 18541053 Free PMC article.
- Microarray meta-analysis database (M(2)DB): a uniformly pre-processed, quality controlled, and manually curated human clinical microarray database. BMC Bioinformatics. 2010 Aug 10;11:421. doi: 10.1186/1471-2105-11-421. PMID: 20698961 Free PMC article.
- MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data. BMC Bioinformatics. 2014 Mar 12;15:69. doi: 10.1186/1471-2105-15-69. PMID: 24621103 Free PMC article.
- Microarray databases: standards and ontologies. Nat Genet. 2002 Dec;32 Suppl:469-73. doi: 10.1038/ng1028. PMID: 12454640 Review.
- Microarray expression profiling: analysis and applications. Curr Opin Drug Discov Devel. 2003 May;6(3):384-95. PMID: 12833672 Review.
Cited by
- Systematic classification of non-coding RNAs by epigenomic similarity. BMC Bioinformatics. 2013;14 Suppl 14(Suppl 14):S2. doi: 10.1186/1471-2105-14-S14-S2. Epub 2013 Oct 9. PMID: 24267974 Free PMC article.
- Ribosomal and immune transcripts associate with relapse in acquired ADAMTS13-deficient thrombotic thrombocytopenic purpura. PLoS One. 2015 Feb 11;10(2):e0117614. doi: 10.1371/journal.pone.0117614. eCollection 2015. PMID: 25671313 Free PMC article.
- Comparative study of joint analysis of microarray gene expression data in survival prediction and risk assessment of breast cancer patients. Brief Bioinform. 2016 Sep;17(5):771-85. doi: 10.1093/bib/bbv092. Epub 2015 Oct 26. PMID: 26504096 Free PMC article.
- Proceedings of the 2011 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) conference. Introduction. BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S1. doi: 10.1186/1471-2105-12-S10-S1. PMID: 22165918 Free PMC article. No abstract available.
- Prediction and Analysis of Key Genes in Glioblastoma Based on Bioinformatics. Biomed Res Int. 2017;2017:7653101. doi: 10.1155/2017/7653101. Epub 2017 Jan 16. PMID: 28191466 Free PMC article.