The practical effect of batch on genomic prediction
- PMID: 22611599
- PMCID: PMC3760371
- DOI: 10.1515/1544-6115.1766
The practical effect of batch on genomic prediction
Abstract
Measurements from microarrays and other high-throughput technologies are susceptible to non-biological artifacts like batch effects. It is known that batch effects can alter or obscure the set of significant results and biological conclusions in high-throughput studies. Here we examine the impact of batch effects on predictors built from genomic technologies. To investigate batch effects, we collected publicly available gene expression measurements with known outcomes, and estimated batches using date. Using these data we show (1) the impact of batch effects on prediction depends on the correlation between outcome and batch in the training data, and (2) removing expression measurements most affected by batch before building predictors may improve the accuracy of those predictors. These results suggest that (1) training sets should be designed to minimize correlation between batches and outcome, and (2) methods for identifying batch-affected probes should be developed to improve prediction results for studies with high correlation between batches and outcome.
Figures
References
-
- Akey JM, Biswas S, Leek JT, Storey JD. On the design and analysis of gene expression studies in human populations. Nature Genetics. 2007;39:807–8. author reply 808–9, URL http://dx.doi.org/10.1038/ng0707-807. - DOI - PubMed
-
- Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics. 2006;7:55–65. URL http://dx.doi.org/10.1038/nrg1749. - DOI - PubMed
-
- Baggerly KA, Edmonson SR, Morris JS, Coombes KR. High-resolution serum proteomic patterns for ovarian cancer detection. Endocrine-Related Cancer. 2004;11:583–4. author reply 585–7, URL http://dx.doi.org/10.1677/erc.1.00868. - DOI - PubMed
-
- Carroll JS, Meyer CA, Song J, Li W, Geistlinger TR, Eeckhoute J, Brodsky AS, Keeton EK, Fertuck KC, Hall GF, Wang Q, Bekiranov S, Sementchenko V, Fox EA, Silver PA, Gingeras TR, Liu XS, Brown M. Genome-wide analysis of estrogen receptor binding sites. Nature Genetics. 2006;38:1289–97. URL http://dx.doi.org/10.1038/ng1901. - DOI - PubMed
-
- Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research. 2002;30:207–10. URL http://dx.doi.org/10.1093/nar/30.1.207. - DOI - PMC - PubMed
MeSH terms
Grants and funding
LinkOut - more resources
Full Text Sources