Using control genes to correct for unwanted variation in microarray data
- PMID: 22101192
- PMCID: PMC3577104
- DOI: 10.1093/biostatistics/kxr034
Abstract
Microarray expression studies suffer from batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate these problems. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty of distinguishing the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise, but substantial challenges remain.
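The two steps described in the abstract can be illustrated with a minimal sketch. It assumes that the factor analysis is a plain singular value decomposition of the negative control genes and that the second step is an ordinary least-squares regression of every gene on the factor of interest plus the estimated unwanted factors. The abstract specifies neither choice, so the function name `ruv2_sketch`, its arguments, and the lack of an intercept term are all hypothetical simplifications rather than the authors' implementation.

```python
import numpy as np


def ruv2_sketch(Y, x, control_idx, k):
    """Illustrative two-step adjustment (hypothetical helper, not the authors' code).

    Y           : (samples x genes) matrix of log-expression values
    x           : length-samples vector, the biological factor of interest
    control_idx : column indices of the negative control genes
    k           : number of unwanted factors to estimate (chosen by the user)
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(-1, 1)

    # Step 1: factor analysis restricted to the negative control genes.
    # Assumption: a truncated SVD stands in for the factor analysis.
    Yc = Y[:, control_idx]
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    W = U[:, :k] * s[:k]  # estimated unwanted factors, one column per factor

    # Step 2: regress every gene on the factor of interest and on W together.
    design = np.hstack([x, W])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)

    # Row 0 holds the adjusted effect of x on each gene.
    return coef[0], W


if __name__ == "__main__":
    # Simulated data: one batch-like factor w, five truly affected genes.
    rng = np.random.default_rng(1)
    m, n = 20, 50
    x = np.repeat([0.0, 1.0], m // 2)
    w = rng.normal(size=m)
    beta = np.zeros(n)
    beta[:5] = 2.0
    Y = np.outer(x, beta) + np.outer(w, rng.normal(size=n))
    Y += 0.1 * rng.normal(size=(m, n))
    beta_hat, W = ruv2_sketch(Y, x, control_idx=np.arange(25, 50), k=1)
    print(np.round(beta_hat[:8], 2))
```

Because the unwanted factors are estimated only from genes assumed to be unaffected by the factor of interest, the estimate of `W` in this sketch cannot absorb the biological signal, which is the point the abstract makes about restricting the factor analysis.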