ARSyN: a method for the identification and removal of systematic noise in multifactorial time course microarray experiments
- PMID: 22085896
- DOI: 10.1093/biostatistics/kxr042
Abstract
Transcriptomic profiling experiments that aim to identify responsive genes under specific biological conditions are commonly set up under defined experimental designs that try to assess the effects of factors and their interactions on gene expression. Data from these controlled experiments, however, may also contain sources of unwanted noise that can distort the signal under study, affect the residuals of applied statistical models, and hamper data analysis. Commonly, normalization methods are applied to transcriptomics data to remove technical artifacts, but these are normally based on general assumptions of transcript distribution and largely ignore both the characteristics of the experiment under consideration and the coordinated nature of gene expression. In this paper, we propose a novel methodology, ARSyN, for the preprocessing of microarray data that takes these last 2 aspects into account. By combining analysis of variance (ANOVA) modeling of gene expression values and multivariate analysis of estimated effects, the method identifies the nonstructured part of the signal associated with the experimental factors (the noise within the signal) and the structured variation of the ANOVA errors (the signal of the noise). By removing these noise fractions from the original data, we create a filtered data set that is rich in the information of interest and includes only the random noise required for inferential analysis. In this work, we focus on multifactorial time course microarray (MTCM) experiments with 2 factors: one quantitative, such as time or dosage, and the other qualitative, such as tissue, strain, or treatment. However, the method can be used in other situations, such as experiments with only one factor or more complex designs with more than 2 factors. The filtered data obtained after applying ARSyN can be further analyzed with the appropriate statistical technique to obtain the biological information required.
To evaluate the performance of the filtering strategy, we have applied different statistical approaches for MTCM analysis to several real and simulated data sets, also studying the efficiency of these techniques. By comparing the results obtained with the original data, with ARSyN-filtered data, and with data processed by other filtering techniques, we conclude that the proposed method increases the statistical power to detect biological signals, especially in cases where there are high levels of structural noise. Software for ARSyN is freely available at http://www.ua.es/personal/mj.nueda.
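The filtering idea summarized in the abstract (ANOVA decomposition of the expression matrix, followed by a multivariate analysis of each estimated effect) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `low_rank_by_variance` and `arsyn_like_filter`, the variance threshold `signal_var`, and the fixed number of residual components `noise_rank` are assumptions made for illustration, since the abstract does not state how components are selected. The sketch assumes a samples-by-genes matrix, one quantitative factor (time), one qualitative factor (group), and a balanced design with replicates.

```python
import numpy as np


def low_rank_by_variance(M, var_fraction):
    """Keep the fewest leading SVD components of M whose cumulative share
    of the total sum of squares reaches var_fraction (hypothetical rule)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    total = np.sum(s ** 2)
    if total == 0.0:
        return np.zeros_like(M)
    explained = np.cumsum(s ** 2) / total
    k = min(int(np.searchsorted(explained, var_fraction)) + 1, len(s))
    return (U[:, :k] * s[:k]) @ Vt[:k]


def arsyn_like_filter(X, time, group, signal_var=0.75, noise_rank=1):
    """Illustrative ARSyN-style filter for a samples x genes matrix X.

    signal_var and noise_rank are assumed tuning parameters, not values
    taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time)
    group = np.asarray(group)

    grand = X.mean(axis=0, keepdims=True)   # per-gene overall mean
    Xc = X - grand

    A = np.zeros_like(Xc)   # time main effect
    B = np.zeros_like(Xc)   # group effect plus time-by-group interaction
    for t in np.unique(time):
        in_t = time == t
        time_mean = Xc[in_t].mean(axis=0)
        A[in_t] = time_mean
        for g in np.unique(group):
            cell = in_t & (group == g)
            if cell.any():
                B[cell] = Xc[cell].mean(axis=0) - time_mean
    E = Xc - A - B          # ANOVA residuals

    # "Noise within the signal": keep only the dominant components of each effect.
    A_hat = low_rank_by_variance(A, signal_var)
    B_hat = low_rank_by_variance(B, signal_var)

    # "Signal of the noise": remove the leading structured residual components.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    E_struct = (U[:, :noise_rank] * s[:noise_rank]) @ Vt[:noise_rank]

    return grand + A_hat + B_hat + (E - E_struct)


# Synthetic example: 4 time points x 2 groups x 3 replicates, 50 genes.
rng = np.random.default_rng(0)
time = np.repeat(np.arange(4), 6)
group = np.tile(np.repeat([0, 1], 3), 4)
u, v, w = (rng.normal(size=50) for _ in range(3))

sign = np.where(group == 0, -1.0, 1.0)
signal = np.outer(time.astype(float), u) + np.outer(sign * (time + 1), v)

# Structured (batch-like) noise that varies across replicates within each cell.
batch = np.tile([-1.0, 0.0, 1.0], 8) * 5.0
X = signal + np.outer(batch, w) + rng.normal(scale=0.1, size=signal.shape)

filtered = arsyn_like_filter(X, time, group)
err_raw = np.linalg.norm(X - signal)
err_filtered = np.linalg.norm(filtered - signal)
print(f"error before filtering: {err_raw:.2f}, after: {err_filtered:.2f}")
```

In this toy setting the structured noise is confined to the residuals, so removing the leading residual component recovers most of the simulated signal. Real data would need a principled rule for choosing how many components to keep or remove, which the fixed parameters here do not provide.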