Interpretation of ANOVA models for microarray data using PCA
- PMID: 17105717
- DOI: 10.1093/bioinformatics/btl572
Abstract
Motivation: ANOVA is frequently used in the analysis of microarray data, e.g. to assess the significance of treatment effects and to select interesting genes based on P-values. However, it does not indicate what exactly is causing an effect. Our purpose is to improve the interpretation of the results from ANOVA on large microarray datasets by applying PCA to the individual variance components. Interaction effects can be visualized with biplots, which show genes and variables in one plot and provide insight into the effect of, e.g., treatment or time on gene expression. Because ANOVA has removed uninteresting sources of variance, the results are much more interpretable than those of PCA without ANOVA. Moreover, the combination of ANOVA and PCA provides a simple way to select genes based on the interactions of interest.
Results: It is shown that the components of an ANOVA model can be summarized and visualized with PCA, which improves the interpretability of the models. The method is applied to a real time-course gene expression dataset of mesenchymal stem cells, designed to investigate the effect of different treatments on osteogenesis. The biplots generated with the algorithm give specific information about the effects of specific treatments on genes over time. These results are in agreement with the literature. Biological validation using the GO annotation of the genes in the selections shows that biologically relevant groups of genes are selected.
Availability: R code with the implementation of the method for this dataset is available from http://www.cac.science.ru.nl under the heading "Software".
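The general idea in the abstract, splitting the data into ANOVA effect matrices and then applying PCA to one of them, can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It assumes a balanced two-factor design (treatment and time) with samples in rows and genes in columns; the function names `level_means` and `anova_pca`, the synthetic data, and the gene-ranking rule (loadings weighted by singular values) are all hypothetical choices made for illustration.

```python
import numpy as np


def level_means(X, labels):
    """Replace every row of X by the mean of all rows that share its label."""
    labels = np.asarray(labels)
    out = np.zeros_like(X, dtype=float)
    for level in np.unique(labels):
        mask = labels == level
        out[mask] = X[mask].mean(axis=0)
    return out


def anova_pca(X, treatment, time, n_components=2):
    """Two-way ANOVA decomposition followed by PCA on the interaction matrix.

    Assumes a balanced design; X has shape (n_samples, n_genes).
    """
    X = np.asarray(X, dtype=float)
    grand = X.mean(axis=0, keepdims=True)      # per-gene overall mean
    Xc = X - grand
    A = level_means(Xc, treatment)             # treatment main effect
    B = level_means(Xc, time)                  # time main effect
    cells = np.array([f"{a}|{b}" for a, b in zip(treatment, time)])
    AB = level_means(Xc, cells) - A - B        # treatment x time interaction
    E = Xc - A - B - AB                        # residuals
    # PCA of the interaction matrix via SVD (its columns are already centered).
    U, s, Vt = np.linalg.svd(AB, full_matrices=False)
    k = n_components
    return {
        "grand": grand, "treatment": A, "time": B,
        "interaction": AB, "residual": E,
        "scores": U[:, :k] * s[:k],            # sample coordinates for a biplot
        "loadings": Vt[:k].T,                  # gene coordinates for a biplot
        "singular_values": s[:k],
        "explained": s[:k] ** 2 / np.sum(s ** 2),
    }


# Synthetic example: 3 treatments x 4 time points x 2 replicates, 50 genes.
rng = np.random.default_rng(0)
treatment = np.repeat(np.arange(3), 8)
time = np.tile(np.repeat(np.arange(4), 2), 3)
X = rng.normal(scale=0.1, size=(24, 50))
# Genes 0-4 respond to time only under treatment 2 (a planted interaction).
X[treatment == 2, :5] += 3.0 * time[treatment == 2][:, None]

res = anova_pca(X, treatment, time)

# Rank genes by how much interaction variation the retained components carry.
gene_score = np.linalg.norm(res["loadings"] * res["singular_values"], axis=1)
top = np.argsort(gene_score)[::-1][:5]
print("selected genes:", sorted(top.tolist()))
```

Plotting `scores` and `loadings` on shared axes would give a biplot of the interaction effect. Ranking genes on the interaction matrix alone keeps the selection focused on treatment-by-time behaviour, because the main effects and residual variation have already been separated out.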