Bioinformatics. 2009 Sep 15;25(18):2348-54.
doi: 10.1093/bioinformatics/btp406. Epub 2009 Jul 2.

Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets

Galina V Glazko et al. Bioinformatics.

Abstract

Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, there is still a need to quantify their relative performance when they test different null hypotheses.

Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study, we demonstrate that high correlations affect the power of univariate and multivariate tests equally. In addition, for most of the tests, power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set whose expression changes between the two phenotypes. Applying different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), which test different null hypotheses, find some common but also some complementary differentially expressed gene sets under specific settings. This demonstrates that, because their null hypotheses are complementary, each test captures different aspects of the data, and that for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one.
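To make the three statistics named above concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the pooled-variance t-statistic, the pseudo-inverse in Hotelling's T² (used here so the sketch still runs when the pooled covariance is singular), the Euclidean-distance form of the N-statistic, and the permutation p-value are all assumptions, as are the function names.

```python
import numpy as np

def sum_squared_t(x, y):
    """Sum over genes of squared pooled-variance two-sample t-statistics (assumed form)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * x.var(axis=0, ddof=1)
              + (n2 - 1) * y.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    t = (x.mean(axis=0) - y.mean(axis=0)) / np.sqrt(pooled * (1 / n1 + 1 / n2))
    return float(np.sum(t ** 2))

def hotelling_t2(x, y):
    """Two-sample Hotelling's T^2; pinv is an assumption to tolerate singular covariance."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.pinv(pooled) @ diff)

def _mean_dist(a, b):
    """Mean Euclidean distance over all pairs (row of a, row of b)."""
    d = a[:, None, :] - b[None, :, :]
    return np.sqrt((d ** 2).sum(axis=2)).mean()

def n_statistic(x, y):
    """Distance-based N-statistic with a Euclidean kernel (assumed form)."""
    val = 2 * _mean_dist(x, y) - _mean_dist(x, x) - _mean_dist(y, y)
    return float(np.sqrt(max(val, 0.0)))  # guard against tiny negative rounding error

def permutation_pvalue(stat, x, y, n_perm=500, seed=0):
    """P-value from permuting sample labels; rows are samples, columns are genes."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = len(x)
    observed = stat(x, y)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 20 samples per phenotype, a gene set of 8 genes,
# with a mean shift in the first 4 genes of the second phenotype.
rng = np.random.default_rng(42)
x = rng.normal(size=(20, 8))
y = rng.normal(size=(20, 8))
y[:, :4] += 1.0
for stat in (sum_squared_t, hotelling_t2, n_statistic):
    print(stat.__name__, permutation_pvalue(stat, x, y))
```

Running all three on the same gene set, as in the loop above, mirrors the abstract's recommendation to use the tests together, since each responds to a different kind of departure from its null hypothesis.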

Figures

Fig. 1.
The power curves of five tests. The mean expression vector is changing, the variance is fixed to 1, for a correlation of 0.1 among genes. Parameter ‘genes’ is the number of genes in a gene set. Parameter ‘gamma’ corresponds to the detection call.
Fig. 2.
The power curves of five tests. The mean expression vector is changing, the variance is fixed to 1, for a correlation of 0.5 among genes. Parameter ‘genes’ is the number of genes in a gene set. Parameter ‘gamma’ corresponds to the detection call.
Fig. 3.
The power curves of five tests. The mean expression vector is changing, the variance is fixed to 1, for a correlation of 0.9 among genes. Parameter ‘genes’ is the number of genes in a gene set. Parameter ‘gamma’ corresponds to the detection call.
Fig. 4.
The power curves of five tests. The mean expression vector is fixed, the variance is changing from 1 to 5, for correlations of 0.1, 0.5 and 0.9 among genes.
Fig. 5.
Agreement among test statistics. Venn diagrams for (a) the p53 data set and (b) the ALL data set.
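The simulation setting behind the power curves in Figs 1 to 3 (unit variance, a common correlation among genes, a changing mean vector) can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the multivariate normal model, the sample size, the shift size, the parameter `frac_changed` (the fraction of genes whose mean changes), and the placeholder statistic `mean_shift_stat` are all hypothetical choices made for the example.

```python
import numpy as np

def simulate_gene_set(n_samples, n_genes, rho, shift, frac_changed, rng):
    """Draw two phenotypes from multivariate normals with unit variance and
    a common pairwise correlation rho; a fraction of genes is mean-shifted."""
    cov = np.full((n_genes, n_genes), rho)
    np.fill_diagonal(cov, 1.0)
    mean_y = np.zeros(n_genes)
    n_changed = int(round(frac_changed * n_genes))
    mean_y[:n_changed] = shift
    x = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_samples)
    y = rng.multivariate_normal(mean_y, cov, size=n_samples)
    return x, y

def mean_shift_stat(x, y):
    """Placeholder statistic (not one of the paper's five tests):
    squared Euclidean distance between the two group mean vectors."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)

def permutation_pvalue(stat, x, y, n_perm, rng):
    """P-value obtained by permuting sample labels."""
    pooled = np.vstack([x, y])
    n1 = len(x)
    observed = stat(x, y)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def estimate_power(stat, n_sim=40, n_perm=99, alpha=0.05, seed=0, **sim_kwargs):
    """Fraction of simulated data sets in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x, y = simulate_gene_set(rng=rng, **sim_kwargs)
        if permutation_pvalue(stat, x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim

# Power at the three correlation levels named in the captions
# (sample size, shift, and frac_changed are illustrative values only).
for rho in (0.1, 0.5, 0.9):
    power = estimate_power(mean_shift_stat, n_samples=20, n_genes=10,
                           rho=rho, shift=0.5, frac_changed=0.5)
    print(f"rho={rho}: estimated power {power:.2f}")
```

Any statistic with the same `stat(x, y)` signature can be passed to `estimate_power`, so the same loop can be reused to compare univariate and multivariate tests across correlation levels and gene set sizes.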
