Self-contained gene-set analysis of expression data: an evaluation of existing and novel methods

Brooke L Fridley¹, Gregory D Jenkins, Joanna M Biernacka

PMID: 20862301
PMCID: PMC2941449
DOI: 10.1371/journal.pone.0012693

PLoS One. 2010 Sep 17;5(9):e12693.

Affiliation

¹ Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA. fridley.brooke@mayo.edu

Abstract

Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed; they fall into two types, competitive and self-contained. Self-contained methods can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete, and time-to-event phenotypes. To assess the power and type I error rate of the previously proposed and novel approaches, we completed an extensive simulation study in which the scenarios varied in the number of genes in a gene set, the number of genes associated with the phenotype, the effect sizes, the correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that, overall, Fisher's method and the global model with random effects had the highest power across a wide range of scenarios, while the analyses based on the first principal component and the Kolmogorov-Smirnov test tended to have the lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.
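As an illustration of one of the methods named in the abstract, the following is a minimal sketch of Fisher's method for combining per-gene p-values into a single gene-set p-value. It is not the authors' implementation. The function name `fisher_combined_pvalue` and the example p-values are hypothetical, and the chi-square reference distribution used here assumes the per-gene p-values are independent. Expression values within a gene set are typically correlated, so in practice a permutation-based null distribution may be needed instead.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine k p-values with Fisher's method (hypothetical helper).

    The statistic is X = -2 * sum(ln p_i). Under the null hypothesis and
    ASSUMING independent p-values, X follows a chi-square distribution
    with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = sum(-math.log(p) for p in p_values)  # equals X / 2

    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    combined_p = min(1.0, math.exp(-half_stat) * total)
    return 2.0 * half_stat, combined_p


# Made-up per-gene p-values for a small illustrative gene set.
statistic, p_set = fisher_combined_pvalue([0.04, 0.20, 0.01, 0.65])
print(f"Fisher statistic = {statistic:.3f}, gene-set p-value = {p_set:.4f}")
```

With a single p-value the combined p-value equals the input, which is a convenient sanity check for the closed-form tail probability.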

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Pairwise scatterplot of power for the various methods for scenarios with standard deviation (σ) of 6.0.**

**Figure 2. Plots of power for all methods.**
Power is plotted as a function of (A) sample size, (B) the correlation between expression values within the gene set (ρ), (C) the proportion of probes associated with the phenotype, and (D) the calculated R², the proportion of variation in the quantitative phenotype explained by the gene expression values in the pathway. The average power values are based on all simulated non-null scenarios. Plot (B) excludes scenarios with between-probe correlation structure defined by the gemcitabine pathway, and only shows fixed-correlation scenarios (ρ = 0, 0.1, 0.3). Plots (B), (C), and (D) are based on sample size of 100. Similar plots for sample sizes of 20 and 500 are shown in Figure S1. For plots (C) and (D) a kernel smoother was used to fit a curve to the data. Scenarios with all expression probes being associated with the trait were excluded from plot (C), as all the methods had very high power in this situation.

**Figure 3. Power of Fisher's Method (FM) as a function of sample size, correlation of expression values between probes (ρ), and R² (proportion of variation in the quantitative phenotype explained by the gene expression values in the gene set).**
