Comparative Study
BMC Bioinformatics. 2008 Nov 27:9:502.
doi: 10.1186/1471-2105-9-502.

Microarray-based gene set analysis: a comparison of current methods

Sarah Song et al. BMC Bioinformatics. 2008.

Abstract

Background: The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability and reproducibility of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, no consensus has yet been reached regarding which methodology performs best, and under what conditions. The goal of this work was to examine the performance characteristics of a collection of existing gene set analysis methods, on both simulated and real microarray data sets. Of particular interest was the potential utility gained through the incorporation of inter-gene correlation into the analysis process.

Results: Each of six gene set analysis methods was applied to both simulated and publicly available microarray data sets. Overall, all of the methods were better at detecting gene sets that moved from a non-active state (i.e., genes not expressed) to an active state (or vice versa) than at detecting gene sets that simply changed their level of activity. Methods that incorporate correlation structures showed an increased ability to detect altered gene sets in some settings.

Conclusion: The analysis of simulated data shows that the performance of gene set analysis methods is strongly influenced by the features of the data set in question, and that methods that incorporate correlation structures into the analysis process tend to achieve better performance than methods that rely on univariate test statistics.
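To make the role of inter-gene correlation concrete, the following is a minimal sketch of how a correlated gene set could be simulated for two classes. It is not the authors' simulation code, which the abstract does not describe in detail: the equal-pairwise-correlation (compound symmetric) covariance, unit variances, normal distribution, sample sizes, and the function names `simulate_gene_set` and `variance_of_set_mean` are all assumptions made for illustration. The correlation of 0.25 and mean difference of 1 echo values quoted in the figure captions.

```python
import numpy as np


def simulate_gene_set(n_samples, n_genes, rho, mean, rng):
    """Draw an (n_samples x n_genes) matrix of normal expression values.

    Assumption: unit variance for every gene and the same pairwise
    correlation `rho` between every pair of genes in the set.
    """
    cov = np.full((n_genes, n_genes), rho)
    np.fill_diagonal(cov, 1.0)
    means = np.full(n_genes, mean)
    return rng.multivariate_normal(means, cov, size=n_samples)


def variance_of_set_mean(n_genes, rho):
    """Variance of the average of n_genes unit-variance genes with
    equal pairwise correlation rho: (1 + (k - 1) * rho) / k."""
    return (1.0 + (n_genes - 1) * rho) / n_genes


rng = np.random.default_rng(0)

# Two classes of 20 samples each; the gene set has 10 genes.
# The second class is shifted by a mean difference of 1.
class_a = simulate_gene_set(20, 10, rho=0.25, mean=0.0, rng=rng)
class_b = simulate_gene_set(20, 10, rho=0.25, mean=1.0, rng=rng)

# Correlation inflates the variance of a simple per-sample set average,
# which is one reason ignoring it can distort a univariate-style test.
independent_var = variance_of_set_mean(10, rho=0.0)
correlated_var = variance_of_set_mean(10, rho=0.25)
```

Under these assumptions the per-sample average of ten genes has variance 0.1 when the genes are independent and 0.325 when every pair has correlation 0.25, so a statistic that treats genes as independent would understate its own variability.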


Figures

Figure 1
Detection rates in simulated data sets as a function of increasing correlation. For each gene set analysis method, the data were permuted 2,000 times to generate p-values for each gene set, within each simulation. FDR-adjusted p-values of less than 0.05 were used to indicate significance. (a) Gene sets active in one class and inactive in the other, On-Off (D), with a difference between the means of 1. (b) Gene sets active in both classes, On-On (D), with a difference between the means of 1. (c) Gene sets active in one class and inactive in the other, On-Off (D), with a difference between the means of 0.5. (d) Gene sets active in both classes, On-On (D), with a difference between the means of 0.5.
Figure 2
Detection rates in simulated data sets as a function of increasing inter-class separation. For each gene set analysis method, the data were permuted 2,000 times to generate p-values for each gene set, within each simulation. FDR-adjusted p-values of less than 0.05 were used to indicate significance. (a) Gene sets active in one class and inactive in the other, On-Off (D), with pairwise correlations of 0.1. (b) Gene sets active in both classes, On-On (D), with pairwise correlations of 0.1. (c) Gene sets active in one class and inactive in the other, On-Off (D), with pairwise correlations of 0.25. (d) Gene sets active in both classes, On-On (D), with pairwise correlations of 0.25.
Figure 3
Visualization of expression and correlation in the OXPHOS HG-U133A probes pathway using the pcot2 package. The four red-blue plots represent pairwise correlations between genes in the pathway, with positively correlated genes clustered together. The top left plot relates to the inter-gene correlations observed within the DM2 samples, while the top right plot contains the inter-gene correlation information for the NGT samples, with genes in the same order as the top left plot (i.e., the gene order is the same in both plots on the top row). The same approach is taken in the bottom two plots, with the bottom right plot representing inter-gene correlation within the NGT samples, with genes again grouped by correlation. The gene order in the bottom left plot (DM2 samples) is then the same as that in the bottom right. The gray-scale plots in the center of the figure indicate gene expression intensity, while the red-green plots show the change in expression level (red indicates up-regulation in DM2 relative to NGT).
Figure 4
Significant gene sets detected in the leukemia data set (GSEA-Category, Globaltest, PCOT2, sigPathway). (a) The GSEA-Category, Globaltest and PCOT2 approaches detected 72, 77 and 67 gene sets, respectively, as undergoing significant changes in expression activity after correction for multiple testing, with 57 gene sets detected as significantly altered by all three approaches. The two methods that incorporate correlation structure into their assessment procedure (Globaltest and PCOT2) agreed strongly on the gene sets they found to be altered (63 significant gene sets in common). (b) Of the 57 changed gene sets identified in common by GSEA-Category, Globaltest and PCOT2, 7 were also found by sigPathway.
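The captions for Figures 1 and 2 describe permuting the data 2,000 times to obtain a p-value per gene set and then calling a set significant when its FDR-adjusted p-value is below 0.05. A rough sketch of that workflow follows. Several pieces are assumptions rather than details taken from the paper: the permutation shuffles sample class labels, the gene set statistic (`set_statistic`, the mean absolute difference in per-gene class means) is a hypothetical stand-in for whatever each of the six methods computes, and the FDR adjustment is taken to be the Benjamini-Hochberg procedure.

```python
import numpy as np


def set_statistic(x, labels):
    """Hypothetical gene set statistic: mean absolute difference between
    the per-gene class means (rows of x are samples, columns are genes)."""
    diff = x[labels == 1].mean(axis=0) - x[labels == 0].mean(axis=0)
    return np.abs(diff).mean()


def permutation_pvalue(x, labels, n_perm=2000, rng=None):
    """Permutation p-value obtained by shuffling the class labels."""
    rng = np.random.default_rng() if rng is None else rng
    observed = set_statistic(x, labels)
    hits = 0
    for _ in range(n_perm):
        if set_statistic(x, rng.permutation(labels)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted
```

In use, one p-value would be computed per gene set with `permutation_pvalue`, the full vector passed to `benjamini_hochberg`, and sets with an adjusted value below 0.05 reported as significant.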

