Review
Am J Med Genet B Neuropsychiatr Genet. 2015 Oct;168(7):517-27.
doi: 10.1002/ajmg.b.32328. Epub 2015 Jun 8.

Gene set analysis: A step-by-step guide

Michael A Mooney et al. Am J Med Genet B Neuropsychiatr Genet. 2015 Oct.

Abstract

To maximize the potential of genome-wide association studies, many researchers are performing secondary analyses to identify sets of genes jointly associated with the trait of interest. Although methods for gene-set analyses (GSA), also called pathway analyses, have been around for more than a decade, the field is still evolving. There are numerous algorithms available for testing the cumulative effect of multiple SNPs, yet no real consensus in the field about the best way to perform a GSA. This paper provides an overview of the factors that can affect the results of a GSA, the lessons learned from past studies, and suggestions for how to make analysis choices that are most appropriate for different types of data. © 2015 Wiley Periodicals, Inc.

Keywords: complex traits; gene set analysis; genome-wide association studies; polygenic effects.

Conflict of interest statement

Conflict of interest: None.

Figures

FIG. 1
A gene set analysis workflow, including the possible choices that must be made at each step of the analysis. The data type requirements for the various statistical methods are indicated by color. For instance, regression-based methods and permutation procedures that randomize samples require genotypes as inputs. On the other hand, over-representation methods utilize summary statistics. Some methods (multicolored) are not restricted to one type of input data.
FIG. 2
Gene sets related to glucocorticoid receptor processes. Gene sets from NCI’s Pathway Interaction Database, the proprietary Metacore database, and the Gene Ontology database were overlaid onto an interaction network from the STRING protein-protein interaction database (only high confidence interactions are shown, STRING combined score ≥0.9). Colored genes are unique to a particular database, while gray genes are shared between two or more databases (only NR3C1 is common to all four databases). There are clear differences in membership between gene sets from different data sources. These differences may be due to an attempt to model distinct processes, but are also indicative of incomplete annotation. For instance, some genes are unique to a single database even when there is evidence of interaction with multiple genes from another database (e.g., ARID1A is not part of the NCI-PID gene set, but is connected to four of its member genes).
FIG. 3
Visualizations of gene set analysis results. Depending on the source of the gene sets, gene interaction information can be obtained from pathway maps or PPI databases. A: A signaling pathway map with genes colored to show gene-level association measures. Dark blue indicates a weak association and dark red indicates a strong association. B: A gene set overlaid onto a PPI network to show known interactions between genes. Here the strength of gene-level associations is indicated by node size.
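
The caption of FIG. 1 notes that over-representation methods work from summary statistics rather than genotypes. As a rough illustration of that idea, the sketch below runs a one-sided hypergeometric test on a list of significant genes. This is a generic textbook formulation, not necessarily the procedure the authors recommend. The function names, the choice of background, and the way "significant" genes are defined are all assumptions made for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the gene set (hypothetical helper)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def over_representation_p(significant, gene_set, background):
    """One-sided over-representation p-value for a single gene set.

    Assumption: `significant` was derived upstream from gene-level summary
    statistics (for example, by thresholding gene p-values). Genes outside
    `background` are ignored.
    """
    background = set(background)
    significant = set(significant) & background
    gene_set = set(gene_set) & background
    overlap = len(significant & gene_set)
    return hypergeom_upper_tail(overlap, len(gene_set), len(significant), len(background))
```

A call such as over_representation_p(significant, gene_set, background) returns a single p-value for one gene set. In practice one would repeat this across many gene sets and correct for multiple testing. Note also that this simple test treats genes as independent and ignores gene size and linkage disequilibrium.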
