Review
Trends Genet. 2014 Sep;30(9):390-400.
doi: 10.1016/j.tig.2014.07.004. Epub 2014 Aug 22.

Functional and genomic context in pathway analysis of GWAS data

Michael A Mooney et al. Trends Genet. 2014 Sep.

Abstract

Gene set analysis (GSA) is a promising tool for uncovering the polygenic effects associated with complex diseases. However, the available techniques reflect a wide variety of hypotheses about how genetic effects interact to contribute to disease susceptibility. The lack of consensus about the best way to perform GSA has led to confusion in the field and has made it difficult to compare results across methods. A clear understanding of the various choices made during GSA - such as how gene sets are defined, how single-nucleotide polymorphisms (SNPs) are assigned to genes, and how individual SNP-level effects are aggregated to produce gene- or pathway-level effects - will improve the interpretability and comparability of results across methods and studies. In this review we provide an overview of the various data sources used to construct gene sets and the statistical methods used to test for gene set association, as well as provide guidelines for ensuring the comparability of results.

Keywords: GWAS; complex traits; gene set analysis; polygenic effects.
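The choices named in the abstract (how SNPs are assigned to genes, and how SNP-level effects are aggregated to gene and gene set level) can be made concrete with a small sketch. This is only one illustrative combination of options, not the procedure recommended by the review: SNPs are assigned to a gene when they fall within a flanking window, the gene score is the minimum SNP p-value with a Šidák correction, and gene p-values are combined with Fisher's method. Both aggregation steps assume independent tests, which linkage disequilibrium generally violates. All gene names, coordinates, p-values, and the window size are hypothetical.

```python
import math

# Hypothetical toy data (not from the review).
# gene -> (chromosome, start, end); SNP -> (chromosome, position, p-value)
GENES = {
    "GENE_A": ("1", 1000, 5000),
    "GENE_B": ("1", 20000, 26000),
    "GENE_C": ("2", 3000, 9000),
}
SNPS = {
    "rs_a": ("1", 900, 0.01),
    "rs_b": ("1", 4000, 0.20),
    "rs_c": ("1", 21000, 0.03),
    "rs_d": ("2", 50000, 0.001),
}


def assign_snps(genes, snps, window=500):
    """Map each gene to the SNPs lying within `window` bp of its boundaries."""
    mapping = {gene: [] for gene in genes}
    for snp, (chrom, pos, _p) in snps.items():
        for gene, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start - window <= pos <= end + window:
                mapping[gene].append(snp)
    return mapping


def gene_pvalue(snp_ids, snps):
    """Minimum SNP p-value, Sidak-corrected for the number of SNPs in the gene.

    Returns None for a gene with no assigned SNPs.
    """
    if not snp_ids:
        return None
    p_min = min(snps[s][2] for s in snp_ids)
    return 1.0 - (1.0 - p_min) ** len(snp_ids)


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has a
    closed form, so no external library is needed.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


snp_map = assign_snps(GENES, SNPS)
gene_p = {gene: gene_pvalue(ids, SNPS) for gene, ids in snp_map.items()}

# A hypothetical gene set; genes without any assigned SNP are skipped.
gene_set = ["GENE_A", "GENE_B", "GENE_C"]
usable = [gene_p[g] for g in gene_set if gene_p[g] is not None]
set_p = fisher_combined(usable)

print(snp_map)
print(gene_p)
print(set_p)
```

Changing the window, the gene-level statistic, or the combining rule changes which SNPs count toward a gene and how strongly a single signal drives the set-level result, which is one reason results differ across methods.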

Figures

Figure I. Histogram of pathway sizes for 2256 pathways from Pathway Commons. Both axes have been trimmed (Mean = 38, Max = 1757). There are 775 pathways (34%) with fewer than 10 genes or more than 200 genes.
Figure 1. An overview of the steps performed in a gene set analysis. The effects of decisions made at each step are highlighted.
Figure 2. The number of genomic entities (genes or proteins) contained in various data sources used for gene set analysis. The proportion of the genome covered by different gene set data sources varies significantly. “Original entities” refers to the specific type of identifier used for gene set members in each data set. These entities were then mapped to Ensembl Gene identifiers for comparison across data sources. “Canonical pathways” includes all unique human pathways in the Pathway Commons database. The Human Protein Reference Database (HPRD) is a manually curated PPI network. The STRING PPI network contains evidence of interaction from multiple sources, including interactions inferred from text mining. For the Gene Ontology data, only Biological Processes were included here, as this category most closely resembles pathways. Including GO Molecular Functions and Cellular Components would increase the genomic coverage of GO by ~20%.
Figure 3. Gene sets related to glucocorticoid receptor (GCR) processes retrieved from four data sources. A single pathway in the Pathway Commons database (GCR regulatory network) and one in Metacore (Development_GCR signaling), along with two GO biological processes (GCR signaling pathway, Negative regulation of GCR signaling pathway) and two GO molecular functions (GCR binding, GCR activity) were identified. The Venn diagram shows the limited overlap among gene sets from the four data sources, highlighting the differences in membership among gene sets from different sources.
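Two of the quantities described in these captions, gene set size (Figure I) and membership overlap between sources (Figure 3), are simple to compute once gene sets are expressed in a common identifier space. The sketch below is a minimal illustration under assumed inputs: the set names and members are placeholders, the size bounds are shrunk to suit the toy data, and Jaccard similarity is used as one possible overlap measure rather than the one behind the Venn diagram. Whether very small or very large sets should be excluded at all is an analysis choice, not something the captions prescribe.

```python
from itertools import combinations

# Hypothetical gene sets keyed by a made-up "source_name" label.
# Members are placeholder identifiers, assumed already mapped to one namespace.
GENE_SETS = {
    "src1_set": {"G1", "G2", "G3", "G4"},
    "src2_set": {"G3", "G4", "G5"},
    "src3_set": {"G1"},
    "src4_set": {"G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"},
}


def filter_by_size(gene_sets, min_size=10, max_size=200):
    """Keep only sets whose size lies in [min_size, max_size]."""
    return {
        name: members
        for name, members in gene_sets.items()
        if min_size <= len(members) <= max_size
    }


def jaccard(a, b):
    """Jaccard similarity: shared members divided by members in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy bounds chosen for the small example above; the defaults of 10 and 200
# mirror the cut-offs mentioned in the Figure I caption.
kept = filter_by_size(GENE_SETS, min_size=3, max_size=5)

overlaps = {
    (x, y): jaccard(kept[x], kept[y])
    for x, y in combinations(sorted(kept), 2)
}

print(sorted(kept))
print(overlaps)
```

A low Jaccard value between sets that nominally describe the same biological process is the kind of disagreement between data sources that Figure 3 illustrates.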
