BMC Genomics. 2014 Jul 22;15:622.
doi: 10.1186/1471-2164-15-622.

Systematic permutation testing in GWAS pathway analyses: identification of genetic networks in dilated cardiomyopathy and ulcerative colitis

Christina Backes et al. BMC Genomics.

Abstract

Background: Genome-wide association studies (GWAS) are used to identify genetic loci associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools for uncovering functional networks in genome-wide association data. Pathway analyses usually combine statistical methods with biological knowledge available a priori. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation tests are applied.
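The abstract does not say which multiple-testing correction was used. As one common choice (an assumption here, not necessarily the authors' correction), the Benjamini-Hochberg false discovery rate adjustment can be sketched in standard-library Python; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative sketch only; assumes a plain list of floats in [0, 1].
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.2])` adjusts the smallest p-value to 0.04 and leaves the largest at 0.2.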

Results: We systematically investigated the impact of three different permutation test approaches for over-representation analysis on detecting false positive pathway candidates, and evaluated them on genome-wide association data for Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard, permuting the case-control status, effectively improves the specificity of GWAS pathway analysis. Although SNP permutations do not maintain linkage disequilibrium (LD), they represent an alternative for GWAS data when case-control permutations are not possible. Gene permutations, however, did not significantly improve specificity. Finally, we provide estimates of the required number of permutations for the investigated approaches.
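A minimal sketch of the case-control permutation idea, assuming toy genotype data coded as allele counts 0/1/2 and an illustrative per-SNP statistic (absolute difference in mean allele count between cases and controls) rather than the association test used in the study. All helper names are hypothetical. Because one shuffled label vector is applied to every SNP at once, the genotypes themselves, and hence the LD between SNPs, are left untouched.

```python
import random


def mean_diff(genotypes, labels):
    """|mean allele count in cases - mean in controls| for one SNP (illustrative statistic)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def case_control_permutation_p(snps, labels, n_perm=1000, seed=0):
    """Empirical p-value per SNP obtained by shuffling case-control labels.

    snps: list of SNPs, each a list of allele counts, one per individual.
    labels: 1 for case, 0 for control, one per individual.
    """
    rng = random.Random(seed)
    observed = [mean_diff(g, labels) for g in snps]
    exceed = [0] * len(snps)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # same shuffled labels reused for every SNP
        for j, g in enumerate(snps):
            if mean_diff(g, perm) >= observed[j]:
                exceed[j] += 1
    # Add-one correction so an empirical p-value is never exactly zero.
    return [(1 + e) / (1 + n_perm) for e in exceed]
```

The add-one correction means the smallest attainable p-value is 1/(1 + n_perm), so the number of permutations bounds how small a permutation p-value can become.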

Conclusions: To discover potential false positive pathway candidates and to support the results of standard statistical tests such as the hypergeometric test, permutation tests on case-control data should be carried out. Case-control permutation was the most reasonable approach; if it is not possible, SNP permutations may be carried out instead. Our study also demonstrates that significance values converge rapidly as the number of permutations increases. By applying the described statistical framework, we identified axon guidance, focal adhesion and calcium signaling as important DCM-related pathways, and Intestinal immune network for IgA production as the most significant UC pathway.
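The hypergeometric over-representation p-value is the upper tail P(X ≥ k) of a hypergeometric distribution: given N background genes, K of them in a pathway, and n significant genes, how likely is an overlap of at least k by chance? A sketch using Python's `math.comb`, with illustrative function and argument names:

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes annotated to the pathway
    n: significant (GWAS-associated) genes
    k: significant genes that fall in the pathway
    """
    upper = min(K, n)
    # Count the gene selections with at least k pathway genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, `hypergeom_enrichment_p(20, 5, 4, 2)` equals 1205/4845, about 0.249.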


Figures

Figure 1
The two distributions show the results of the column and row I permutation test approaches. The original data set revealed a total of 6,226 significantly associated genes (dashed line). After permuting the case–control status (red), significantly fewer genes were found to be significant. After the SNP permutations (row permutations I), significantly more genes were found to be significant. The second row-based permutation strategy preserved the number of genes (6,226). The respective gene sets were used as input for the pathway analysis.
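One way to realize a permutation that keeps the number of significant genes fixed, as described for the second row-based strategy, is to draw random gene sets of the same size from all genes and count their overlap with a pathway. This reading of the strategy is an assumption, and the names below are hypothetical.

```python
import random


def gene_set_permutation_p(all_genes, significant, pathway, n_perm=1000, seed=0):
    """Empirical enrichment p-value from random gene sets of fixed size.

    all_genes: list of all gene identifiers (the background).
    significant: genes called significant in the original analysis.
    pathway: genes annotated to the pathway being tested.
    """
    rng = random.Random(seed)
    pathway = set(pathway)
    observed = len(pathway & set(significant))
    n = len(significant)  # size is preserved in every permutation
    exceed = 0
    for _ in range(n_perm):
        drawn = rng.sample(all_genes, n)
        if len(pathway.intersection(drawn)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)
```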
Figure 2
Venn diagram showing the overlap between the three different approaches.
Figure 3
Overview of the 20 significant pathways: those significant across all approaches (Figure 3A), in both permutation tests (Figure 3B), and only in the original calculations (Figure 3C). The figure presents the significance values for the 20 pathways (ordered clockwise by decreasing significance as calculated by the hypergeometric test), showing p-values < 0.05 for all three approaches. The farther a pathway lies from the center, the higher its significance score (logarithmic scale). The grey shaded area in the center corresponds to non-significant pathways. Significance values were truncated at 10⁻⁵.
Figure 4
Difference between row and column permutations. The histograms in panels A and B show, for two pathways, the significance values calculated for row and column permutations, respectively. Panels C and D present the respective pathways as provided by KEGG; genes marked in red correspond to significant genes in our GWAS.
Figure 5
Comparison between enriched and depleted pathways. Each dot corresponds to one pathway. Red dots correspond to depleted and green dots to enriched pathways.
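A pathway can be labeled enriched or depleted by comparing its observed overlap k with the hypergeometric expectation n·K/N and then taking the matching one-sided tail. The caption does not state how enriched and depleted pathways were defined, so this convention and the helper names are assumptions for illustration.

```python
from math import comb


def hypergeom_pmf(N, K, n, i):
    """P(X = i) for X ~ Hypergeometric(N, K, n)."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def classify_pathway(N, K, n, k):
    """Return ("enriched" or "depleted", one-sided p-value) for an overlap of k."""
    expected = n * K / N
    lo, hi = max(0, n - (N - K)), min(K, n)  # support of the distribution
    if k >= expected:
        # Upper tail: at least k pathway genes among the significant ones.
        p = sum(hypergeom_pmf(N, K, n, i) for i in range(k, hi + 1))
        return "enriched", p
    # Lower tail: at most k pathway genes among the significant ones.
    p = sum(hypergeom_pmf(N, K, n, i) for i in range(lo, k + 1))
    return "depleted", p
```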
Figure 6
Influence of the number of permutations. The upper panel shows the average significance value and its standard deviation for “Pathways in cancer” under column (red) and row (green) permutation tests. The lower panel shows the coefficient of variation (CV) for both approaches.
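The convergence behavior can be illustrated with a small simulation, assuming each permutation exceeds the observed statistic independently with a fixed probability `true_p`. The sketch repeats the simulated test several times per permutation count and reports the coefficient of variation (standard deviation divided by mean) of the empirical p-value; all names and parameter values are hypothetical.

```python
import random
import statistics


def empirical_p(true_p, n_perm, rng):
    """Simulate one permutation test whose null exceedance probability is true_p."""
    exceed = sum(rng.random() < true_p for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)


def cv_by_permutations(true_p, perm_counts, repeats=50, seed=1):
    """Coefficient of variation of the empirical p-value for each permutation count."""
    rng = random.Random(seed)
    result = {}
    for b in perm_counts:
        ps = [empirical_p(true_p, b, rng) for _ in range(repeats)]
        result[b] = statistics.stdev(ps) / statistics.mean(ps)
    return result
```

Because the standard deviation of an empirical p-value shrinks roughly with the square root of the number of permutations, the CV falls as permutations are added, e.g. `cv_by_permutations(0.05, [100, 10000])` gives a clearly smaller CV at 10,000 permutations.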
