BMC Bioinformatics. 2018 Jun 18;19(1):231.
doi: 10.1186/s12859-018-2229-8.

Performance of epistasis detection methods in semi-simulated GWAS

Clément Chatelain et al. BMC Bioinformatics.

Abstract

Background: Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performance in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real-scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods: fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty-four different disease scenarios are considered in extensive simulations. We report the performance of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally, we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium (WTCCC).

Results: GBOOST, SHEsisEpi and DSS allow satisfactory control of the false positive rate. fastepi and IndOR show an increased false positive rate in the presence of LD between causal SNPs, under our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6×10^5 SNPs and 15,000 samples in a couple of hours using a GPU.
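
To give a sense of the scale of an exhaustive bivariate search, the following minimal sketch counts the SNP pairs in a GWAS and derives the corresponding Bonferroni-corrected p-value threshold (0.05/npairs, the form of correction used in the figure captions). It assumes the SNP count quoted in the Results is read as 6×10^5 and that every unordered pair is tested exactly once; the function names are hypothetical and not taken from any of the evaluated tools.

```python
from math import comb


def n_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs in an exhaustive bivariate scan."""
    return comb(n_snps, 2)  # n * (n - 1) / 2


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumption: 6×10^5 SNPs, as read from the Results paragraph.
n_snps = 600_000
pairs = n_pairs(n_snps)
threshold = bonferroni_threshold(pairs)

print(f"pairs tested: {pairs:.3e}")
print(f"Bonferroni threshold: {threshold:.2e}")
```

Under these assumptions the scan covers roughly 1.8×10^11 pairs, which is why both computation time and the stringency of the multiple-testing correction are central concerns for these methods.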

Conclusion: This study confirms that computation time is no longer a limiting factor for performing an exhaustive search of epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results, and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands of cases and controls are available.

Keywords: Epistasis; Genome-wide association studies; Simulation.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Score distribution in M0 simulations (no epistasis). Survival function of the scores (SF(score)=P(S>score)) output by each method for various M0 models (continuous line) and theoretical survival functions under H0 (dashed line). For IndOR, fastepi and SHEsisEpi, only the top 10^4 pairs are observed and the survival function therefore reaches a plateau for P>10^4/npairs=1.19×10^−6. For GBOOST, a score threshold was set at 30 and the survival function therefore reaches a plateau for lower scores. For DSS, a score −log10(fltDSS) is returned by the software only for pairs passing a prefilter test as defined in [14], thus overestimating the p-value for small scores
Fig. 2
False positive rate in presence of epistasis without marginal effect. False positive rate with a p-value threshold after Bonferroni correction of 0.05/npairs=5.99×10^−12 (dashed line). Model 1 with no marginal effect (ra=rb=1.0)
Fig. 3
False positive rate at SNP and block level in absence of epistasis and with marginal effect
Fig. 4
Relative power of the methods. Power of each method (rows) normalized by the highest power for each scenario (columns). Scenarios in which no method achieved a power larger than 0.01 were excluded. Panels represent results for LD parameter r^2=0, r^2=0.2, and r^2=0.5 from left to right
Fig. 5
Canonical correlation analysis of method power and disease parameters. First two components of the canonical correlation analysis between the power of each method in all scenarios and the scenario parameters: ρ, r^2, f, ra and rb
Fig. 6
Power at block and SNP level. Each point represents, for a given method and scenario, the True Positive Rate (TP) at SNP level (x-axis) vs power at block level (y-axis). Panels represent results for LD parameter r^2=0, r^2=0.2, and r^2=0.5 from left to right. The estimated power of DSS, GBOOST and IndOR for r^2=0.5 is null at SNP and block level for all scenarios and is therefore not represented in the right panel
Fig. 7
Influence of MAF on the smallest epistasis effect detectable. Smallest epistasis effect ρ detectable with a power of 0.8 for each method, depending on the MAF of the causal SNPs (fa=fb=f). n0=n1=1000, and βaβb=1.0 (no main effect)
Fig. 8
Area under the ROC curve. The global performance of each method is evaluated through the Area Under its ROC Curve (AUC). A piecewise linear approximation of the ROC curve is used to compute its AUC. Random classifiers are characterized by AUC=0.5 and perfect classifiers by AUC=1. The AUC is represented for each method (rows) and each scenario (columns), classified by their LD
Fig. 9
MAFs of SNPs detected in epistasis in the WTCCC GWAS on T2D. For each method, MAF distribution of the SNPs belonging to a pair detected in epistasis
Fig. 10
LD of SNP pairs detected in epistasis in the WTCCC GWAS on T2D. For each method, LD distribution of the SNP pairs detected in epistasis
Fig. 11
Venn diagram of the genes detected by each method (the number of genes that were detected in univariate analysis is indicated in parentheses)
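
The caption of Fig. 8 states that the AUC is computed from a piecewise linear approximation of the ROC curve. A minimal sketch of that idea is given below: ROC points are joined by straight segments and the area is summed with the trapezoidal rule. How the ROC points are obtained here (sweeping a threshold over per-pair scores, with tied scores handled together) is an assumption made for illustration, and the function names are hypothetical rather than the authors' implementation.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    `labels` holds 1 for truly epistatic pairs and 0 otherwise
    (assumed encoding). Tied scores are added as a single step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every pair sharing the current score before emitting a point.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc_trapezoid(points):
    """Area under a piecewise linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: two causal pairs ranked above two non-causal pairs.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)))
```

With scores that carry no information (for example, all tied), the ROC curve reduces to the diagonal and the same routine returns 0.5, matching the random-classifier reference value given in the caption.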

References

    1. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90(1):7–24. doi: 10.1016/j.ajhg.2011.11.029.
    2. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TFC, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53. doi: 10.1038/nature08494.
    3. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456(7218):18–21. doi: 10.1038/456018a.
    4. Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47(7):702–9. doi: 10.1038/ng.3285.
    5. de los Campos G, Sorensen D, Gianola D. Genomic Heritability: What Is It? PLoS Genet. 2015;11(5):1–21. doi: 10.1371/journal.pgen.1005048.
