BMC Bioinformatics. 2018 Jun 18;19(1):231.

doi: 10.1186/s12859-018-2229-8.

Performance of epistasis detection methods in semi-simulated GWAS

Clément Chatelain¹, Guillermo Durand², Vincent Thuillier³, Franck Augé⁴

Affiliations

¹ SANOFI R&D, Translational Sciences, Chilly Mazarin, 91385, France. clement.chatelain@sanofi.com.
² Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie, 4, place Jussieu, Paris Cedex 05, 75252, France.
³ SANOFI R&D, Biostatistics & Programming, Chilly Mazarin, 91385, France.
⁴ SANOFI R&D, Translational Sciences, Chilly Mazarin, 91385, France.

PMID: 29914375
PMCID: PMC6006572
DOI: 10.1186/s12859-018-2229-8

Clément Chatelain et al. BMC Bioinformatics. 2018.

Abstract

Background: Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performance in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real-scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods: fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty-four different disease scenarios are considered in extensive simulations. We report the performance of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally, we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium.

Results: GBOOST, SHEsisEpi and DSS allow a satisfactory control of the false positive rate. fastepi and IndOR present an increase in false positive rate in the presence of LD between causal SNPs, with our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6×10⁵ SNPs and 15,000 samples in a couple of hours using a GPU.
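To give a sense of the scale of an exhaustive bivariate scan, the following minimal sketch counts the unordered SNP pairs for a GWAS of 6×10⁵ SNPs and computes the Bonferroni-corrected threshold that a family-wise level of 0.05 would imply over that many tests. The function names are hypothetical, and applying a Bonferroni correction to the real-data scan is an assumption made for illustration only; the abstract does not state which threshold was used on the WTCCC dataset.

```python
from math import comb


def n_snp_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs tested in an exhaustive bivariate scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold for a family-wise error level alpha."""
    return alpha / n_tests


# Illustrative values: 6e5 SNPs as quoted in the Results, alpha = 0.05 (assumed).
n_pairs = n_snp_pairs(600_000)
threshold = bonferroni_threshold(0.05, n_pairs)
print(f"{n_pairs:.3e} pairs, Bonferroni threshold {threshold:.2e}")
```

With roughly 1.8×10¹¹ pairs to test, the per-pair threshold falls below 10⁻¹², which illustrates why the number of tests is listed among the main statistical challenges.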

Conclusion: This study confirms that computation time is no longer a limiting factor for performing an exhaustive search of epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results, and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands of cases and controls are available.

Keywords: Epistasis; Genome-wide association studies; Simulation.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Score distribution in M₀ simulations (no epistasis). Survival function of the scores (SF(score)=P(S>score)) output by each method for various M₀ models (continuous line) and theoretical survival functions under H₀ (dashed line). For IndOR, fastepi and SHEsisEpi only the top 10⁴ pairs are observed, and the survival function therefore reaches a plateau for P>10⁴/n_pairs=1.19×10⁻⁶. For GBOOST a score threshold was set at 30, and the survival function therefore reaches a plateau for lower scores. For DSS a score −log₁₀(flt_DSS) is returned by the software only for pairs passing a prefilter test as defined in [14], thus overestimating the p-value for small scores

**Fig. 2**
False positive rate in the presence of epistasis without marginal effect. False positive rate with a p-value threshold after Bonferroni correction of 0.05/n_pairs=5.99×10⁻¹² (dashed line). Model 1 with no marginal effect (r_a=r_b=1.0)

**Fig. 3**
False positive rate at SNP and block level in absence of epistasis and with marginal effect

**Fig. 4**
Relative power of the methods. Power of each method (rows) normalized by the highest power for each scenario (columns). Scenarios in which no method achieved a power larger than 0.01 were excluded. Panels represent results for LD parameter r²=0, r²=0.2, and r²=0.5 from left to right

**Fig. 5**
Canonical correlation analysis of method power and disease parameters. First two components of the canonical correlation analysis between the power of each method in all scenarios and the scenario parameters: ρ, r², f, r_a and r_b

**Fig. 6**
Power at block and SNP level. Each point represents, for a given method and scenario, the True Positive Rate (TP) at SNP level (x-axis) vs power at block level (y-axis). Panels represent results for LD parameter r²=0, r²=0.2, and r²=0.5 from left to right. The estimated power of DSS, GBOOST and IndOR for r²=0.5 is null at SNP and block level for all scenarios and is not represented in the right panel

**Fig. 7**
Influence of MAF on the smallest epistasis effect detectable. Smallest epistasis effect ρ detectable with a power 0.8 for each method depending on the MAF of causal SNPs (f_a=f_b=f). n₀=n₁=1000, and β_aβ_b=1.0 (no main effect)

**Fig. 8**
Area under the ROC curve. The global performance of each method is evaluated through the Area Under its ROC Curve (AUC). A piecewise linear approximation of the ROC curve is used to compute its AUC. Random classifiers are characterized by AUC=0.5 and perfect classifiers by AUC=1. The AUC is represented for each method (rows) and each scenario (columns), classified by their LD

**Fig. 9**
MAFs of SNPs detected in epistasis in the WTCCC GWAS on T2D. For each method, MAF distribution of the SNPs belonging to a pair detected in epistasis

**Fig. 10**
LD of SNP pairs detected in epistasis in the WTCCC GWAS on T2D. For each method, LD distribution of the SNP pairs detected in epistasis

**Fig. 11**
Venn diagram of the genes detected by each method (the number of genes that were detected in univariate analysis is indicated in parentheses)
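The caption of Fig. 8 states that the AUC is computed from a piecewise linear approximation of the ROC curve. A minimal sketch of that idea, using the trapezoidal rule on ROC points, is given below. The helper names and the toy scores are hypothetical; how the authors labeled pairs as positive or negative and which thresholds they swept is not described on this page, so this is a generic illustration rather than their implementation.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering a threshold over the scores.

    scores: higher means stronger evidence of epistasis.
    labels: 1 for a truly interacting pair, 0 otherwise.
    Tied scores are processed together so each threshold yields one point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example (made-up scores): two true pairs among four candidates.
print(auc_trapezoid(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))
```

Consistent with the caption, a method that assigns the same score to every pair yields a single diagonal segment and an AUC of 0.5, while a method that ranks all true pairs first yields an AUC of 1.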

Cited by

Bench Research Informed by GWAS Results.
Kondratyev NV, Alfimova MV, Golov AK, Golimbet VE. Cells. 2021 Nov 15;10(11):3184. doi: 10.3390/cells10113184. PMID: 34831407. Review.
A new method for exploring gene-gene and gene-environment interactions in GWAS with tree ensemble methods and SHAP values.
Johnsen PV, Riemer-Sørensen S, DeWan AT, Cahill ME, Langaas M. BMC Bioinformatics. 2021 May 4;22(1):230. doi: 10.1186/s12859-021-04041-7. PMID: 33947323.
Evaluation of epistasis detection methods for quantitative phenotypes.
Listopad S, Renjith G, Peng Q. bioRxiv [Preprint]. 2025 May 14:2025.04.30.651312. doi: 10.1101/2025.04.30.651312. PMID: 40463086. Preprint.
Evaluating the detection ability of a range of epistasis detection methods on simulated data for pure and impure epistatic models.
Russ D, Williams JA, Cardoso VR, Bravo-Merodio L, Pendleton SC, Aziz F, Acharjee A, Gkoutos GV. PLoS One. 2022 Feb 18;17(2):e0263390. doi: 10.1371/journal.pone.0263390. eCollection 2022. PMID: 35180244.
MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes.
Heinrich F, Ramzan F, Rajavel A, Schmitt AO, Gültas M. Biology (Basel). 2021 Sep 16;10(9):921. doi: 10.3390/biology10090921. PMID: 34571798.

References

1. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90(1):7–24. doi: 10.1016/j.ajhg.2011.11.029.
2. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TFC, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53. doi: 10.1038/nature08494.
3. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456(7218):18–21. doi: 10.1038/456018a.
4. Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47(7):702–9. doi: 10.1038/ng.3285.
5. de los Campos G, Sorensen D, Gianola D. Genomic Heritability: What Is It? PLoS Genet. 2015;11(5):1–21. doi: 10.1371/journal.pgen.1005048.
