BioData Min. 2021 Feb 19;14(1):16.
doi: 10.1186/s13040-021-00247-w.

Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure

Fentaw Abegaz et al. BioData Min.

Abstract

Background: In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering both replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding; the most popular is based on principal component analysis (PCA). In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework.

Methods: To identify causal variants acting in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC.
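To make the idea of a principal-component correction concrete, the following Python sketch computes the top principal components of a genotype matrix and regresses a trait on them, keeping the residuals as a structure-adjusted trait. This is a generic illustration under stated assumptions, not the MBMDR-PC procedure itself: the function names `top_principal_components` and `residualize`, the simulated allele frequencies (0.2 and 0.5), the quantitative trait, and the choice of two components are all hypothetical.

```python
import numpy as np


def top_principal_components(genotypes, n_components=2):
    """Return PC scores for a (samples x SNPs) matrix of 0/1/2 genotype counts."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    std = centered.std(axis=0)
    keep = std > 0  # drop monomorphic SNPs, which cannot be scaled
    standardized = centered[:, keep] / std[keep]
    # Thin SVD: the columns of u * s are the principal component scores.
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def residualize(trait, pcs):
    """Regress the trait on an intercept plus the PCs and return the residuals."""
    y = np.asarray(trait, dtype=float)
    design = np.column_stack([np.ones(len(y)), pcs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Hypothetical data: two populations of 100 samples each, 100 SNPs,
# with minor allele frequencies 0.2 and 0.5 (assumed values).
rng = np.random.default_rng(0)
pop = np.repeat([0, 1], 100)
freqs = np.where(pop[:, None] == 0, 0.2, 0.5)
genotypes = rng.binomial(2, freqs, size=(200, 100))

# A trait whose mean differs by population, i.e. confounded by structure.
trait = 0.5 * pop + rng.normal(size=200)

pcs = top_principal_components(genotypes, n_components=2)
adjusted_trait = residualize(trait, pcs)
```

In this toy setting the first component mostly separates the two simulated populations, so the residualized trait no longer carries a linear population effect. Whether linear components suffice for real data is exactly the kind of question the nonlinear substructure discussion raises.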

Results: Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG control the type I error rate at the nominal level more consistently than MBMDR-GC. Moreover, all three proposed methods of population structure correction outperform MDR-SP in terms of statistical power.
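For readers unfamiliar with how such simulation summaries are usually obtained, the sketch below estimates a rejection rate from a set of replicate p-values. Applied to replicates simulated without any true effect it estimates the type I error rate, and applied to replicates with a true effect it estimates power. It is a generic illustration, not the authors' evaluation code: the function names, the nominal level of 0.05, and the example p-values are assumptions.

```python
import numpy as np


def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulation replicates whose p-value falls below alpha."""
    p = np.asarray(p_values, dtype=float)
    return float(np.mean(p < alpha))


def monte_carlo_se(rate, n_replicates):
    """Binomial standard error of an estimated rejection rate."""
    return float(np.sqrt(rate * (1.0 - rate) / n_replicates))


# Hypothetical p-values from four replicates; two fall below 0.05.
example_p = [0.01, 0.20, 0.04, 0.50]
example_rate = rejection_rate(example_p)
example_se = monte_carlo_se(example_rate, len(example_p))
```

A method controls the type I error rate at the nominal level when its estimated rate on null replicates stays within a few standard errors of `alpha`.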

Conclusion: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.

Keywords: Confounding; Epistasis; GWAIS; GWAS; Gene-gene interaction; MB-MDR; Population stratification; Population structure; Principal components.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Flow of considered simulation settings
Fig. 2
Power estimates for MBMDR-PC (blue/solid line), MBMDR-PG (green/dashed line) and MDR-SP (red/dotted line) under the six disease models, based on simulated data on CEU and YRI populations with a difference of 0.3 in minor allele frequency between the two populations. The percentages of cases and controls from the CEU are 40% and 80%, respectively. The power (y-axis) is computed using 10 candidate SNPs. PCs are computed from 200
Fig. 3
Power estimates according to varying proportions of cases and controls in six disease epistasis models and variable sample sizes (200, 500, 1000). The percentage of cases in one of the two populations are shown
Fig. 4
Pairwise plots of the first three principal components computed using linear (A1-A3), kernel (B1-B3) and ncMCE (C1-C3) PCA methods
Fig. 5
Plots of the first two principal components: (a) linear PCA and (b) kernel PCA
Fig. 6
Estimated type I error rates for MBMDR-PC with case-control ratios (a) 60:40 and (b) 80:20. PC approaches considered: linear PCA (blue bars), kernel PCA (green bars)
