Optimized permutation testing for information theoretic measures of multi-gene interactions
- PMID: 33827420
- PMCID: PMC8028212
- DOI: 10.1186/s12859-021-04107-6
Abstract
Background: Permutation testing is often considered the "gold standard" for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large.
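To make the cost of the naive form concrete, the following is a minimal sketch, not the authors' code. It assumes a binary phenotype, a SNP pair encoded as a single joint genotype index from 0 to 8, and mutual information as the interaction measure. The function names (`count_table`, `mutual_information`, `naive_permutation_pvalue`) are hypothetical. Note that the count table is rebuilt from all samples at every permutation.

```python
import numpy as np

def count_table(genotype, phenotype, n_geno=9, n_pheno=2):
    """Build a (joint genotype x phenotype) frequency count table."""
    table = np.zeros((n_geno, n_pheno), dtype=np.int64)
    np.add.at(table, (genotype, phenotype), 1)  # one pass over all samples
    return table

def mutual_information(table):
    """Mutual information (in bits) between the rows and columns of a count table."""
    p = table / table.sum()
    px = p.sum(axis=1, keepdims=True)   # genotype marginal
    py = p.sum(axis=0, keepdims=True)   # phenotype marginal
    nz = p > 0                          # skip empty cells (0 * log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def naive_permutation_pvalue(genotype, phenotype, n_perm=200, seed=0):
    """Naive test: shuffle labels, rebuild the table, recompute the measure."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(count_table(genotype, phenotype))
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(phenotype)
        if mutual_information(count_table(genotype, shuffled)) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Each iteration costs time proportional to the number of samples, and this is repeated for every SNP pair under test, which is why the naive form scales poorly.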
Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP-SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, the cost of this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples.
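One way to see why the count tables can be transformed directly is sketched below. This is an illustration under stated assumptions and may differ from the authors' implementation in the linked repository. It assumes a binary phenotype: shuffling the labels while keeping the number of cases fixed means the case counts across genotype cells follow a multivariate hypergeometric distribution, so permuted tables can be drawn from the observed table alone. The function name `permuted_tables` is hypothetical; `Generator.multivariate_hypergeometric` is a NumPy routine (NumPy 1.18 or later).

```python
import numpy as np

def permuted_tables(table, n_perm, seed=0):
    """Draw permuted (genotype x binary phenotype) count tables
    without revisiting the per-sample data.

    table: integer array of shape (n_geno, 2); column 0 = controls, column 1 = cases.
    Returns an array of shape (n_perm, n_geno, 2).
    """
    rng = np.random.default_rng(seed)
    row_totals = table.sum(axis=1)        # samples per genotype cell (fixed)
    n_cases = int(table[:, 1].sum())      # total number of cases (fixed)
    # Under label permutation, the cases landing in each genotype cell are a
    # draw without replacement of n_cases samples from the genotype cells.
    cases = rng.multivariate_hypergeometric(row_totals, n_cases, size=n_perm)
    controls = row_totals - cases
    return np.stack([controls, cases], axis=-1)
```

The work per permutation here depends on the number of table cells rather than the number of samples, which is consistent with the insensitivity to sample size described above. Any measure computable from the table can then be evaluated on each drawn table.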
Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts .
Keywords: Information theory; Multi-locus GWAS; Multivariable interactions; Permutation testing.
Conflict of interest statement
The authors declare that they have no competing interests.