Health Inf Sci Syst. 2015 Feb 24;3(Suppl 1 HISA Big Data in Biomedicine and Healthcare 2013 Con):S3.
doi: 10.1186/2047-2501-3-S1-S3. eCollection 2015.

High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies


Benjamin Goudey et al. Health Inf Sci Syst. 2015.

Abstract

Genome-wide association studies (GWAS) are a common approach for the systematic discovery of single nucleotide polymorphisms (SNPs) that are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis of GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how improvements in the hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer, "Sequoia" at the Lawrence Livermore National Laboratory, assuming that linear scaling is maintained, as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means that exhaustive analysis of higher order interactions is becoming feasible on large modern GWAS.
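To give a sense of the scale behind these estimates, the following sketch counts the SNP pairs and triples in a 1.1 million SNP dataset and derives the throughput implied by the 5.8 year figure. The variable names are illustrative, and the throughput and hardware ratio are back-of-envelope values computed only from the numbers quoted in the abstract, not results reported by the authors.

```python
from math import comb

n_snps = 1_100_000

# Number of distinct SNP combinations an exhaustive scan has to test.
pairs = comb(n_snps, 2)
triples = comb(n_snps, 3)

# Implied throughput if all triples take 5.8 years (figure from the abstract).
seconds_in_estimate = 5.8 * 365.25 * 24 * 3600
triples_per_second = triples / seconds_in_estimate

# Ratio between 5.8 years and 3 months: under linear scaling, roughly this
# much more hardware is needed to bring the runtime down to 3 months.
hardware_ratio = (5.8 * 12) / 3

print(f"pairs:   {pairs:.3e}")
print(f"triples: {triples:.3e}")
print(f"implied triples per second: {triples_per_second:.3e}")
print(f"hardware ratio for a 3 month runtime: {hardware_ratio:.1f}")
```

The jump from roughly 6 x 10^11 pairs to roughly 2 x 10^17 triples is what makes three-way analysis so much more demanding than pairwise analysis.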


Figures

Figure 1
Binary genotype representation. Example showing a) the conversion of a given SNP into the binary representation, b) computing the occurrence of a single genotype combination for a pair of SNPs by taking their logical AND and counting the number of set bits in the resulting binary vector.
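The idea in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes genotypes are coded 0, 1 or 2 and that each SNP is stored as one bit vector per genotype value, and the function names (`encode_snp`, `popcount`, `contingency_table`) are hypothetical.

```python
def encode_snp(genotypes):
    """Return three integers used as bit vectors (assumed layout):
    bit i of vectors[g] is set when sample i has genotype g."""
    vectors = [0, 0, 0]
    for i, g in enumerate(genotypes):
        vectors[g] |= 1 << i
    return vectors


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def contingency_table(snp_a, snp_b):
    """3x3 table: entry [i][j] is the number of samples with
    genotype i at SNP A and genotype j at SNP B."""
    va, vb = encode_snp(snp_a), encode_snp(snp_b)
    return [[popcount(va[i] & vb[j]) for j in range(3)] for i in range(3)]


# Toy genotypes for six samples (made-up values for illustration only).
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 1, 1, 2, 2]
table = contingency_table(snp_a, snp_b)
print(table)
```

Each table entry costs one AND and one bit count over the whole sample vector, which is why this representation suits exhaustive scans over very many SNP combinations.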
Figure 2
Illustration of data decomposition and load balancing. Decomposition strategy. For any given SNP interaction study, the entire calculation is divided into equal-sized partitions. For each partition, one MPI task is executed on an assigned CPU card. The further decomposition into small sub-tasks is handled by the OpenMP dynamic scheduler.
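The two-level decomposition described in this caption might look roughly like the sketch below. It uses plain Python in place of MPI and OpenMP, so it only shows how the work could be split, not how it is executed. The helper names (`partition`, `sub_tasks`), the chunk size and the toy problem size are assumptions made for illustration.

```python
def partition(total_items, n_parts):
    """Split range(total_items) into n_parts contiguous, near-equal
    (start, end) ranges, one per task (stand-in for one MPI task each)."""
    base, extra = divmod(total_items, n_parts)
    bounds = []
    start = 0
    for rank in range(n_parts):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def sub_tasks(start, end, chunk):
    """Cut one partition into small chunks; a dynamic scheduler would hand
    these to idle threads one at a time (stand-in for OpenMP dynamic)."""
    return [(s, min(s + chunk, end)) for s in range(start, end, chunk)]


# Toy example: all pairs among 10 SNPs, shared between 4 tasks.
n_snps = 10
n_pairs = n_snps * (n_snps - 1) // 2
parts = partition(n_pairs, 4)
first_chunks = sub_tasks(*parts[0], chunk=5)
print(parts)
print(first_chunks)
```

Equal-sized partitions keep the coarse split simple, while small dynamically assigned chunks absorb any imbalance between threads inside a partition.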
Figure 3
Run time, scaling and efficiency analysis for strong scaling simulations. Total run times, scaling and efficiency (a., b. and c. respectively) as the number of hardware threads is increased for a 1.1 million SNP, 2000 sample dataset.
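For readers unfamiliar with strong-scaling plots, the sketch below shows how scaling (speedup) and efficiency are conventionally derived from run times at a fixed problem size. The definitions are the standard ones and may differ in detail from those used in the paper. The thread counts and run times are hypothetical placeholders, not values read from the figure.

```python
def strong_scaling(threads, runtimes):
    """Return (threads, speedup, efficiency) rows relative to the first entry.

    speedup    = baseline runtime / runtime
    efficiency = speedup / (threads / baseline threads)
    """
    base_threads, base_time = threads[0], runtimes[0]
    rows = []
    for p, t in zip(threads, runtimes):
        speedup = base_time / t
        ideal = p / base_threads
        rows.append((p, speedup, speedup / ideal))
    return rows


# Hypothetical measurements, for illustration only.
rows = strong_scaling([1024, 2048, 4096], [800.0, 400.0, 250.0])
for p, speedup, efficiency in rows:
    print(f"{p:5d} threads: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency that stays close to 1 as threads are added is what "linear scaling" refers to.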
Figure 4
Run times and scaling for varying size GWAS datasets on 64 and 1024 computing nodes. Runtime and scaling (a. and b. respectively) as the number of SNPs increases, using either 1024 or 64 nodes. Subsets of the simulated data at increasing powers of 10 (10^3 to 10^6) are used. The scaling factor in subplot b. indicates the decrease in runtime as the number of pairs is reduced by a factor of 100.

