This is a preprint.
Accelerating Genome- and Phenome-Wide Association Studies using GPUs - A case study using data from the Million Veteran Program
- PMID: 38826407
- PMCID: PMC11142062
- DOI: 10.1101/2024.05.17.594583
Abstract
The expansion of biobanks has significantly propelled genomic discoveries, yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling the extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably employing a GPU-based distributed computing approach to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from electronic health records of 635,969 diverse participants from the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling up the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations, which was successfully used on multiple cloud infrastructures with the UK Biobank and All of Us datasets, where we showed significant time and cost benefits over the baseline SAIGE model.
Grants and funding
- I01 BX004821/BX/BLRD VA/United States
- P30 AR072577/AR/NIAMS NIH HHS/United States
- I01 CX001849/CX/CSRD VA/United States
- IK2 CX001780/CX/CSRD VA/United States
- R01 AG067025/AG/NIA NIH HHS/United States
- I01 CX001737/CX/CSRD VA/United States
- K99 HG012222/HG/NHGRI NIH HHS/United States
- R01 AG065582/AG/NIA NIH HHS/United States
- I01 BX005831/BX/BLRD VA/United States
- R01 GM138597/GM/NIGMS NIH HHS/United States
- K08 MH122911/MH/NIMH NIH HHS/United States
- UM1 DK126194/DK/NIDDK NIH HHS/United States
- T32 AA028259/AA/NIAAA NIH HHS/United States
- R01 LM010685/LM/NLM NIH HHS/United States
- I01 BX004189/BX/BLRD VA/United States