[Preprint]. 2024 May 22:2024.05.17.594583.
doi: 10.1101/2024.05.17.594583.

Accelerating Genome- and Phenome-Wide Association Studies using GPUs - A case study using data from the Million Veteran Program

Alex Rodriguez et al. bioRxiv.

Abstract

The expansion of biobanks has significantly propelled genomic discoveries, yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling the extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably a GPU-based distributed computing approach, to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from the electronic health records of 635,969 diverse participants in the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations, which was used successfully on multiple cloud infrastructures with the UK Biobank and All of Us datasets, where we showed significant time and cost benefits over the baseline SAIGE model.

Figures

Fig. 1
Overview of genomic analysis in multiple population groups. a) Schematic representation illustrating the diverse set of GIA population groups. The analysis covers a deep catalog of traits extracted from electronic health records, clinical laboratory tests, vital signs, and survey responses. b) Chart categorizing traits into binary or quantitative types across different population groups. The height of each bar corresponds to the number of traits in each category, providing an overview of the trait composition for subsequent genomic analyses.
Fig. 2
Distributed BLAS gemv(), matrix-vector multiplication, using GPUs on the cluster. The columns of matrix $A$ are distributed and preloaded on GPUs, with node $i$ holding the columns with indices from $s_i$ to $e_i$; these columns are in turn distributed across the GPUs on that node. To compute $p = AA^{T}v$, we first broadcast $v$ to the GPUs, and each node computes a partial solution on its GPUs. These partial solutions are then aggregated to obtain the solution $p$.
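The column-partitioned scheme in this caption can be illustrated with a minimal single-process sketch. It is not the SAIGE-GPU implementation: the function name `distributed_aat_v`, the even split of columns, and the use of NumPy in place of GPUs and inter-node communication are all assumptions made for illustration. It relies only on the identity $AA^{T}v = \sum_i A_i A_i^{T} v$, where $A_i$ is the block of columns held by node $i$.

```python
import numpy as np

def distributed_aat_v(A, v, n_nodes):
    """Simulate p = A @ A.T @ v with the columns of A split across n_nodes.

    Hypothetical helper: each loop iteration stands in for one node, and the
    running sum stands in for the aggregation (reduction) step.
    """
    # Column boundaries: simulated node i holds columns s_i .. e_i - 1.
    bounds = np.linspace(0, A.shape[1], n_nodes + 1, dtype=int)
    p = np.zeros(A.shape[0], dtype=A.dtype)
    for s_i, e_i in zip(bounds[:-1], bounds[1:]):
        A_i = A[:, s_i:e_i]        # columns preloaded on node i
        # v is "broadcast" to every node; each computes A_i (A_i^T v).
        p += A_i @ (A_i.T @ v)     # aggregate the partial solutions
    return p

# Usage: the partitioned result should match the direct product.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 10))
v = rng.standard_normal(8)
print(np.allclose(distributed_aat_v(A, v, n_nodes=3), A @ (A.T @ v)))
```

Because each partial product needs only the local columns and the broadcast vector, the matrix itself never has to move between nodes; only vectors of length equal to the number of rows are exchanged.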
Fig. 3
Comparative performance of GPU and CPU implementations in SAIGE step one. The figure compares the execution time of each iteration of matrix operations in SAIGE step one for the European population group. a) Time required for a single preconditioned conjugate gradient (PCG) iteration on a GPU, showing efficient parallelization within the GPU. b) The corresponding time for the OpenMP implementation on CPUs, showing the speed improvement achieved with GPU acceleration. As the genotype matrix size increases, the advantage of the GPU version becomes more pronounced, as shown by the lower execution time on the GPU relative to the CPU.
Fig. 4
GPU node requirements and memory impact. The GPU node requirement grows linearly with genotype matrix size, which offers insight into efficient GPU utilization. The requirement accounts for the GPU memory, the byte size of a single-precision floating-point number, and the conversion between bytes and gigabytes. a) Impact of changing the memory available per GPU. b) Impact of changing the number of genotype variants in the input matrix with GPU memory fixed at 16 gigabytes per GPU, emphasizing considerations for diverse biobank cohorts and computational environments.
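The node estimate described in this caption can be sketched as a short calculation. The exact formula is not given here, so the version below is an assumption: it treats the genotype matrix as dense single-precision storage, uses binary gigabytes, and ignores any working memory beyond the matrix. The function name `nodes_required` and the `gpus_per_node` parameter are hypothetical.

```python
import math

BYTES_PER_FLOAT32 = 4        # size of a single-precision float
BYTES_PER_GB = 1024 ** 3     # assumption: binary gigabytes

def nodes_required(n_samples, n_variants, gpu_mem_gb=16, gpus_per_node=1):
    """Estimate the nodes needed to hold a dense float32 genotype matrix.

    Hypothetical helper: assumes the matrix is the only thing stored in GPU
    memory and that it can be split evenly across all GPUs.
    """
    matrix_bytes = n_samples * n_variants * BYTES_PER_FLOAT32
    node_bytes = gpu_mem_gb * BYTES_PER_GB * gpus_per_node
    return math.ceil(matrix_bytes / node_bytes)

# Usage: the estimate grows linearly with the number of variants and
# shrinks as the memory available per GPU increases.
for n_variants in (100_000, 200_000, 400_000):
    print(n_variants, nodes_required(500_000, n_variants, gpu_mem_gb=16))
```

Under these assumptions, doubling the number of variants doubles the matrix size and therefore roughly doubles the node count, which matches the linear relationship the caption describes.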
Fig. 5
SAIGE step one run time for All of Us data. The figure shows the time comparison of running SAIGE step one for the T2D phenotype on the Google Cloud Platform for the 5 population groups (EUR, AFR, AMR, EAS, SAS). The analysis was executed on 4 NVIDIA T4 GPUs for the SAIGE-GPU version and a 64-CPU VM for the SAIGE-CPU version.
