Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 May 31;24(1):221.
doi: 10.1186/s12859-023-05292-2.

Accelerating genomic workflows using NVIDIA Parabricks

Affiliations

Accelerating genomic workflows using NVIDIA Parabricks

Kyle A O'Connell et al. BMC Bioinformatics. .

Abstract

Background: As genome sequencing becomes better integrated into scientific research, government policy, and personalized medicine, the primary challenge for researchers is shifting from generating raw data to analyzing these vast datasets. Although much work has been done to reduce compute times using various configurations of traditional CPU computing infrastructures, Graphics Processing Units (GPUs) offer opportunities to accelerate genomic workflows by orders of magnitude. Here we benchmark one GPU-accelerated software suite called NVIDIA Parabricks on Amazon Web Services (AWS), Google Cloud Platform (GCP), and an NVIDIA DGX cluster. We benchmarked six variant calling pipelines, including two germline callers (HaplotypeCaller and DeepVariant) and four somatic callers (Mutect2, Muse, LoFreq, SomaticSniper).

Results: We achieved up to 65 × acceleration with germline variant callers, bringing HaplotypeCaller runtimes down from 36 h to 33 min on AWS, 35 min on GCP, and 24 min on the NVIDIA DGX. Somatic callers exhibited more variation between the number of GPUs and computing platforms. On cloud platforms, GPU-accelerated germline callers resulted in cost savings compared with CPU runs, whereas some somatic callers were more expensive than CPU runs because their GPU acceleration was not sufficient to overcome the increased GPU cost.

Conclusions: Germline variant callers scaled well with the number of GPUs across platforms, whereas somatic variant callers exhibited more variation in the number of GPUs with the fastest runtimes, suggesting that, at least with the version of Parabricks used here, these workflows are less GPU optimized and require benchmarking on the platform of choice before being deployed at production scales. Our study demonstrates that GPUs can be used to greatly accelerate genomic workflows, thus bringing closer to grasp urgent societal advances in the areas of biosurveillance and personalized medicine.

Keywords: Amazon Web Services; Cloud computing; GPU acceleration; Google Cloud Platform; NVIDIA Parabricks.

PubMed Disclaimer

Conflict of interest statement

Deloitte Consulting LLP is an alliance partner with NVIDIA, Amazon Web Services and Google.

Figures

Fig. 1
Fig. 1
Comparison of execution times of variant calling algorithms on CPU and GPU environments between AWS and GCP. A 32 vCPU machine with the latest processors was used for CPU benchmarking on both cloud platforms. Here we show results for varying numbers of NVIDIA Tesla V100 GPUs running the Parabricks bioinformatics suite for AWS, and NVIDIA Tesla A100 GPUs for GCP
Fig. 2
Fig. 2
GPU benchmarking results for NVIDIA Tesla GPUs. On GCP and the DGX results are shown for A100 GPUs, whereas AWS results are shown for the V100 GPU runs
Fig. 3
Fig. 3
Comparison of AWS (V100 GPU machine) versus GCP GPU cost savings per variant caller. Percentage of total cost savings shows higher cost savings using GPUs in algorithms optimized for GPU-acceleration, but losses when algorithms are not well optimized
Fig. 4
Fig. 4
Comparison of AWS V100 versus GCP A100 GPU cost ratio per variant caller. Cost ratio is the ratio between cost per hour and fold speed-up. Cost per fold-speedup shows the benefit of harnessing GPU over CPU in select algorithms, while other algorithms are more cost-efficient with CPUs when using the version of Parabricks that we benchmarked
Fig. 5
Fig. 5
Comparison of runtimes between V100 and A100 GPU machines on AWS

References

    1. Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nat Rev Genet. 2018;19(4):208–219. doi: 10.1038/nrg.2017.113. - DOI - PMC - PubMed
    1. Nwadiugwu MC, Monteiro N. Applied genomics for identification of virulent biothreats and for disease outbreak surveillance. Postgrad Med J; 2022. - PubMed
    1. Zhao S, Agafonov O, Azab A, Stokowy T, Hovig E. Accuracy and efficiency of germline variant calling pipelines for human genome data. Sci Rep. 2020;10(1):1–12. doi: 10.1038/s41598-020-77218-4. - DOI - PMC - PubMed
    1. Liu B, et al. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses. J Biomed Inform. 2014;49:119–133. doi: 10.1016/j.jbi.2014.01.005. - DOI - PMC - PubMed
    1. Cole BS, Moore JH. Eleven quick tips for architecting biomedical informatics workflows with cloud computing. PLoS Comput Biol. 2018;14(3):e1005994. doi: 10.1371/journal.pcbi.1005994. - DOI - PMC - PubMed