Bioinform Adv. 2025 May 15;5(1):vbaf085.
doi: 10.1093/bioadv/vbaf085. eCollection 2025.

Benchmarking accelerated next-generation sequencing analysis pipelines

Pubudu Saneth Samarakoon et al. Bioinform Adv. 2025.

Abstract

Motivation: Industry-standard central processing unit (CPU)-based next-generation sequencing (NGS) analysis tools have long runtimes, which limits their utility in time-sensitive clinical practice and population-scale research studies. To address this, researchers have developed accelerated NGS platforms such as DRAGEN and Parabricks, which have reduced runtimes from days to hours. However, previous studies have evaluated accelerated platforms independently, without sufficiently assessing computational resource usage or thoroughly investigating speedup scalability, a gap our study is designed to address.

Results: Consistent with previous studies, accelerated pipelines had shorter runtimes than CPU-only approaches, with Parabricks-H100 showing the highest speedups, followed by DRAGEN. In mapping, DRAGEN outperformed Parabricks on L4 and A100 GPUs and matched its H100 speedups. In variant calling, Parabricks on A100 and H100 GPUs achieved higher speedups than DRAGEN. Moreover, DRAGEN and Parabricks-H100 mapping showed positive trends in the coverage-based scalability analysis, while other configurations failed to scale effectively. Our profiler analysis provided new insights into the relationship between Parabricks' performance and its resource usage patterns, revealing potential for further improvement. Our findings and cost comparison help researchers select accelerated platforms based on coverage needs, timeframes, and budget, while suggesting optimization strategies.

Availability and implementation: Datasets are described in the 'Data availability' section. Our NGS pipelines are available at https://github.com/NAICNO/accelerated_genomics.

Conflict of interest statement

All the authors declare no competing interests.

Figures

Figure 1. Overview of the three NGS pipelines: DRAGEN, CPU-only, and Parabricks. The workflow diagram compares three NGS pipelines, showing the input and output at each processing step.
Figure 2. Runtime and speedup comparison. (A–C) Runtime comparisons (in minutes) across accelerated platforms for (A) read mapping stage, (B) variant calling HC, and (C) variant calling DV processes. Accelerated platforms include Parabricks on L4, A100, and H100 GPUs and DRAGEN. CPU-only pipeline data omitted to highlight runtime differences between accelerated pipelines. (D–F) Speedup factors of accelerated (D) read mapping stage, (E) variant calling HC, and (F) variant calling DV, calculated relative to the CPU-only baseline (Speedup = baseline runtime / accelerated runtime).
Figure 3. Resource usage patterns across coverage and GPU platforms. (A–C) L4 and H100 GPU (cards) usage comparison between low- and high-coverage samples during (A) read mapping stage, (B) variant calling HC, and (C) variant calling DV processes. (D–F) L4 and H100 GPU memory usage comparison for low- and high-coverage samples during (D) read mapping stage, (E) variant calling HC, and (F) variant calling DV processes.
Figure 4. Disk I/O usage of Parabricks pipeline (on H100 GPUs). (A) Disk I/O rate distribution across three processes of the Parabricks pipeline; (B) same distribution in log10 scale to highlight read/write activity across the Parabricks pipeline.
Figure 5. Performance evaluation and concordance analysis. (A, B) Precision, recall, and F1 scores from performance evaluation analysis of (A) SNVs and (B) indels. Performance evaluation analysis with the hap.py tool used VCF files of accelerated pipelines as query and the gold-standard high-confidence VCF files as the ground truth; (C, D) precision, recall, and F1 scores from concordance analysis of (C) SNVs and (D) indels. Concordance analysis with the hap.py tool used VCF files of accelerated pipelines as query and the CPU-only best-practice output VCF files as the ground truth.
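
Two quantities in the captions above reduce to simple arithmetic: the speedup factor from the Figure 2 caption, and the precision, recall, and F1 scores reported in Figure 5. The Python sketch below illustrates both under stated assumptions. The function and class names are hypothetical, the metrics use the standard textbook definitions (hap.py's own accounting may differ in detail), and the numbers are made up for illustration, not taken from the study.

```python
from dataclasses import dataclass
from typing import Tuple


def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Speedup = baseline runtime / accelerated runtime (Figure 2 caption)."""
    if baseline_minutes <= 0 or accelerated_minutes <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_minutes / accelerated_minutes


@dataclass(frozen=True)
class CallCounts:
    """Hypothetical container for counts from a query-vs-truth VCF comparison."""
    true_positives: int   # query variants that match the truth set
    false_positives: int  # query variants absent from the truth set
    false_negatives: int  # truth variants missed by the query


def precision_recall_f1(counts: CallCounts) -> Tuple[float, float, float]:
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    tp = counts.true_positives
    fp = counts.false_positives
    fn = counts.false_negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative values only; these are NOT results from the paper.
    print(f"speedup: {speedup(600.0, 30.0):.1f}x")
    p, r, f1 = precision_recall_f1(CallCounts(90, 10, 30))
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In the performance evaluation the truth set would be the gold-standard high-confidence VCF, and in the concordance analysis it would be the CPU-only output; the arithmetic is the same in both cases.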
