Benchmarking accelerated next-generation sequencing analysis pipelines

Affiliations

¹ Scientific Computing Services, Division for Research, Dissemination and Education, University of Oslo, Oslo, 0373, Norway.
² Norwegian National Unit for Newborn Screening, Division for Pediatric and Adolescent Medicine, Oslo University Hospital, Oslo, 0450, Norway.
³ Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, 0424, Norway.
⁴ Data Science and Learning, Argonne National Laboratory, University of Chicago, Chicago, IL, 60439, United States.
⁵ Division for Research, Dissemination and Education, University of Oslo, Oslo, 0373, Norway.
⁶ Centre for Computational Inference in Evolutionary Life Science (CELS), Department of Informatics (USIT), University of Oslo, Oslo, 0373, Norway.

PMID: 40395501
PMCID: PMC12092081
DOI: 10.1093/bioadv/vbaf085

Pubudu Saneth Samarakoon et al. Bioinform Adv. 2025 May 15;5(1):vbaf085. doi: 10.1093/bioadv/vbaf085. eCollection 2025.

Abstract

Motivation: Industry-standard central processing unit (CPU)-based next-generation sequencing (NGS) analysis tools have long runtimes, which limits their utility in time-sensitive clinical practice and population-scale research studies. To address this, researchers have developed accelerated NGS platforms such as DRAGEN and Parabricks, which have significantly reduced runtimes, from days to hours. However, previous studies have evaluated accelerated platforms independently, without sufficiently assessing computational resource usage or thoroughly investigating speedup scalability, a gap our study is designed to address.

Results: Corroborating previous studies, accelerated pipelines demonstrated shorter runtimes than CPU-only approaches, with Parabricks-H100 achieving the highest speedups, followed by DRAGEN. In mapping, DRAGEN outperformed Parabricks (L4 and A100) and matched H100 speedups. Parabricks (A100 and H100) variant calling demonstrated higher speedups than DRAGEN. Moreover, DRAGEN and Parabricks-H100 mapping showed positive trends in the coverage-based scalability analysis, while other configurations failed to scale effectively. Our profiler analysis provided new insights into the relationship between Parabricks' performance and its resource usage patterns, revealing its potential for further improvement. Our findings and cost comparison help researchers select accelerated platforms based on coverage needs, timeframes, and budget, while suggesting optimization strategies.

Availability and implementation: Datasets are described in the 'Data availability' section. Our NGS pipelines are available at https://github.com/NAICNO/accelerated_genomics.

Conflict of interest statement

All the authors declare no competing interests.

Figures

**Figure 1.**
Overview of the three NGS pipelines—DRAGEN, CPU-only, and Parabricks. The workflow diagram compares three NGS pipelines, showing the input and output at each processing step.

**Figure 2.**
Runtime and speedup comparison. (A–C) Runtime comparisons (in minutes) across accelerated platforms for (A) read mapping stage, (B) variant calling HC, and (C) variant calling DV processes. Accelerated platforms include Parabricks on L4, A100, and H100 GPUs and DRAGEN. CPU-only pipeline data omitted to highlight runtime differences between accelerated pipelines. (D–F) Speedup factors of accelerated (D) read mapping stage, (E) variant calling HC and (F) variant calling DV, calculated relative to CPU-only baseline (Speedup = baseline runtime/accelerated runtime).
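
The speedup definition in the Figure 2 caption can be illustrated with a minimal Python sketch. The function name and the runtimes below are hypothetical placeholders chosen for illustration; they are not measurements from the study.

```python
def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Speedup = baseline runtime / accelerated runtime (as in the Figure 2 caption)."""
    if baseline_minutes <= 0 or accelerated_minutes <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_minutes / accelerated_minutes


# Hypothetical runtimes in minutes (illustrative only, not from the paper).
cpu_only_minutes = 600.0
accelerated_minutes = 30.0
example_speedup = speedup(cpu_only_minutes, accelerated_minutes)
print(f"speedup: {example_speedup:.1f}x")
```

A value above 1 means the accelerated pipeline finished faster than the CPU-only baseline for that stage.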

**Figure 3.**
Resource usage patterns across coverage and GPU platforms. (A–C) L4 and H100 GPU (cards) usage comparison between low- and high-coverage samples during (A) read mapping stage, (B) variant calling HC, and (C) variant calling DV processes. (D–F) L4 and H100 GPU memory usage comparison for low- and high-coverage samples during (D) read mapping stage, (E) variant calling HC, and (F) variant calling DV processes.

**Figure 4.**
Disk I/O usage of Parabricks pipeline (on H100 GPUs). (A) Disk I/O rate distribution across three processes of the Parabricks pipeline; (B) same distribution in log10 scale to highlight read/write activity across the Parabricks pipeline.

**Figure 5.**
Performance evaluation and concordance analysis. (A, B) Precision, recall, and F1 scores from performance evaluation analysis of (A) SNVs and (B) indels. Performance evaluation analysis with the hap.py tool used VCF files of accelerated pipelines as query and the gold-standard high-confidence VCF files as the ground truth; (C, D) precision, recall, and F1 scores from concordance analysis of (C) SNVs and (D) indels. Concordance analysis with the hap.py tool used VCF files of accelerated pipelines as query and the CPU-only best-practice output VCF files as the ground truth.
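
As a rough illustration of the metrics named in the Figure 5 caption, the sketch below computes precision, recall, and F1 from true-positive, false-positive, and false-negative counts using the standard textbook definitions. It is not hap.py's implementation; the function name, the zero-denominator handling, and the example counts are assumptions made for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions; returning 0.0 for an empty denominator is an assumption."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical variant counts (illustrative only, not from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In the concordance analysis the same metrics apply, with the CPU-only output taking the place of the gold-standard truth set.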
