Front Genet. 2019 Aug 20;10:736.
doi: 10.3389/fgene.2019.00736. eCollection 2019.

Sentieon DNASeq Variant Calling Workflow Demonstrates Strong Computational Performance and Accuracy

Katherine I Kendig et al. Front Genet. 2019.

Abstract

As reliable, efficient genome sequencing becomes ubiquitous, similarly reliable and efficient variant calling becomes increasingly important. The Genome Analysis Toolkit (GATK), maintained by the Broad Institute, is currently the widely accepted standard for variant calling software. However, alternative solutions may provide faster variant calling without sacrificing accuracy. One such alternative is Sentieon DNASeq, a toolkit analogous to GATK but built on a highly optimized backend. We conducted an independent evaluation of the DNASeq single-sample variant calling pipeline in comparison to that of GATK. Our results support the near-identical accuracy of the two software packages, showcase optimal scalability and great speed from Sentieon, and describe computational performance considerations for the deployment of DNASeq.

Keywords: DNASeq; GATK; Sentieon; benchmarking; variant calling.

Figures

Figure 1
Sentieon DNASeq pipeline: demonstrated scaling across threads on Skylake architecture vs. optimal (linear) scaling. Sample: NA12878, WGS, 20X. Data points reflect averages over two replicates, highlighting (A) post-alignment steps only and (B) the full pipeline including alignment.

Figure 2
Sentieon DNASeq scalability as a function of sequencing coverage depth (A) by tool and (B) across the entire pipeline. Sample: NA24694, WGS, 25X-100X. Data points reflect averages over two replicates. Error bars are included in (B) but are too small to be visible.

Figure 3
CPU utilization, memory usage and I/O of the Sentieon DNASeq tools, excluding BWA MEM. The pipeline steps are labeled in the middle panel, following the --algo options used in the script. CPU utilization in the top panel corresponds to the sum total across the 40 cores on the node. RAM utilization in the middle panel was measured as resident set size (VmRSS) and total RAM reserved for computation (VmSize). I/O rates in the bottom panel were measured in reads and writes per second. Sample: NA12878, WGS, 20X.
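
The VmRSS and VmSize names in the Figure 3 caption match fields that Linux reports in /proc/<pid>/status. The caption does not say how the authors sampled them, so the following is only a minimal sketch of one way to read those two values for a running process, assuming a Linux host. The function names and the sample text are hypothetical and are not taken from the paper.

```python
import re
from pathlib import Path


def parse_vm_fields(status_text):
    """Extract VmRSS and VmSize (in kB) from the text of a /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        # Lines of interest look like "VmRSS:     51200 kB"
        match = re.match(r"(VmRSS|VmSize):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_fields(pid="self"):
    """Read the status file of a process (Linux only); pid may be an int or 'self'."""
    return parse_vm_fields(Path(f"/proc/{pid}/status").read_text())


# Hypothetical excerpt of a status file, used only to illustrate the parser.
SAMPLE_STATUS = "Name:\tpython3\nVmSize:\t  204800 kB\nVmRSS:\t   51200 kB\n"
sample_fields = parse_vm_fields(SAMPLE_STATUS)
```

Calling read_vm_fields at a fixed interval with the PID of a pipeline step would give a memory trace of the kind plotted in the middle panel, although the sampling interval and tooling actually used in the study are not stated here.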
