OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
- PMID: 34388963
- PMCID: PMC8361789
- DOI: 10.1186/s12859-021-04317-y
Abstract
Background: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences between a target dataset and a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referenced recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations are counterintuitive to the goal of offering a standard workflow and hamper reproducibility over time.
Results: A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half.
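As a rough illustration of the per-tool Java tuning described above, the following sketch assembles GATK command lines that pass heap size and garbage collection thread settings through the `gatk` wrapper's `--java-options` argument. The numeric values, the `JVM_SETTINGS` table, the helper names, and the file names are assumptions made for this example; they are not the settings benchmarked in the paper and not OVarFlow's actual code.

```python
import shlex

# Hypothetical per-tool JVM settings. The numbers are placeholders for
# illustration only, not the values reported by the OVarFlow authors.
JVM_SETTINGS = {
    "SortSam": {"heap_gb": 10, "gc_threads": 2},
    "MarkDuplicates": {"heap_gb": 2, "gc_threads": 2},
    "HaplotypeCaller": {"heap_gb": 2, "gc_threads": 2},
    "GatherVcfs": {"heap_gb": 2, "gc_threads": 2},
}


def java_options(tool):
    """Build the JVM option string (max heap and parallel GC threads) for a tool."""
    settings = JVM_SETTINGS[tool]
    return f"-Xmx{settings['heap_gb']}g -XX:ParallelGCThreads={settings['gc_threads']}"


def gatk_command(tool, tool_args):
    """Return the argument list for invoking a GATK tool with tuned JVM options."""
    return ["gatk", "--java-options", java_options(tool), tool, *tool_args]


if __name__ == "__main__":
    # File names below are made up for the example.
    cmd = gatk_command(
        "HaplotypeCaller",
        ["-R", "reference.fasta", "-I", "sample.bam",
         "-O", "sample.g.vcf.gz", "-ERC", "GVCF"],
    )
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(cmd))
```

Limiting the heap and the number of garbage collection threads per process is what allows several GATK jobs to run side by side on one machine without competing for memory and CPU cores; suitable values depend on the dataset and hardware and would need to be benchmarked.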
Conclusions: The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing usage of computational resources, the workflow removes previously existing entry barriers to the variant calling field and enables standardized variant calling.
Keywords: Benchmarking; Data parallelization; GATK; Java; Next generation sequencing; Reproducibility; SNP; Variant calling; indel.
© 2021. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- An analytical workflow for accurate variant discovery in highly divergent regions. BMC Genomics. 2016 Sep 2;17(1):703. doi: 10.1186/s12864-016-3045-z. PMID: 27590916. Free PMC article.
- A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset. BMC Biol. 2024 Jan 25;22(1):13. doi: 10.1186/s12915-024-01820-5. PMID: 38273258. Free PMC article.
- An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data. Infect Genet Evol. 2020 Apr;79:104152. doi: 10.1016/j.meegid.2019.104152. Epub 2019 Dec 24. PMID: 31881359.
- Comprehensive fundamental somatic variant calling and quality management strategies for human cancer genomes. Brief Bioinform. 2021 May 20;22(3):bbaa083. doi: 10.1093/bib/bbaa083. PMID: 32510555. Review.
- Standardization and quality management in next-generation sequencing. Appl Transl Genom. 2016 Jul 1;10:2-9. doi: 10.1016/j.atg.2016.06.001. eCollection 2016 Sep. PMID: 27668169. Free PMC article. Review.
Cited by
- RNA-Seq based selection signature analysis for identifying genomic footprints associated with the fat-tail phenotype in sheep. Front Vet Sci. 2024 Sep 30;11:1415027. doi: 10.3389/fvets.2024.1415027. eCollection 2024. PMID: 39403211. Free PMC article.
- Detection of Potential Mutated Genes Associated with Common Immunotherapy Biomarkers in Non-Small-Cell Lung Cancer Patients. Curr Oncol. 2022 Aug 15;29(8):5715-5730. doi: 10.3390/curroncol29080451. PMID: 36005189. Free PMC article.
- Comprehensive Molecular and Genomic Analysis of NCI-MATCH Subprotocol Y: Capivasertib in Patients With an AKT1 E17K-Mutated Tumor. JCO Precis Oncol. 2025 Mar;9:e2400614. doi: 10.1200/PO-24-00614. Epub 2025 Mar 28. PMID: 40153687.
- Genomic and Transcriptome Analysis Reveals the Biosynthesis Network of Cordycepin in Cordyceps militaris. Genes (Basel). 2024 May 15;15(5):626. doi: 10.3390/genes15050626. PMID: 38790255. Free PMC article.
- Transcriptomic profiling of near-isogenic lines reveals candidate genes for a significant locus conferring metribuzin resistance in wheat. BMC Plant Biol. 2023 May 5;23(1):237. doi: 10.1186/s12870-023-04166-2. PMID: 37142987. Free PMC article.