PLoS One. 2019 Jul 9;14(7):e0211608.
doi: 10.1371/journal.pone.0211608. eCollection 2019.

Managing genomic variant calling workflows with Swift/T

Azza E Ahmed et al. PLoS One.

Abstract

Bioinformatics research is frequently performed using complex workflows with multiple steps, fans, merges, and conditionals. This complexity makes the workflow difficult to manage on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Scientific workflow management systems can help with this. Many are now being proposed, but is there yet a "best" workflow management system for bioinformatics? Such a system would need to satisfy numerous, sometimes conflicting requirements: ease of use, seamless deployment at peta- and exa-scale, and portability to the cloud. We evaluated Swift/T as a candidate for such a role by implementing a primary genomic variant calling workflow in the Swift/T language, focusing on the workflow management, performance, and scalability issues that arise in production-grade big data genomic analyses. In the process we introduced novel features into the language, which are now part of its open repository. Additionally, we formalized a set of design criteria for high-quality, robust, maintainable workflows that must function at scale in a production setting, such as a large genomic sequencing facility or a major hospital system. The use of Swift/T conveys two key advantages. (1) It operates transparently in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.), so a single workflow is trivially portable across numerous clusters. (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, which makes the workflow easy to maintain and makes it possible to request optimal resources for each stage of the pipeline. While Swift/T's data-level parallelism eliminates the need to code parallel analysis of multiple samples, it does make debugging more difficult, as is common for implicitly parallel code. Nonetheless, the language gives users a powerful and portable way to scale up analyses across many computing architectures.
The code for our implementation of a variant calling workflow using Swift/T can be found on GitHub at https://github.com/ncsa/Swift-T-Variant-Calling, with full documentation provided at http://swift-t-variant-calling.readthedocs.io/en/latest/.
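The workflow shape described in the abstract (each sample fanned out through a chain of stages, followed by a merge, with each stage backed by a swappable executable) can be illustrated with a small Python analogy. This is not Swift/T code and not the authors' implementation: the stage functions align, dedup, call_variants and joint_genotype are hypothetical placeholders that only rewrite file names instead of invoking real tools, and the explicit thread pool stands in for the implicit dataflow parallelism that Swift/T provides without the user writing any parallel code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stage functions standing in for wrapped executables
# (e.g. an aligner or a variant caller). Each takes and returns a
# string naming the intermediate file it would produce.
def align(sample: str) -> str:
    return f"{sample}.bam"

def dedup(bam: str) -> str:
    return bam.replace(".bam", ".dedup.bam")

def call_variants(bam: str) -> str:
    return bam.replace(".bam", ".g.vcf")

def joint_genotype(gvcfs: List[str]) -> str:
    # Merge step: runs once, after all per-sample branches finish.
    return "cohort(" + ",".join(sorted(gvcfs)) + ").vcf"

# Ordered stage table: swapping a tool means replacing one entry,
# loosely analogous to swapping the executable behind a leaf function.
STAGES: List[Callable[[str], str]] = [align, dedup, call_variants]

def run_sample(sample: str, stages: List[Callable[[str], str]]) -> str:
    # Pass each stage's output to the next stage, in order.
    data = sample
    for stage in stages:
        data = stage(data)
    return data

def run_cohort(samples: List[str],
               stages: List[Callable[[str], str]] = STAGES,
               max_workers: int = 4) -> str:
    # Fan out: each sample runs through the stages independently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_sample = list(pool.map(lambda s: run_sample(s, stages), samples))
    # Merge: combine the per-sample outputs into one cohort result.
    return joint_genotype(per_sample)
```

For example, run_cohort(["s2", "s1"]) returns "cohort(s1.dedup.g.vcf,s2.dedup.g.vcf).vcf", and passing stages=[align, call_variants] drops the deduplication step without touching any other stage. Keeping the stages in a plain list is what makes swapping cheap in this sketch; in Swift/T that role is played by leaf functions, and the fan-out needs no explicit pool.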

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Swift/T variant calling code, under the hood.
Left: Patterns of parallelization implemented in our Swift/T variant calling workflow. Right: Colored blocks represent the different stages of the workflow. Black blocks indicate methods within the respective modules.
Fig 2. Timing provenance tracking of a 3-sample pipeline run (synthetic whole exome sequencing dataset at 30X, 50X and 70X) on Biocluster [50].
This plot view is interactive, allowing full pan and zoom, and was generated using the plotly library in R.

References

    1. Metzker ML. Sequencing technologies—the next generation. Nat Rev Genet. 2010;11(1):31–46. doi: 10.1038/nrg2626
    2. Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17(6):333–351. doi: 10.1038/nrg.2016.49
    3. Rabbani B, Tekin M, Mahdieh N. The promise of whole-exome sequencing in medical genetics. J Hum Genet. 2014;59(1):5–15. doi: 10.1038/jhg.2013.114
    4. Allard MW. The Future of Whole-Genome Sequencing for Public Health and the Clinic. J Clin Microbiol. 2016;54(8):1946–1948. doi: 10.1128/JCM.01082-16
    5. Bao R, Huang L, Andrade J, Tan W, Kibbe WA, Jiang H, et al. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Inform. 2014;13(Suppl 2):67–82. doi: 10.4137/CIN.S13779
