Gigascience. 2017 Aug 1;6(8):1-7.

doi: 10.1093/gigascience/gix048.

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

Baekdoo Kim¹, Thahmina Ali¹, Carlos Lijeron¹, Enis Afgan², Konstantinos Krampis¹,³,⁴

Affiliations

¹ Center for Translational and Basic Research and Belfer Research Building, Hunter College of The City University of New York, 413 E 69th St, New York, NY 10021.
² Johns Hopkins University, Department of Biology, B3400 N Charles St, Mudd Hall 144, Baltimore MD 21218.
³ Department of Biological Sciences, Hunter College of The City University of New York, 695 Park Av., New York, NY, 10065.
⁴ Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medical College, 413 E 69th St, New York, NY 10021.

PMID: 28854616
PMCID: PMC5569920
DOI: 10.1093/gigascience/gix048

Baekdoo Kim et al. Gigascience. 2017.

Abstract

Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. To address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, so that from a user's perspective running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.

Keywords: CHIPseq; NGS; RNAseq; bioinformatics; docker.

Figures

**Figure 1:**
The Bio-Docklets environment with an **(a)** interactive meta-script that enables users to start the pipelines **(b)**, select analysis parameters **(c)**, and set input **(d)** and output **(e)** directories. Shell scripts and Python code were used for connecting to the Galaxy API, retrieving required data such as reference genomes, initializing environment variables in the containers, and starting and monitoring the pipeline execution **(f)**. The postprocessed pipeline output is loaded into Visual Omics Explorer interactive visualizations and saved as HTML/JavaScript files, which can be opened in a web browser at any time after pipeline completion and container shutdown; using the visualization, the output can be mined for clusters of differentially expressed genes or histone interaction peaks, and users can export the graphics in vectorized SVG format for use in manuscripts.

**Figure 2:**
**(a)** Galaxy workflow canvas running inside the Bio-Docklets, with the composed RNAseq and CHIPseq pipelines, respectively. **(b)** User interface of the "meta-script" interactively guides the users to select which pipeline to run, input and output file directories, and reference genome indices. **(c, d)** Postprocessed pipeline output, loaded on interactive HTML/JavaScript-D3 visualizations using the Visual Omics Explorer framework, can be opened in a web browser and also exported as high-resolution, manuscript-ready graphics.
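
The Figure 1 caption mentions Python code that connects to the Galaxy API, then starts and monitors the pipeline. A rough sketch of such a flow with BioBlend might look like the following. It is an illustration, not the authors' code: the function name `run_pipeline`, the assumption that the workflow takes its single input at step index `"0"`, and the simple polling loops (which have no timeout) are all choices made for this example. The `gi` argument is expected to behave like a `bioblend.galaxy.GalaxyInstance`.

```python
import time


def run_pipeline(gi, workflow_name, input_path, poll_seconds=10):
    """Upload one file, invoke a named Galaxy workflow, and wait for it.

    `gi` should behave like bioblend.galaxy.GalaxyInstance, e.g.
    GalaxyInstance(url="http://localhost:8080", key="<API key>").
    Returns (history_id, final_state).
    """
    # A fresh history keeps this run's datasets separate from others.
    history = gi.histories.create_history(name=f"{workflow_name} run")
    upload = gi.tools.upload_file(input_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]

    matches = gi.workflows.get_workflows(name=workflow_name)
    if not matches:
        raise LookupError(f"no workflow named {workflow_name!r}")

    # Assumption: the workflow's only data input sits at step index "0".
    invocation = gi.workflows.invoke_workflow(
        matches[0]["id"],
        inputs={"0": {"id": dataset_id, "src": "hda"}},
        history_id=history["id"],
    )

    # Phase 1: wait until Galaxy has scheduled every workflow step.
    while True:
        inv_state = gi.invocations.show_invocation(invocation["id"])["state"]
        if inv_state == "scheduled":
            break
        if inv_state in ("cancelled", "failed"):
            return history["id"], inv_state
        time.sleep(poll_seconds)

    # Phase 2: wait until the jobs in the history have finished.
    while True:
        state = gi.histories.show_history(history["id"])["state"]
        if state in ("ok", "error"):
            return history["id"], state
        time.sleep(poll_seconds)
```

Passing the Galaxy connection in as an argument, rather than creating it inside the function, lets the same function be pointed at any running container and makes it possible to exercise the control flow with a stand-in object when no Galaxy server is available.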

Cited by

Democratizing bioinformatics through easily accessible software platforms for non-experts in the field.
Krampis K. Biotechniques. 2022 Feb;72(2):36-38. doi: 10.2144/btn-2021-0060. Epub 2022 Jan 21. PMID: 35060754.
YAMP: a containerized workflow enabling reproducibility in metagenomics research.
Visconti A, Martin TC, Falchi M. Gigascience. 2018 Jul 1;7(7):giy072. doi: 10.1093/gigascience/giy072. PMID: 29917068.
miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines.
Kim B, Ali T, Dong C, Lijeron C, Mazumder R, Wultsch C, Krampis K. J Comput Biol. 2019 Mar;26(3):280-284. doi: 10.1089/cmb.2018.0218. Epub 2019 Jan 17. PMID: 30653336.
Towards reproducible computational drug discovery.
Schaduangrat N, Lampa S, Simeon S, Gleeson MP, Spjuth O, Nantasenamat C. J Cheminform. 2020 Jan 28;12(1):9. doi: 10.1186/s13321-020-0408-x. PMID: 33430992. Review.
Harmonizing and integrating the NCI Genomic Data Commons through accessible, interactive, and cloud-enabled workflows.
Hung LH, Fukuda B, Schmitz R, Hoang V, Lloyd W, Yeung KY. PLoS One. 2025 Mar 4;20(3):e0318676. doi: 10.1371/journal.pone.0318676. eCollection 2025. PMID: 40036210.
