Gigascience. 2017 Aug 1;6(8):1-7.
doi: 10.1093/gigascience/gix048.

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

Baekdoo Kim et al. Gigascience.

Abstract

Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. To address some of these challenges, developers have leveraged virtualization containers for seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines, for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers that we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, and from a user perspective, running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, a university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.

Keywords: CHIPseq; NGS; RNAseq; bioinformatics; docker.
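
The single input and output endpoint described in the abstract, and the programmatic deployment of many pipeline instances, might be sketched as follows. This is a minimal illustration and not the authors' meta-script: the image name, the container paths, and the helper names `docker_run_command` and `commands_for_datasets` are all assumptions made for this example. The code only builds `docker run` command lines; it does not start any container.

```python
import os

# Hypothetical values for illustration; the real image name and
# container paths used by Bio-Docklets are not given in the abstract.
IMAGE = "example/bio-docklet-rnaseq:latest"
CONTAINER_INPUT = "/data/input"
CONTAINER_OUTPUT = "/data/output"


def docker_run_command(input_dir, output_dir, image=IMAGE, name=None):
    """Build a `docker run` argument list with one input and one output mount."""
    cmd = ["docker", "run", "--rm"]
    if name:
        cmd += ["--name", name]
    cmd += [
        # The input directory is mounted read-only.
        "--volume", f"{os.path.abspath(input_dir)}:{CONTAINER_INPUT}:ro",
        # The output directory is mounted read-write.
        "--volume", f"{os.path.abspath(output_dir)}:{CONTAINER_OUTPUT}",
        image,
    ]
    return cmd


def commands_for_datasets(dataset_dirs, output_root):
    """Build one command per dataset, each with its own output directory."""
    commands = []
    for dataset_dir in dataset_dirs:
        label = os.path.basename(os.path.normpath(dataset_dir))
        output_dir = os.path.join(output_root, label)
        commands.append(
            docker_run_command(dataset_dir, output_dir, name=f"biodocklet-{label}")
        )
    return commands


if __name__ == "__main__":
    for command in commands_for_datasets(["/data/sampleA", "/data/sampleB"], "/results"):
        print(" ".join(command))
```

Each returned list could be passed to `subprocess.Popen` to start several containers side by side, which is one way to approach the concurrent analysis of multiple datasets mentioned above.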

Figures

Figure 1:
The Bio-Docklets environment with an (a) interactive meta-script that enables users to start the pipelines (b), select analysis parameters (c), and set input (d) and output (e) directories. Shell scripts and Python code were used for connecting to the Galaxy API, retrieving required data such as reference genomes, initializing environment variables in the containers, and starting and monitoring the pipeline execution (f). The pipeline output is postprocessed and loaded into Visual Omics Explorer interactive visualizations, which are saved as HTML/JavaScript files that can be opened in a web browser at any time after pipeline completion and container shutdown. Using the visualization, users can mine the output for clusters of differentially expressed genes or histone interaction peaks and export the graphics in vectorized SVG format for use in manuscripts.
Figure 2:
(a) Galaxy workflow canvas running inside the Bio-Docklets, with the composed RNAseq and CHIPseq pipelines, respectively (b). The user interface of the "meta-script" interactively guides users to select which pipeline to run, the input and output file directories, and the reference genome indices (c, d). Postprocessed pipeline output, loaded into interactive HTML/JavaScript-D3 visualizations using the Visual Omics Explorer framework, can be opened in a web browser and also exported as high-resolution, manuscript-ready graphics.
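
The Figure 1 caption mentions Python code that connects to the Galaxy API and starts and monitors pipeline execution. A rough sketch of what such a step could look like with BioBlend is given below. It is not the authors' code: the function name `run_pipeline`, the workflow name, and the assumption that the workflow has a single input step with index "0" are all hypothetical. The `gi` argument is expected to behave like a `bioblend.galaxy.GalaxyInstance`, and only client methods that BioBlend provides (`create_history`, `upload_file`, `get_workflows`, `invoke_workflow`, `get_status`) are called.

```python
import time


def run_pipeline(gi, workflow_name, input_path, poll_seconds=30, sleep=time.sleep):
    """Upload one input file, invoke a named workflow, and poll until it ends.

    `gi` should behave like bioblend.galaxy.GalaxyInstance.
    Returns the final history state, "ok" or "error".
    """
    history = gi.histories.create_history(name=f"{workflow_name} run")
    upload = gi.tools.upload_file(input_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]

    matches = gi.workflows.get_workflows(name=workflow_name)
    if not matches:
        raise LookupError(f"No workflow named {workflow_name!r} on this server")
    workflow_id = matches[0]["id"]

    # Assumption: the workflow has exactly one input step, with index "0".
    inputs = {"0": {"id": dataset_id, "src": "hda"}}
    gi.workflows.invoke_workflow(workflow_id, inputs=inputs, history_id=history["id"])

    # Simplified monitoring: poll the overall history state. A history can
    # briefly report "ok" before the workflow jobs are scheduled, so a more
    # careful script would also inspect the workflow invocation itself.
    while True:
        state = gi.histories.get_status(history["id"])["state"]
        if state in ("ok", "error"):
            return state
        sleep(poll_seconds)
```

With BioBlend installed, `gi` could be created as `GalaxyInstance(url=..., key=...)` from `bioblend.galaxy`, pointing at the Galaxy server running inside the container; the URL and API key depend on how the container is configured and are not stated in the source.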
