PeerJ. 2018 Nov 27:6:e5954.
doi: 10.7717/peerj.5954. eCollection 2018.

DockerBIO: web application for efficient use of bioinformatics Docker images


ChangHyuk Kwon et al. PeerJ. 2018.

Abstract

Background and objective: Docker is a lightweight containerization program that delivers nearly the same performance as a local environment. Recently, many bioinformatics tools have been distributed as Docker images that include the tools themselves along with complex settings such as libraries, configurations, and, where needed, data. Users can simply download and run these images without having to compile and configure the tools, and can obtain reproducible results. Despite these advantages, several problems remain. First, there are no clear standards for distributing Docker images, and Docker Hub often provides multiple images that serve the same purpose but are used differently. As a result, it can be difficult for users to learn how to select and use them. Second, Docker images are often unsuitable as components of a pipeline, because many of them bundle large datasets. Moreover, a group of users can have difficulty sharing a pipeline composed of Docker images: group members may modify scripts or use different versions of the data, which leads to inconsistent results.

Methods and results: To address the problems described above, we developed a Java web application, DockerBIO, which provides reliable, verified, lightweight Docker images for various bioinformatics tools and for various kinds of reference data. With DockerBIO, users can easily build a pipeline from the tools and data registered in DockerBIO and, if necessary, register new tools or data. Built pipelines are themselves registered in DockerBIO, which provides an efficient environment for running them. This enables user groups to run their pipelines without spending much effort copying and modifying them.
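The idea of keeping tool images lightweight and supplying reference data separately can be illustrated with a minimal sketch. It is not DockerBIO's implementation: the image names, tags, file names, and the `Step`, `docker_run_argv`, and `pipeline_commands` helpers are all hypothetical. The sketch assumes only the standard `docker run` flags `--rm`, `-v`, and `-w`, and it builds the command lines without executing them.

```python
import shlex
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    """One pipeline step: a tool image plus the command to run inside it."""
    image: str               # hypothetical image name and tag
    command: Sequence[str]   # command executed inside the container


def docker_run_argv(step, data_dir, work_dir):
    """Build the `docker run` argument list for one step.

    Reference data is bind-mounted read-only rather than baked into the
    image, so the image stays small and every user reads the same data.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/ref:ro",   # shared reference data, read-only
        "-v", f"{work_dir}:/work",     # per-job inputs and outputs
        "-w", "/work",                 # run the tool inside the job directory
        step.image,
        *step.command,
    ]


# Hypothetical two-step pipeline; tool choice and file names are illustrative.
PIPELINE = [
    Step("example/fastqc:0.11", ("fastqc", "sample.fastq")),
    Step("example/bwa:0.7", ("bwa", "mem", "/ref/genome.fa", "sample.fastq")),
]


def pipeline_commands(steps, data_dir, work_dir):
    """Return one shell-quoted command line per step, in pipeline order."""
    return [shlex.join(docker_run_argv(s, data_dir, work_dir)) for s in steps]
```

Calling `pipeline_commands(PIPELINE, "/data/ref", "/jobs/1")` yields one command line per step. Because every step mounts the same read-only `/ref` directory, all users of the pipeline see the same version of the reference data.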

Keywords: Bioinformatics; DNA pipeline; DNA-Seq; Docker; Dockerbio; Mygenomebox; NGS pipeline; RNA pipeline; RNA-Seq.

Conflict of interest statement

ChangHyuk Kwon and Jason Kim are employed by MyGenomeBox, Co.

Figures

Figure 1. Overview of the workflow.
DockerBIO is composed of RegisterDocker and RunDocker. In RegisterDocker, users can use Docker images registered in DockerBIO, or search for Docker images on Docker Hub. They can also use data registered in DockerBIO, or search for data in other data repositories. After the options are set and tested, a pipeline is made and registered to DockerBIO in RunDocker. In RunDocker, users can upload their own data, change options, run the registered pipeline and check results.

Figure 2. (A) Docker LIST, (B) Docker Info Register and (C) SIMULATE in RegisterDocker.
(A) Docker LIST: menus for editing and testing options. (B) Docker Info Register: menus for searching Docker images from Docker Hub, registering datasets and setting options. (C) SIMULATE: menus for testing registered Docker images and options.

Figure 3. Options and menus on the RunDocker Page.
UPLOAD USER FILE: menu for uploading user data files for analysis. DOCKER RUN: menus for running a registered pipeline; please refer to the UserManual for a detailed description of each command. JOB REQUEST LIST: menu for checking the results.
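The split between registering a tested pipeline and then running it with changed options can be sketched as a small in-memory registry. This is an illustrative assumption, not DockerBIO's Java code: `PipelineRegistry`, `register`, `resolve`, and the option names are hypothetical. The point it demonstrates is that per-run overrides never modify the shared, registered definition.

```python
from types import MappingProxyType


class PipelineRegistry:
    """In-memory stand-in for a shared pipeline registry (hypothetical)."""

    def __init__(self):
        self._pipelines = {}

    def register(self, name, default_options):
        """Store a tested pipeline under a unique name with its defaults."""
        if name in self._pipelines:
            raise ValueError(f"pipeline {name!r} is already registered")
        # Copy, then expose read-only, so callers cannot mutate the shared copy.
        self._pipelines[name] = MappingProxyType(dict(default_options))

    def resolve(self, name, overrides=None):
        """Return the options for one run: defaults plus per-run overrides."""
        defaults = self._pipelines[name]
        overrides = overrides or {}
        unknown = set(overrides) - set(defaults)
        if unknown:
            raise KeyError(f"unknown options: {sorted(unknown)}")
        # A new dict is built for each run; the registered defaults stay intact.
        return {**defaults, **overrides}


registry = PipelineRegistry()
registry.register("rna-seq", {"threads": 4, "genome": "hg38"})
run_options = registry.resolve("rna-seq", {"threads": 8})
```

Here `run_options` carries the overridden thread count, while a later `registry.resolve("rna-seq")` still returns the registered defaults, so group members who change options for one job do not affect each other's runs.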
