DockerBIO: web application for efficient use of bioinformatics Docker images

PeerJ. 2018 Nov 27;6:e5954. doi: 10.7717/peerj.5954. eCollection 2018.

ChangHyuk Kwon¹,², Jason Kim², Jaegyoon Ahn¹

Affiliations

¹ Department of Computer Science and Engineering, Incheon National University, Incheon, The Republic of Korea.
² MyGenomeBox, Co, Incheon, The Republic of Korea.

PMID: 30515360
PMCID: PMC6266945
DOI: 10.7717/peerj.5954

ChangHyuk Kwon et al. PeerJ. 2018.

Abstract

Background and objective: Docker is a lightweight containerization program that shows almost the same performance as a local environment. Recently, many bioinformatics tools have been distributed as Docker images that include not only the tools themselves but also complex settings such as libraries, configurations, and, if needed, data. Users can simply download and run them without the effort of compiling and configuring them, and can obtain reproducible results. Despite these advantages, several problems remain. First, there is a lack of clear standards for the distribution of Docker images, and Docker Hub often provides multiple images with the same objective but different usage. For these reasons, it can be difficult for users to learn how to select and use them. Second, Docker images are often unsuitable as components of a pipeline, because many of them include large amounts of data. Moreover, a group of users can have difficulty sharing a pipeline composed of Docker images: members of the group may modify scripts or use different versions of the data, which causes inconsistent results.

Methods and results: To address the problems described above, we developed a Java web application, DockerBIO, which provides reliable, verified, lightweight Docker images for various bioinformatics tools and for various kinds of reference data. With DockerBIO, users can easily build a pipeline from the tools and data registered at DockerBIO and, if necessary, register new tools or data. Built pipelines are registered in DockerBIO, which also provides an efficient environment for running them. This enables user groups to run their pipelines without expending much effort to copy and modify them.
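The idea of a registered pipeline that every group member runs unchanged can be illustrated with a minimal Python sketch. It is not DockerBIO's implementation (the abstract states only that DockerBIO is a Java web application). The `Step` and `Pipeline` classes, the `docker_commands` helper, the image name `example/fastqc:0.11.5`, and all paths are hypothetical and introduced here for illustration. Only the `docker run` flags `--rm` and `-v` are real Docker CLI options.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One pipeline step: a pinned image plus the command run inside it."""
    image: str          # hypothetical image reference, pinned to a tag
    command: List[str]  # command executed inside the container


@dataclass(frozen=True)
class Pipeline:
    """A registered pipeline: ordered steps plus one fixed reference data set."""
    name: str
    reference_dir: str  # host directory holding one version of the reference data
    steps: List[Step]


def docker_commands(pipeline: Pipeline, user_dir: str) -> List[List[str]]:
    """Build one `docker run` argument list per step.

    Reference data is mounted read-only so no user can alter it, and the
    user's own files are mounted separately as the working directory.
    """
    commands = []
    for step in pipeline.steps:
        commands.append([
            "docker", "run", "--rm",
            "-v", f"{pipeline.reference_dir}:/ref:ro",
            "-v", f"{user_dir}:/work",
            step.image,
            *step.command,
        ])
    return commands


# Hypothetical one-step pipeline used only to demonstrate the helper.
EXAMPLE = Pipeline(
    name="qc-demo",
    reference_dir="/data/reference/v1",
    steps=[
        Step(image="example/fastqc:0.11.5",
             command=["fastqc", "/work/sample.fastq"]),
    ],
)

if __name__ == "__main__":
    for argv in docker_commands(EXAMPLE, "/home/user/run1"):
        print(" ".join(argv))
```

Because the image tag and the reference directory live in the pipeline definition rather than in each user's script, two users who supply different input directories still run the same tool version against the same reference data, which is the consistency problem the abstract describes.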

Keywords: Bioinformatics; DNA pipeline; DNA-Seq; Docker; Dockerbio; Mygenomebox; NGS pipeline; RNA pipeline; RNA-Seq.

Conflict of interest statement

ChangHyuk Kwon and Jason Kim are employed by MyGenomeBox, Co.

Figures

**Figure 1. Overview of the workflow.**
DockerBIO is composed of RegisterDocker and RunDocker. In RegisterDocker, users can use Docker images registered in DockerBIO, or search for Docker images on Docker Hub. They can also use data registered in DockerBIO, or search for data in other data repositories. After the options are set and tested, a pipeline is made and registered to DockerBIO in RunDocker. In RunDocker, users can upload their own data, change options, run the registered pipeline and check results.

**Figure 2. (A) Docker LIST, (B) Docker Info Register and (C) SIMULATE in *RegisterDocker*.**
(A) Docker LIST: menus for editing and testing options. (B) Docker Info Register: menus for searching Docker images from Docker Hub, registering datasets and setting options. (C) SIMULATE: menus for testing the registered Docker image and options.

**Figure 3. Options and menus on the *RunDocker* Page.**
UPLOAD USER FILE: for uploading user data files for analysis. DOCKER RUN: menus for running the registered pipeline. Please refer to the UserManual for a detailed description of each command. JOB REQUEST LIST: menu for checking the result.

Cited by

Democratizing bioinformatics through easily accessible software platforms for non-experts in the field.
Krampis K. Biotechniques. 2022 Feb;72(2):36-38. doi: 10.2144/btn-2021-0060. Epub 2022 Jan 21. PMID: 35060754.
