F1000Res. 2017 Jan 18:6:52.
doi: 10.12688/f1000research.10137.1. eCollection 2017.

The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows

Brian D O'Connor et al. F1000Res. 2017.

Abstract

As genomic datasets continue to grow, downloading data to a local organization and running analyses in a traditional compute environment is becoming increasingly impractical. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure both to host large datasets and to analyze them. In PCAWG, over 5,800 whole human genomes were aligned and variant-called across 14 cloud and HPC environments; the processed data were then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation of Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).

Keywords: Docker; big data; bioinformatics; cloud; containers; genomics.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. Use cases for Dockstore.
Developers can use Dockstore to register Docker images built by, or uploaded to, Quay.io and DockerHub with CWL/WDL machine- and human-readable descriptors from GitHub or Bitbucket. Users can then query and find tools of interest, parameterize them, and run them at a small scale locally or at large scale on commercial or open source execution engines supporting Docker and CWL/WDL. Execution takes place on cloud or HPC environments supported by the execution engine of choice.
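As a rough illustration of what a machine-readable descriptor paired with a Docker image might look like, the following sketch builds a minimal CWL v1.0 `CommandLineTool` document as a Python dictionary and serializes it to JSON (CWL documents are normally written in YAML, and JSON is also accepted). The image name `quay.io/example/hypothetical-tool:1.0`, the `wc -l` command, and the function name `make_tool_descriptor` are placeholders chosen for this example, not tools registered in Dockstore.

```python
import json
from typing import Dict, List


def make_tool_descriptor(docker_image: str, base_command: List[str]) -> Dict:
    """Build a minimal CWL v1.0 CommandLineTool description as a plain dict.

    The image and command are supplied by the caller; nothing here is
    specific to Dockstore itself.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # Ties the tool description to the Docker image that contains it.
        "requirements": [
            {"class": "DockerRequirement", "dockerPull": docker_image}
        ],
        # One positional file input appended to the command line.
        "inputs": {
            "input_file": {"type": "File", "inputBinding": {"position": 1}}
        },
        # Capture standard output into a file and expose it as an output.
        "stdout": "output.txt",
        "outputs": {"result": {"type": "stdout"}},
    }


# Hypothetical image name and command, used only for illustration.
descriptor = make_tool_descriptor(
    "quay.io/example/hypothetical-tool:1.0", ["wc", "-l"]
)
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

A descriptor like this would live in source control next to the Dockerfile, so that an execution engine supporting Docker and CWL can pull the image and run the tool with user-supplied parameters.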
Figure 2. Docker images and tool descriptors or workflows in WDL/CWL are registered with Dockstore.
For tools, users can either use the fully automated approach (A), where Docker images are built by Quay.io and the original descriptors and Dockerfile are hosted on Bitbucket or GitHub, or register pre-built Docker images (C) that have been manually pushed to Quay.io or DockerHub. The former approach results in greater tool transparency and build reproducibility. Workflows in CWL/WDL do not require an image build process and can be registered directly from source control on Bitbucket or GitHub (B).
Figure 3. The GA4GH Tool Registry API standard showing the available endpoints.
These let systems find all tools in a given repository and get details on a particular tool, including versions, descriptors, and the original Dockerfile if available.
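The kinds of lookups described in this caption can be sketched as a small set of URL builders. This is only an illustration: the base URL and the exact path layout below are assumptions modelled on the caption (tools, versions, descriptors, Dockerfile) and should be checked against the current GA4GH Tool Registry specification before use. The tool identifier is a placeholder, and no network request is made.

```python
from urllib.parse import quote

# Assumed base path for a Tool Registry API deployment; verify before use.
BASE_URL = "https://dockstore.org/api/ga4gh/v1"


def _encode(identifier: str) -> str:
    """Percent-encode an identifier so embedded slashes stay in one path segment."""
    return quote(identifier, safe="")


def tools_url(base: str = BASE_URL) -> str:
    """URL for listing all tools in the registry (assumed path)."""
    return f"{base}/tools"


def tool_url(tool_id: str, base: str = BASE_URL) -> str:
    """URL for the details of one tool (assumed path)."""
    return f"{base}/tools/{_encode(tool_id)}"


def versions_url(tool_id: str, base: str = BASE_URL) -> str:
    """URL for listing the versions of one tool (assumed path)."""
    return f"{tool_url(tool_id, base)}/versions"


def descriptor_url(tool_id: str, version_id: str,
                   descriptor_type: str = "CWL", base: str = BASE_URL) -> str:
    """URL for a CWL or WDL descriptor of one tool version (assumed path)."""
    return (f"{versions_url(tool_id, base)}/{_encode(version_id)}"
            f"/{descriptor_type}/descriptor")


def dockerfile_url(tool_id: str, version_id: str, base: str = BASE_URL) -> str:
    """URL for the original Dockerfile of one tool version (assumed path)."""
    return f"{versions_url(tool_id, base)}/{_encode(version_id)}/dockerfile"


# Hypothetical tool identifier, used only for illustration.
example_id = "quay.io/example/hypothetical-tool"
print(tools_url())
print(tool_url(example_id))
print(descriptor_url(example_id, "1.0", "WDL"))
print(dockerfile_url(example_id, "1.0"))
```

Percent-encoding the identifier matters because registry-style tool names contain slashes, which would otherwise be read as extra path segments.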
Figure 4. The web interface for the https://dockstore.org site.
(A) The main page lists the most recent additions to Dockstore and allows users to search and log in. (B) A developer can publish tools in Dockstore after logging in and linking accounts. (C) Users can see details about each tool, discuss the tool, share it on social media, and navigate back to the source.
