PLoS One. 2015 Oct 26;10(10):e0140829.
doi: 10.1371/journal.pone.0140829. eCollection 2015.

Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

Enis Afgan et al. PLoS One. 2015.

Abstract

Background: Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.

Results: We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.

Conclusions: This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. The GVL launch process for starting self-launched instances of the GVL workbench.
(a) A user initiates the launch process via the launch service (launch.genome.edu.au) by providing their cloud credentials to the launcher application and (b) within a few minutes is able to access the management interface (CloudMan) on the deployed instance of the workbench. (c) After workbench services have started, the researcher can use the applications as desired (e.g., Galaxy).
Fig 2. A screenshot of the GVL Dashboard.
The GVL Dashboard is a portal running on every GVL instance. It lists all of the available services and their status, and offers a direct link to each one.
Fig 3. An evolution of the data analysis solutions for genomics.
Initially, standalone and purpose-specific tools were most prevalent. As the complexity of analyses grew, new platforms formed that aggregate many standalone tools and support different types of computational infrastructures to offer more versatile functionality. The GVL represents another step in this evolution where it aggregates a large number of the best-of-breed software and technologies available today.
Fig 4. Three basic architectural layers composing the GVL workbench.
The GVL leverages cloud resources and is compatible with multiple cloud technologies. A set of cloud resource management tools hides the details of those resources, enabling non-cloud-aware applications to readily execute in this environment.
Fig 5. Architectural components of the GVL’s management layer.
Each GVL instance is, at runtime, composed of a number of components that the GVL provides: a virtual machine image, a volume snapshot or an archive of the tools file system, and a snapshot or a hosted instance of the indices file system. Combined at runtime by CloudMan into a virtual cluster, these components enable a flexible and feature-rich bioinformatics workbench.
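
The composition described in the Fig 5 caption can be pictured as a small data model. The following is only an illustrative sketch, not CloudMan's or the GVL's actual API: the names `GvlInstanceSpec`, `ToolsSource`, `IndicesSource`, `worker_nodes` and `scaled` are hypothetical, and the image identifier in the usage note is a placeholder. It records the three components named in the caption and models the ability to add or remove compute nodes mentioned in the abstract.

```python
from dataclasses import dataclass, replace
from enum import Enum


class ToolsSource(Enum):
    """How the tools file system is supplied (per the Fig 5 caption)."""
    VOLUME_SNAPSHOT = "volume snapshot"
    ARCHIVE = "archive"


class IndicesSource(Enum):
    """How the indices file system is supplied (per the Fig 5 caption)."""
    SNAPSHOT = "snapshot"
    HOSTED_INSTANCE = "hosted instance"


@dataclass(frozen=True)
class GvlInstanceSpec:
    """Hypothetical description of one workbench instance; not a real GVL type."""
    machine_image: str          # identifier of the virtual machine image (placeholder)
    tools_fs: ToolsSource       # source of the tools file system
    indices_fs: IndicesSource   # source of the indices file system
    worker_nodes: int = 0       # compute nodes in addition to the head node

    def scaled(self, delta: int) -> "GvlInstanceSpec":
        """Return a copy with worker nodes added or removed, never below zero."""
        return replace(self, worker_nodes=max(0, self.worker_nodes + delta))

    def describe(self) -> str:
        """Summarise the components that would be combined at runtime."""
        return (
            f"image={self.machine_image}; "
            f"tools from {self.tools_fs.value}; "
            f"indices from {self.indices_fs.value}; "
            f"workers={self.worker_nodes}"
        )
```

For example, `GvlInstanceSpec("example-image", ToolsSource.ARCHIVE, IndicesSource.HOSTED_INSTANCE).scaled(3)` yields a specification with three worker nodes; a frozen dataclass is used here so that each scaling step produces a new, comparable specification rather than mutating the old one.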
