J Biomed Inform. 2014 Jun;49:119-33.
doi: 10.1016/j.jbi.2014.01.005. Epub 2014 Jan 22.

Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

Bo Liu et al. J Biomed Inform. 2014 Jun.

Abstract

The coming deluge of genome data creates significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. Because data volume varies, so do computing and storage requirements, and biomedical researchers are therefore pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses that enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. The platform extends the existing Galaxy workflow system by adding: data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer); domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration); automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision); a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler); and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach.
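
To make the auto-scaling and pay-as-you-go ideas in the abstract concrete, here is a minimal Python sketch. It is not the platform's implementation and does not call Globus Provision, HTCondor, or any cloud API. The `InstanceType` class, the function names, the worker bounds, the price, and the whole-hour billing rule are all hypothetical assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical instance description; the values used are placeholders."""
    name: str
    hourly_price_usd: float
    slots: int  # jobs one worker can run at the same time (assumed)


def workers_needed(queued_jobs: int, running_jobs: int, slots_per_worker: int,
                   min_workers: int = 0, max_workers: int = 10) -> int:
    """Worker count that covers queued plus running jobs, clamped to bounds."""
    demand = queued_jobs + running_jobs
    wanted = math.ceil(demand / slots_per_worker)
    return max(min_workers, min(max_workers, wanted))


def estimated_cost(instance: InstanceType, workers: int,
                   runtime_hours: float) -> float:
    """Pay-as-you-go estimate, assuming each worker is billed in whole hours."""
    billed_hours = math.ceil(runtime_hours)
    return workers * billed_hours * instance.hourly_price_usd


if __name__ == "__main__":
    # Placeholder instance type and price, not a real offering.
    example = InstanceType(name="example.large", hourly_price_usd=0.5, slots=4)
    n = workers_needed(queued_jobs=7, running_jobs=1,
                       slots_per_worker=example.slots)
    print(n, estimated_cost(example, n, runtime_hours=2.3))
```

The sketch scales purely on job demand and clamps the result, which shows the trade-off the paper evaluates between execution time and cost: more workers shorten the queue but multiply the billed hours.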

Keywords: Bioinformatics; Cloud computing; Galaxy; Scientific workflow; Sequencing analyses.

Figures

Fig. 1. RNA-Sequencing analysis workflow.
Fig. 2. Globus Transfer tools in Galaxy.
Fig. 3. CRData tools in Galaxy (this figure shows the interface of “sequenceDifferentialExperssion.R”).
Fig. 4. CummeRbund tool in Galaxy.
Fig. 5. Density plot generated by CummeRbund.
Fig. 6. Connection of Cuffdiff and CummeRbund.
Fig. 7. The main steps for using Globus Provision (blocks with solid lines are necessary steps; blocks with dashed lines are optional steps).
Fig. 8. Topology file “galaxy.conf”.
Fig. 9. Architecture of the Cloud-based bioinformatics workflow platform.
Fig. 10. Security mechanism.
Fig. 11. CRData workflow.
Fig. 12. CRData tool “affyDifferentialExpression.R” (Step 3).
Fig. 13. Text output of “affyDifferentialExpression.R” (Step 3).
Fig. 14. Figure output of “affyDifferentialExpression.R” in Step 3 (a) and Step 4 (b).
Fig. 15. The RNA-Seq analysis workflow in Galaxy.
Fig. 16. The execution time and cost of the RNA-Seq workflow with different EC2 instance types.

