BMC Bioinformatics. 2012 May 4;13:77.
doi: 10.1186/1471-2105-13-77.

Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support

Mohamed Abouelhoda et al. BMC Bioinformatics.

Abstract

Background: Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts.

Results: In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: it allows the integration of existing Taverna and Galaxy workflows in a single environment, and it supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. Cloud computing in Tavaxy can be used flexibly: users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure.

Conclusions: Tavaxy shortens the workflow development cycle by introducing workflow patterns that simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
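
The hierarchical workflows and selective cloud delegation described in the abstract can be illustrated with a minimal sketch. It is not Tavaxy's implementation or API: the class names `Tool` and `SubWorkflow`, the `target` labels, and the `run` function are all hypothetical, and the "cloud" branch only records where a step would run rather than contacting any cloud service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Tool:
    """A single workflow step wrapping a plain Python callable (hypothetical)."""
    name: str
    func: Callable[[object], object]
    target: str = "local"  # "local" or "cloud"; labels are assumptions


@dataclass
class SubWorkflow:
    """A pipeline of steps; a step may itself be a sub-workflow (hierarchy)."""
    name: str
    steps: List[Union["Tool", "SubWorkflow"]] = field(default_factory=list)
    target: str = "local"


def run(node, data, log, inherited="local"):
    """Execute a node depth-first, recording where each tool would run."""
    # A node marked "cloud" delegates everything beneath it to the cloud.
    target = "cloud" if "cloud" in (inherited, node.target) else "local"
    if isinstance(node, Tool):
        log.append((node.name, target))
        return node.func(data)
    for step in node.steps:
        data = run(step, data, log, target)
    return data


# A hybrid workflow: one sub-workflow is delegated, the rest stays local.
delegated = SubWorkflow(
    "delegated",
    [Tool("upper", str.upper), Tool("head", lambda s: s[:4])],
    target="cloud",
)
main = SubWorkflow("main", [Tool("trim", str.strip), delegated, Tool("count", len)])

log = []
result = run(main, "  acgtacgt ", log)
print(result)  # 4
print(log)
```

Marking the delegation on the sub-workflow rather than on each tool keeps the choice in one place, which mirrors the idea of delegating certain sub-workflows as a whole.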

Figures

Figure 1
Use diagram of integrating Taverna, Galaxy, and Tavaxy workflows. Tavaxy is a standalone workflow system that executes Tavaxy workflows and also integrates and executes Taverna and Galaxy workflows. Galaxy workflows are compatible with Tavaxy and can be imported and executed directly on the system. For Taverna workflows, the integration can take place at either run-time or design-time. At run-time, the Taverna (sub-) workflows can be executed as a whole by calling the Taverna engine. They can also be saved as sub-workflows and used within other Tavaxy workflows. At design-time, Taverna workflows are translated to the Tavaxy language, enabling them to be edited and enhanced. In this case, the user has the option of replacing any of the remote calls in the Taverna workflow with calls to equivalent local tools. Any remaining Taverna sub-workflow fragments can be directly executed using the Taverna engine. As an optimization, sub-workflows can be encapsulated into maximal external sub-workflows so as to minimize execution overheads. The implementation section addresses the maximal external sub-workflows in more detail.
Figure 2
Workflow patterns of Tavaxy. Workflow patterns modeling the execution of workflow tasks. The parts (a), (b), (c), (d), and (e) represent the sequence (pipeline) pattern, the synchronous merge, the synchronous fork, multi-choice fork, and iteration control patterns, respectively. The part (f) shows how a list of data items is processed, and (g) shows dot/cross product operation. The parts (h) and (j) represent the data select and data merge patterns, respectively.
Figure 3
Tavaxy architecture and interface. Left: Tavaxy architecture. The authoring module (workflow editor) is where users compose, open, and import workflows into Tavaxy. The imported workflows can be in tSCUFL, SCUFL, t2flow, or JSON format. The mapping module produces tSCUFL files to be executed by the engine. The engine invokes either local tools or remote services. Upper right: The main interface of the Tavaxy system, containing links to the authoring module, the user’s workflows, the user’s data, the workflow repository, and other utilities and cloud tools. Lower right: The workflow authoring module, where the switch pattern is depicted. The cloud symbol and the parameter port appear on the tool node. On the right-hand panel, the user can choose whether a tool runs locally or on the cloud.
Figure 4
Use of cloud computing in Tavaxy. Left: The web interface for setting the computer cluster on the cloud. Right: The architecture of Tavaxy showing the local and cloud versions of the system. The data flows from the local version to either the mounted disk attached to the main machine or to the persistent S3 storage. The S3 storage serves two purposes: 1) persistent storage and 2) shared storage for the computer cluster.
Figure 5
Protein analysis workflow. Workflow for finding and analyzing homologous protein sequences. The highlighted parts are extra sub-workflows from Galaxy and Tavaxy, and the remaining parts correspond to a Taverna workflow already deposited at the myExperiment web-site.
Figure 6
Taverna implementation of the protein analysis workflow. Taverna implementation of the workflow in Figure 5. All program parameters (e.g., BLAST tool to be used and UPGMA NJ option) are considered as input to the workflow. High resolution versions of the figures of this paper are available in Additional File 2.
Figure 7
Imported Taverna workflow in Tavaxy. The imported Taverna workflow in Figure 6. The Tavaxy switch pattern is explicitly represented. The switch patterns are represented by diamond shapes. The upper switch pattern checks if the input sequence is DNA. If false, the lower switch pattern checks if it is a protein sequence. The dashed polygons mark two maximal external sub-workflows which will be encapsulated in the optimization step, as in Figure 8.
Figure 8
Hybrid and optimized workflow in Tavaxy. The workflow in Figure 7 after optimization and augmentation with extra components. Sub-workflows 1 and 2 are the maximal external sub-workflows marked in Figure 7 by dashed polygons. The extra Galaxy workflow and Tavaxy nodes are also shown.
Figure 9
Metagenomics workflow. Left: Metagenomics workflow as originally provided by Galaxy. Right: the re-designed version of this workflow using the list pattern of Tavaxy.
Figure 10
The enhanced metagenomics workflow. The enhanced metagenomics workflow as implemented in Tavaxy.
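
Several of the patterns named in the captions above (list processing, the dot and cross product operations, and the two-stage switch that first tests for DNA and then for protein) can be sketched in a few lines. This is an illustrative reading of the captions only: every function name is hypothetical, and the sequence checks are deliberately crude stand-ins, not the tests Tavaxy actually applies.

```python
from itertools import product


def map_list(tool, items):
    """List pattern: apply one tool to every item of a list."""
    return [tool(item) for item in items]


def dot_product(xs, ys):
    """Dot product: pair inputs position by position."""
    if len(xs) != len(ys):
        raise ValueError("dot product needs lists of equal length")
    return list(zip(xs, ys))


def cross_product(xs, ys):
    """Cross product: pair every item of xs with every item of ys."""
    return list(product(xs, ys))


def looks_like_dna(seq):
    # Assumption: DNA means only the letters A, C, G, T, or N.
    return bool(seq) and set(seq.upper()) <= set("ACGTN")


def looks_like_protein(seq):
    # Assumption: any other purely alphabetic string counts as protein.
    return bool(seq) and seq.isalpha()


def switch(seq):
    """Two chained switch patterns: test for DNA first, then for protein."""
    if looks_like_dna(seq):
        return "dna-branch"
    if looks_like_protein(seq):
        return "protein-branch"
    return "rejected"


print(map_list(len, ["ACGT", "MKV"]))                   # [4, 3]
print(dot_product(["s1", "s2"], ["db1", "db2"]))        # [('s1', 'db1'), ('s2', 'db2')]
print(len(cross_product(["s1", "s2"], ["db1", "db2"]))) # 4
print(map_list(switch, ["ACGT", "MKVLW", "1234"]))
```

The dot product suits inputs that are already paired (one query per database), while the cross product suits all-against-all runs; which one a given workflow needs is a design decision made when the workflow is composed.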
