Biopipe: a flexible framework for protocol-based bioinformatics analysis

Comparative Study. Shawn Hoon et al. Genome Res. 2003 Aug;13(8):1904-15. doi: 10.1101/gr.1363103. Epub 2003 Jul 17.

Abstract

We identify several challenges facing bioinformatics analysis today. First, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Second, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Third, the ever-evolving state of bioinformatics presents new algorithms and paradigms for conducting analysis, so any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for an explicit protocol-based approach to bioinformatics analysis that will lend rigor to the analysis and make it easier for external parties to experiment with and replicate results. Biopipe is designed to meet these goals. It aims to allow researchers to focus on protocol design while, at the same time, working over a compute farm to provide high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way to distribute and reproduce results. This enables researchers to discuss and interpret the data better, as once-implicit assumptions are now explicitly defined within the Biopipe framework.


Figures

Figure 1
(A–C) An example of the steps involved in the design of a bioinformatics protocol for phylogenetic analysis. (D) Excerpt of the phylogenetic tree-building pipeline XML.
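The pipeline XML excerpt in panel D is not reproduced here. As a rough, hypothetical sketch of what a protocol definition of this shape might look like (element names, attributes, and module names below are illustrative, not Biopipe's actual schema):

```xml
<pipeline name="phylo_tree_building">
  <!-- where input sequences come from (illustrative handler spec) -->
  <iohandler id="input" type="db">
    <datasource dbname="family_db" driver="mysql"/>
    <method name="fetch_family_members"/>
  </iohandler>
  <!-- analysis steps: multiple alignment, then tree building -->
  <analysis id="align">
    <runnable module="Clustalw"/>
    <parameters>-matrix BLOSUM</parameters>
  </analysis>
  <analysis id="tree">
    <runnable module="Phylip"/>
  </analysis>
  <!-- rule: tree building waits for all alignment jobs to finish -->
  <rule from="align" to="tree" condition="WAITFORALL"/>
</pipeline>
```

The point of such a template is that the analysis modules, their parameters, and the data sources are all captured in one distributable file, which is what makes a protocol reproducible by external parties.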
Figure 2
(1) The I/O component of Biopipe. The design of I/OHandlers allows different input and output sources to be plugged in for analysis. (2) The modular breakdown of the Analysis component. Inputs fetched via the I/OHandlers are passed to the Analysis component as in-memory objects. Wrapper and parser modules reside outside of Biopipe, and the modular design allows different wrappers and parsers to be swapped into the runnable, which acts as an interface to the Biopipe system.
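The plug-in design described above can be sketched as follows. This is an illustrative Python sketch, not Biopipe's actual Perl API; all class and method names here are hypothetical.

```python
# Sketch of the Figure 2 design: an I/OHandler fetches inputs as
# in-memory objects, and a Runnable wires an external wrapper/parser
# pair into the pipeline. Names are hypothetical, for illustration only.

class IOHandler:
    """Fetches inputs from (or writes outputs to) some data source."""
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn  # pluggable: database, flat file, etc.

    def fetch(self, input_id):
        return self.fetch_fn(input_id)

class Runnable:
    """Interface between the pipeline and an external analysis tool."""
    def __init__(self, wrapper, parser):
        self.wrapper = wrapper   # runs the external program on the input
        self.parser = parser     # turns raw output into in-memory objects

    def run(self, data):
        raw = self.wrapper(data)
        return self.parser(raw)

# Swapping tools means swapping the wrapper/parser pair, not the pipeline:
fake_db = {"seq1": "MKV"}
handler = IOHandler(fake_db.get)
runnable = Runnable(wrapper=lambda s: s.lower(), parser=list)
result = runnable.run(handler.fetch("seq1"))
```

Because the runnable only sees a wrapper and a parser, exchanging one external tool for another leaves the rest of the pipeline untouched, which is the flexibility the figure is illustrating.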
Figure 3
Transformers are applied to inputs after they are fetched from the input I/OHandlers, and to outputs before they are passed to the output I/OHandler. Transformers are modular and may be chained.
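Chaining transformers, as the caption describes, amounts to function composition. A minimal sketch, with hypothetical transformer names (this is not Biopipe's actual API):

```python
# Each transformer is a function applied to the data between an
# I/OHandler and the analysis; chaining applies them in order.
def chain(*transformers):
    def apply(data):
        for t in transformers:
            data = t(data)
        return data
    return apply

# e.g. drop short sequences, then uppercase them, before analysis
drop_short = lambda seqs: [s for s in seqs if len(s) >= 3]
upper = lambda seqs: [s.upper() for s in seqs]
pipeline_input = chain(drop_short, upper)(["mkv", "aa", "gttag"])
```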
Figure 4
(A) A single job unit, made up of three components. The input I/OHandler tells the job how and where to fetch the input for the analysis. The analysis component tells the job what to run on the input. The output I/OHandler tells the job how and where to store the results of the analysis. (B) (1) Rules are fetched via the Rule Adaptor when PipelineManager starts up. (2) A Job unit/object is fetched via the JobAdaptor. Only jobs with status NEW or FAILED are considered for job submission. (3) Jobs are submitted via the Batch Submission object. FAILED jobs are checked against a retry limit before being submitted. (4) The mechanism for job submission: a runner.pl script is passed along with the job ID to the underlying load-sharing software, which dispatches the execution of the script to one of the compute nodes. (5) At the node, the job is recreated by fetching the job unit via the JobAdaptor using the job ID. (6) The job updates its running status during execution. (7) The PipelineManager creates new jobs for subsequent analysis according to rules and conditionals.
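The submission logic in steps (2) and (3) can be sketched as a simple filter over job states. This is an illustrative sketch; the status names NEW/FAILED come from the caption, while the retry limit value and all function names are hypothetical.

```python
# Only NEW or FAILED jobs are considered for submission, and FAILED
# jobs are checked against a retry limit before being resubmitted.
RETRY_LIMIT = 3  # assumed value, for illustration

def jobs_to_submit(jobs):
    """jobs: list of dicts with 'id', 'status', and 'retries' keys."""
    ready = []
    for job in jobs:
        if job["status"] == "NEW":
            ready.append(job["id"])
        elif job["status"] == "FAILED" and job["retries"] < RETRY_LIMIT:
            ready.append(job["id"])
    return ready

jobs = [
    {"id": 1, "status": "NEW", "retries": 0},
    {"id": 2, "status": "FAILED", "retries": 1},
    {"id": 3, "status": "FAILED", "retries": 3},  # over the retry limit
    {"id": 4, "status": "RUNNING", "retries": 0},
]
ready = jobs_to_submit(jobs)
```

Keeping job state in the database and re-fetching the job unit by ID on the compute node, as in steps (4) and (5), is what lets the same pipeline run unchanged across a compute farm.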
Figure 5
A simple example of a Biopipe session. (A) Loading the pipeline using the XML template. (B) Running the pipeline using PipelineManager. (C) Checking the job status via the Job Viewer script.
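A job viewer like the one in panel C essentially summarizes job states from the pipeline database. A minimal sketch of that query, using an in-memory SQLite table as a stand-in (the schema and column names are illustrative, not Biopipe's actual MySQL schema):

```python
# Hypothetical sketch of a job-status summary query (Figure 5C).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER, analysis TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?)",
    [(1, "align", "COMPLETED"), (2, "align", "RUNNING"), (3, "tree", "NEW")],
)

# Count jobs per (analysis, status), as a job viewer might display
summary = {
    (analysis, status): n
    for analysis, status, n in conn.execute(
        "SELECT analysis, status, COUNT(*) FROM job GROUP BY analysis, status"
    )
}
```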

