Biopipe: a flexible framework for protocol-based bioinformatics analysis

Comparative Study. Shawn Hoon et al. Genome Res. 2003 Aug;13(8):1904-15. doi: 10.1101/gr.1363103. Epub 2003 Jul 17.

Abstract

We identify several challenges facing bioinformatics analysis today. First, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Second, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Third, the ever-evolving state of bioinformatics presents new algorithms and paradigms for conducting analysis, so any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for an explicit protocol-based approach to bioinformatics analysis that will lend rigor to the analysis and make it easier for external parties to experiment with and replicate results. Biopipe is designed to meet these goals. It aims to allow researchers to focus on protocol design while, at the same time, working over a compute farm to provide high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way to distribute and reproduce results. This enables researchers to discuss and interpret the data better, as once-implicit assumptions are now explicitly defined within the Biopipe framework.


Figures

Figure 1
(A–C) An example of the steps involved in the design of a bioinformatics protocol for phylogenetic analysis. (D) Excerpt of the phylogenetic tree-building pipeline XML.
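The pipeline XML excerpt in panel D is not reproduced here. As a rough, hypothetical sketch of what a protocol definition of this shape might look like (element names, attributes, and module names below are illustrative, not Biopipe's actual schema):

```xml
<pipeline name="phylo_tree_building">
  <!-- where input sequences come from (illustrative handler spec) -->
  <iohandler id="input" type="db">
    <datasource dbname="family_db" driver="mysql"/>
    <method name="fetch_family_members"/>
  </iohandler>
  <!-- analysis steps: multiple alignment, then tree building -->
  <analysis id="align">
    <runnable module="Clustalw"/>
    <parameters>-matrix BLOSUM</parameters>
  </analysis>
  <analysis id="tree">
    <runnable module="Phylip"/>
  </analysis>
  <!-- rule: tree building waits for all alignment jobs to finish -->
  <rule from="align" to="tree" condition="WAITFORALL"/>
</pipeline>
```

The point of such a template is that the analysis modules, their parameters, and the data sources are all captured in one distributable file, which is what makes a protocol reproducible by external parties.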
Figure 2
(1) The I/O component of Biopipe. The design of I/OHandlers allows different input and output sources to be plugged in for analysis. (2) The modular breakdown of the Analysis component. Inputs fetched via the I/OHandlers are passed to the Analysis component as in-memory objects. Wrapper and parser modules reside outside of Biopipe, and the modular design allows different wrappers and parsers to be swapped into the runnable, which acts as an interface to the Biopipe system.
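The plug-in design described above can be sketched as follows. This is an illustrative Python sketch, not Biopipe's actual Perl API; all class and method names here are hypothetical.

```python
# Sketch of the Figure 2 design: an I/OHandler fetches inputs as
# in-memory objects, and a Runnable wires an external wrapper/parser
# pair into the pipeline. Names are hypothetical, for illustration only.

class IOHandler:
    """Fetches inputs from (or writes outputs to) some data source."""
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn  # pluggable: database, flat file, etc.

    def fetch(self, input_id):
        return self.fetch_fn(input_id)

class Runnable:
    """Interface between the pipeline and an external analysis tool."""
    def __init__(self, wrapper, parser):
        self.wrapper = wrapper   # runs the external program on the input
        self.parser = parser     # turns raw output into in-memory objects

    def run(self, data):
        raw = self.wrapper(data)
        return self.parser(raw)

# Swapping tools means swapping the wrapper/parser pair, not the pipeline:
fake_db = {"seq1": "MKV"}
handler = IOHandler(fake_db.get)
runnable = Runnable(wrapper=lambda s: s.lower(), parser=list)
result = runnable.run(handler.fetch("seq1"))
```

Because the runnable only sees a wrapper and a parser, exchanging one external tool for another leaves the rest of the pipeline untouched, which is the flexibility the figure is illustrating.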
Figure 3
Transformers are applied to inputs after they are fetched from the input I/OHandlers, and to outputs before they are passed to the output I/OHandler. Transformers are modular and may be chained.
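Chaining transformers, as the caption describes, amounts to function composition. A minimal sketch, with hypothetical transformer names (this is not Biopipe's actual API):

```python
# Each transformer is a function applied to the data between an
# I/OHandler and the analysis; chaining applies them in order.
def chain(*transformers):
    def apply(data):
        for t in transformers:
            data = t(data)
        return data
    return apply

# e.g. drop short sequences, then uppercase them, before analysis
drop_short = lambda seqs: [s for s in seqs if len(s) >= 3]
upper = lambda seqs: [s.upper() for s in seqs]
pipeline_input = chain(drop_short, upper)(["mkv", "aa", "gttag"])
```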
Figure 4
(A) A single job unit, made up of three components. The input I/OHandler tells the job how and where to fetch the input for the analysis. The analysis component tells the job what to run on the input. The output I/OHandler tells the job how and where to store the results of the analysis. (B) (1) Rules are fetched via the Rule Adaptor when PipelineManager starts up. (2) A Job unit/object is fetched via the JobAdaptor. Only jobs with status NEW or FAILED are considered for job submission. (3) Jobs are submitted via the Batch Submission object. FAILED jobs are checked against a retry limit before being submitted. (4) The mechanism for job submission: a runner.pl script is passed along with the job ID to the underlying load-sharing software, which dispatches the execution of the script to one of the compute nodes. (5) At the node, the job is recreated by fetching the job unit via the JobAdaptor using the job ID. (6) The job updates its running status during execution. (7) The PipelineManager creates new jobs for subsequent analysis according to rules and conditionals.
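The submission logic in steps (2) and (3) can be sketched as a simple filter over job states. This is an illustrative sketch; the status names NEW/FAILED come from the caption, while the retry limit value and all function names are hypothetical.

```python
# Only NEW or FAILED jobs are considered for submission, and FAILED
# jobs are checked against a retry limit before being resubmitted.
RETRY_LIMIT = 3  # assumed value, for illustration

def jobs_to_submit(jobs):
    """jobs: list of dicts with 'id', 'status', and 'retries' keys."""
    ready = []
    for job in jobs:
        if job["status"] == "NEW":
            ready.append(job["id"])
        elif job["status"] == "FAILED" and job["retries"] < RETRY_LIMIT:
            ready.append(job["id"])
    return ready

jobs = [
    {"id": 1, "status": "NEW", "retries": 0},
    {"id": 2, "status": "FAILED", "retries": 1},
    {"id": 3, "status": "FAILED", "retries": 3},  # over the retry limit
    {"id": 4, "status": "RUNNING", "retries": 0},
]
ready = jobs_to_submit(jobs)
```

Keeping job state in the database and re-fetching the job unit by ID on the compute node, as in steps (4) and (5), is what lets the same pipeline run unchanged across a compute farm.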
Figure 5
A simple example of a Biopipe session. (A) Loading the pipeline using the XML template. (B) Running the pipeline using PipelineManager. (C) Checking the job status via the Job Viewer script.
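A job viewer like the one in panel C essentially summarizes job states from the pipeline database. A minimal sketch of that query, using an in-memory SQLite table as a stand-in (the schema and column names are illustrative, not Biopipe's actual MySQL schema):

```python
# Hypothetical sketch of a job-status summary query (Figure 5C).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER, analysis TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?)",
    [(1, "align", "COMPLETED"), (2, "align", "RUNNING"), (3, "tree", "NEW")],
)

# Count jobs per (analysis, status), as a job viewer might display
summary = {
    (analysis, status): n
    for analysis, status, n in conn.execute(
        "SELECT analysis, status, COUNT(*) FROM job GROUP BY analysis, status"
    )
}
```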

