BMC Bioinformatics. 2011 Jul 26:12:304.
doi: 10.1186/1471-2105-12-304.

Applications of the pipeline environment for visual informatics and genomics computations

Ivo D Dinov et al. BMC Bioinformatics.

Abstract

Background: Contemporary informatics and genomics research requires efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols.

Results: This paper reports on applications of the LONI Pipeline environment to two informatics challenges: graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures; the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface; and the integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls.

Conclusions: The LONI Pipeline environment (http://pipeline.loni.ucla.edu) provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators: experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community.

Figures

Figure 1
An example of a completed Pipeline workflow (Local Shape Analysis) representing an end-to-end computational solution to a specific brain mapping problem. This pipeline protocol starts with the raw magnetic resonance imaging data for 2 cohorts (11 Alzheimer's disease patients and 10 age-matched normal controls). For each subject, the workflow automatically extracts a region of interest (left superior frontal gyrus, LSFG, using BrainParser [1]) and generates a 2D shape manifold model of the regional boundary [2,3]. Then the pipeline computes a mean LSFG shape using the normal subjects' LSFG shapes, coregisters the LSFG shapes of all subjects to the mean (atlas) LSFG shape, and maps the locations of the statistically significant differences of the 3D displacement vector fields between the 2 cohorts. The inset images illustrate the mean LSFG shape (top-right), the LSFG for one subject (bottom-left), and the between-group statistical mapping results overlaid on the mean LSFG shape (bottom-right); red indicates p-value < 0.01.
Figure 2
High-level schematic representation of the communication between multiple local Pipeline clients connected to multiple remote Pipeline servers.
Figure 3
A high-level group-folded representation of the alignment and assembly protocol (Table 2) as a Pipeline graphical workflow.
Figure 4
A snapshot of the input parameters (data-sinks) for the miBLAST Pipeline workflow.
Figure 5
A snapshot of the completed miBLAST Pipeline workflow. The inset image illustrates the final output result (see Table 3).
Figure 6
A snapshot of the input parameters for the EMBOSS Matcher Pipeline workflow.
Figure 7
A snapshot of the completed EMBOSS Matcher Pipeline workflow. The inset image shows the output result of the local sequence alignment of hba_human and hbb_human.
Figure 8
A snapshot of the input parameters for the mrFAST Indexing Pipeline workflow.
Figure 9
A snapshot of the completed mrFAST Indexing Pipeline workflow.
Figure 10
A snapshot of the input parameters for the GWASS Impute Pipeline workflow.
Figure 11
A snapshot of the completed GWASS Impute Pipeline workflow.
Figure 12
Pipeline Server Library.
Figure 13
A snapshot of the input parameters for this heterogeneous Pipeline workflow. EMBOSS: tsw:hba_human, tsw:hbb_human; mrFAST: cofold-blue.fasta, query.fasta, dna.fasta.
Figure 14
A snapshot of the completed heterogeneous (EMBOSS/mrFAST) Pipeline workflow.
Figure 15
A snapshot of the input parameters for this heterogeneous Pipeline workflow.
Figure 16
A snapshot of the completed heterogeneous Pipeline workflow. The image shows the expanded (raw, unfolded) version of the protocol, which is analogous to the folded version of the same pipeline workflow illustrated in Figure 3. The folded version shows only the major steps in the protocol and abstracts away some of the technical details; however, both versions of this protocol perform identical analyses.

References

    1. Wild DJ. Mining large heterogeneous data sets in drug discovery. Expert Opinion on Drug Discovery. 2009;4(10):995–1004. doi: 10.1517/17460440903233738.
    2. Toga AW, Thompson PM. What is where and why it is important. NeuroImage. 2007;37(4):1045–1049. doi: 10.1016/j.neuroimage.2007.02.018.
    3. Pilemalm S, Timpka T. Third generation participatory design in health informatics--Making user participation applicable to large-scale information system projects. Journal of Biomedical Informatics. 2008;41(2):327–339. doi: 10.1016/j.jbi.2007.09.004.
    4. Samatova NF, Breimyer P, Hendrix W, Schmidt MC, Rhyne TM. An outlook into ultra-scale visualization of large-scale biological data. Ultrascale Visualization, 2008 UltraVis 2008 Workshop on: 16-16 Nov. 2008. 2008. pp. 29–39.
    5. Zhang SW, Li YJ, Xia L, Pan Q. PPLook: an automated data mining tool for protein-protein interaction. BMC Bioinformatics. 2010;11(1):326. doi: 10.1186/1471-2105-11-326.
