BMC Bioinformatics. 2006 Mar 28:7:176.
doi: 10.1186/1471-2105-7-176.

The Gaggle: an open-source software system for integrating bioinformatics software and data sources

Paul T Shannon et al. BMC Bioinformatics. 2006.

Abstract

Background: Systems biologists work with many kinds of data, from many different sources, using a variety of software tools. Each of these tools typically excels at one type of analysis, such as the analysis of microarrays, of metabolic networks, or of predicted protein structure. A crucial challenge is to combine the capabilities of these (and other forthcoming) data resources and tools to create a data exploration and analysis environment that does justice to the variety and complexity of systems biology data sets. A solution to this problem should recognize that data types, formats and software in this high-throughput age of biology are constantly changing.

Results: In this paper we describe the Gaggle, a simple, open-source Java software environment that helps to solve the problem of software and database integration. Guided by the classic software engineering strategy of separation of concerns and a policy of semantic flexibility, it integrates existing popular programs and web resources into a user-friendly, easily extended environment. We demonstrate that four simple data types (names, matrices, networks, and associative arrays) are sufficient to bring together diverse databases and software. We highlight some capabilities of the Gaggle with an exploration of Helicobacter pylori pathogenesis genes, in which we identify a putative ricin-like protein, a discovery made possible by simultaneous data exploration using a wide range of publicly available data and a variety of popular bioinformatics software tools.
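
As a rough illustration of the four data types named above (names, matrices, networks, and associative arrays), the following minimal Java sketch shows one way such types could be modelled. All class names, fields, and values here are hypothetical placeholders chosen for this example; they are not taken from the Gaggle source code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these class names are hypothetical and are NOT the Gaggle API.

/** Names: an ordered list of identifiers, e.g. gene names. */
final class NameList {
    final List<String> names;
    NameList(List<String> names) { this.names = names; }
}

/** Matrix: numeric values with named rows (e.g. genes) and columns (e.g. conditions). */
final class DataMatrix {
    final List<String> rows;
    final List<String> columns;
    final double[][] values;

    DataMatrix(List<String> rows, List<String> columns, double[][] values) {
        if (values.length != rows.size()) {
            throw new IllegalArgumentException("need one row of values per row name");
        }
        for (double[] row : values) {
            if (row.length != columns.size()) {
                throw new IllegalArgumentException("need one value per column name");
            }
        }
        this.rows = rows;
        this.columns = columns;
        this.values = values;
    }

    /** Look up a single value by row name and column name. */
    double get(String row, String column) {
        int r = rows.indexOf(row);
        int c = columns.indexOf(column);
        if (r < 0 || c < 0) {
            throw new IllegalArgumentException("unknown row or column");
        }
        return values[r][c];
    }
}

/** Network: typed edges between named nodes. */
final class Network {
    static final class Edge {
        final String source;
        final String type;
        final String target;
        Edge(String source, String type, String target) {
            this.source = source;
            this.type = type;
            this.target = target;
        }
    }
    final List<Edge> edges;
    Network(List<Edge> edges) { this.edges = edges; }
}

/** Associative array: attribute name -> value for one named entity. */
final class AttributeMap {
    final String entity;
    final Map<String, String> attributes = new LinkedHashMap<>();
    AttributeMap(String entity) { this.entity = entity; }
}

final class DataTypesDemo {
    public static void main(String[] args) {
        // Placeholder identifiers and values, not data from the paper.
        NameList selection = new NameList(Arrays.asList("geneA", "geneB"));
        DataMatrix expression = new DataMatrix(
                selection.names,
                Arrays.asList("cond1", "cond2"),
                new double[][] {{1.0, 2.0}, {3.0, 4.0}});
        Network net = new Network(Arrays.asList(
                new Network.Edge("geneA", "interacts-with", "geneB")));
        AttributeMap annotation = new AttributeMap("geneA");
        annotation.attributes.put("function", "placeholder annotation");

        System.out.println(expression.get("geneB", "cond1")); // prints 3.0
        System.out.println(net.edges.size() + " edge, " + annotation.attributes.size() + " attribute");
    }
}
```

Keeping every type keyed by plain string names is what would let tools exchange data without agreeing on a richer shared schema, which is one reading of the "semantic flexibility" mentioned above.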

Conclusion: We have integrated diverse databases (for example, KEGG, BioCyc, STRING) and software (Cytoscape, DataMatrixViewer, R statistical environment, and TIGR Microarray Expression Viewer). Through this loose coupling of diverse software and databases the Gaggle enables simultaneous exploration of experimental data (mRNA and protein abundance, protein-protein and protein-DNA interactions), functional associations (operon, chromosomal proximity, phylogenetic pattern), metabolic pathways (KEGG) and PubMed abstracts (STRING web resource), creating an exploratory environment useful to 'web browser and spreadsheet biologists', to statistically savvy computational biologists, and those in between. The Gaggle uses Java RMI and Java Web Start technologies and can be found at http://gaggle.systemsbiology.net.
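
The loose coupling described above can be pictured as a central hub that relays a selection from one tool to the others. The sketch below is a simplified, in-process Java illustration under stated assumptions: the `Goose` and `Boss` names follow the vocabulary used in the figure captions, but the interfaces, method names, and the rule that a broadcast reaches every tool except the sender are hypothetical. The Gaggle itself uses Java RMI for communication between separate programs, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical interfaces, NOT the Gaggle API, and
// plain method calls stand in for the Java RMI used by the real system.

/** A tool that can receive a list of names from the hub. */
interface Goose {
    String name();
    void handleNames(String source, List<String> names);
}

/** The hub: tools register once, then broadcasts are relayed to the others. */
final class Boss {
    private final Map<String, Goose> geese = new LinkedHashMap<>();

    void register(Goose goose) {
        geese.put(goose.name(), goose);
    }

    /** Relay names to every registered goose except the sender; returns the receiver count. */
    int broadcast(String source, List<String> names) {
        int delivered = 0;
        for (Goose goose : geese.values()) {
            if (!goose.name().equals(source)) {
                // Each receiver gets its own copy so one tool cannot alter another's data.
                goose.handleNames(source, new ArrayList<>(names));
                delivered++;
            }
        }
        return delivered;
    }
}

/** A stand-in tool that simply remembers what it was sent. */
final class RecordingGoose implements Goose {
    private final String name;
    final List<String> received = new ArrayList<>();

    RecordingGoose(String name) { this.name = name; }

    @Override public String name() { return name; }

    @Override public void handleNames(String source, List<String> names) {
        received.addAll(names);
    }
}

final class BroadcastDemo {
    public static void main(String[] args) {
        Boss boss = new Boss();
        RecordingGoose networkViewer = new RecordingGoose("network-viewer");
        RecordingGoose matrixViewer = new RecordingGoose("matrix-viewer");
        boss.register(networkViewer);
        boss.register(matrixViewer);

        // Placeholder gene names, not data from the paper.
        boss.broadcast("network-viewer", Arrays.asList("geneA", "geneB"));
        System.out.println(matrixViewer.received); // prints [geneA, geneB]
    }
}
```

Because each tool only needs to know the hub and the shared data types, a new tool could be added by implementing the one small interface, without changes to the existing tools.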


Figures

Figure 1
A simple introductory example of Gaggle use. A set of genes (circular nodes, with edges representing associations/interactions) selected in Cytoscape (A) is broadcast to the Gaggle Boss (B). The Gaggle Boss re-routes the broadcast to a Java web browser connected to KEGG (C), where further exploration localizes H. pylori proteins to relevant subunits in the flagellar apparatus map. A second goose that receives the broadcast is the DMV (D), in which a plot function shows mRNA levels of the 15 H. pylori genes in 57 experimental conditions.
Figure 2
Workflow used in Gaggle for exploration of H. pylori pathogenesis (see text for details). The exploration begins with the Gaggle Boss (GB). All steps (mouse clicks) are indicated by arrows alongside numbers (in both black and red font) that correspond to the sequence of actions. Black numbers indicate actions within a goose; red arrows and numbers (enclosed in red circles) indicate "Broadcast" actions, with corresponding red numbers (not enclosed in circles) indicating transmission of data from one goose to another (implicitly through the GB). The three watermark arrows in (A) green, (B) red and (C) grey show the sequence and paths of the exploratory routes.
Figure 3
Annotated Prolinks network view of 263 genes identified as putatively functionally associated with one or more of the 26 cytotoxin-associated cag genes in H. pylori. This filtered network was obtained by selecting genes in biclusters of putatively co-regulated genes that contain one or more cag genes. The cag genes are indicated with pink node borders. See inset keys for description of node and edge coloring.

