Database (Oxford). 2011 Apr 29;2011:bar011.
doi: 10.1093/database/bar011. Print 2011.

Integrating diverse databases into an unified analysis framework: a Galaxy approach

Daniel Blankenberg et al. Database (Oxford). 2011.

Abstract

Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas in the past a relatively small number of central repositories served genomic data, an increasing number of distinct, specialized data repositories and resources have now been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. DATABASE URL: http://usegalaxy.org.

Figures

Figure 1.
The UCSC Table Browser tool. The UCSC Table Browser tool is shown with its native interface as it appears integrated into Galaxy (A). A simplified XML configuration file (B) that describes to Galaxy how to communicate with the data resource is shown. Advanced configuration options have been used to customize data set attributes and to enhance the user experience. Values for the file format and genome build are taken from the parameters provided by the datasource and made accessible to Galaxy. Additionally, this configuration causes the ‘Send output to Galaxy’ option to be automatically selected when a user begins from within Galaxy. The addition of a single line, outlined in blue, to the tool_conf.xml file is all that is required to inform Galaxy to load the tool (C).
Figure 2.
UCSC Table Browser as a synchronous data resource example. An overview of a typical synchronous data resource tool, with the UCSC Table Browser as an example, is shown here. Based upon the XML configuration file for the UCSC Table Browser tool (Figure 1), Galaxy creates a new tool as a link (outlined in red) that references the data resource under the Get Data tool section (A). An example of the generated link (B) is described along with the parameters that compose it; several of the parameters provided in the tool XML configuration customize the initial interface of the external resource. By accessing the link, the user is forwarded within their web browser to the native UCSC Table Browser interface (C). Once the user is satisfied with their query configuration and has selected the desired formatting options (D), the UCSC Table Browser generates a form (E; for brevity, some parameters have been removed from the original HTML) with an action that points to the Galaxy server. When Galaxy receives the POST request (F), a new data set is created in the user's history. Galaxy collects the parameters provided within the request and executes a process in the background that resubmits these parameters back to the Table Browser at the location specified by the provided URL parameter; the response from the Table Browser is the content that Galaxy will use to populate the new data set.
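The exchange described in panels B and F can be sketched in Python as follows. This is an illustrative sketch only, not Galaxy's own code: the function names are hypothetical, the GALAXY_URL and tool_id parameter names are assumptions about what the link carries, and resubmitting with an HTTP POST is likewise an assumption. Only the URL parameter is named in the caption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_datasource_link(resource_url, galaxy_url, tool_id, extra_params=None):
    """Build the link shown under Get Data (panel B).

    GALAXY_URL and tool_id are assumed parameter names; extra_params
    stands in for the values taken from the tool XML configuration.
    """
    params = {"GALAXY_URL": galaxy_url, "tool_id": tool_id}
    params.update(extra_params or {})
    return resource_url + "?" + urlencode(params)


def handle_datasource_post(posted_params, fetch=None):
    """Resubmit the posted parameters to the address given in URL (panel F).

    Returns the response body, which would populate the new data set.
    """
    params = dict(posted_params)
    target = params.pop("URL")  # location supplied by the external resource
    body = urlencode(params).encode("utf-8")
    if fetch is None:
        # Default transport: an HTTP POST (assumed) using the standard library.
        def fetch(url, data):
            with urlopen(url, data=data) as response:
                return response.read()
    return fetch(target, body)
```

The fetch argument exists only so that the transport can be swapped out, for example to exercise the logic without contacting a remote server.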
Figure 3.
A simple NCBI sequence retrieval tool. This minimal tool interface (A: Galaxy tool description and B: Galaxy-generated user interface) consists of a single textbox that allows the user to manually enter an accession number and a select list that allows the user to specify the target sequence database to search. When a user executes this tool, a simple script (C) is run by Galaxy which fetches the FASTA sequence data (D) for the user-provided accession number. Color-matched boxes have been added to indicate the interrelatedness of various elements of the panels.
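A retrieval script of this kind might look like the Python sketch below. It is not the script shown in panel C. It assumes the NCBI E-utilities efetch endpoint as the data source, and the function names and the two-entry database list are hypothetical choices made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumed here; the original script may differ).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical choices offered by the select list.
ALLOWED_DATABASES = ("nucleotide", "protein")


def build_efetch_url(accession, database):
    """Return the request URL for one accession number in FASTA format."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession number must not be empty")
    if database not in ALLOWED_DATABASES:
        raise ValueError("unsupported database: %r" % (database,))
    query = urlencode(
        {"db": database, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return EFETCH_URL + "?" + query


def fetch_fasta(accession, database, output_path):
    """Download the FASTA record and write it to the given output file."""
    with urlopen(build_efetch_url(accession, database)) as response:
        data = response.read()
    with open(output_path, "wb") as handle:
        handle.write(data)
```

A tool wrapper would call fetch_fasta with the textbox value, the select-list value, and the path of the output data set.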
Figure 4.
A Galaxy library containing pilot data from the 1000 Genomes project. This data was loaded directly into a Galaxy data library from the 1000 Genomes project FTP server. When a user imports a data set from a library, the underlying file on disk is not copied. Although each copy of a particular imported data set shares a reference to the same file on disk, the user is free to modify the metadata and attributes of their copy as they see fit.
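The sharing behavior described here, one file on disk with independently editable attributes per copy, can be modeled with the small Python sketch below. The class and function names and the example path are hypothetical and do not reflect Galaxy's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredFile:
    """The single underlying file on disk (hypothetical name)."""
    path: str


@dataclass
class DatasetCopy:
    """One user-visible data set: a shared file plus its own attributes."""
    stored_file: StoredFile
    name: str
    metadata: dict = field(default_factory=dict)


def import_from_library(library_dataset):
    """Create a user's copy without duplicating the file on disk.

    The file reference is shared; the metadata dictionary is copied
    (shallowly) so that edits do not affect the library entry.
    """
    return DatasetCopy(
        stored_file=library_dataset.stored_file,
        name=library_dataset.name,
        metadata=dict(library_dataset.metadata),
    )


library_entry = DatasetCopy(
    StoredFile("/data/library/example.vcf"), "pilot data", {"dbkey": "hg18"}
)
user_copy = import_from_library(library_entry)
user_copy.name = "my pilot data"
user_copy.metadata["dbkey"] = "hg19"
```

After these statements the two entries still point at the same StoredFile, while the library entry keeps its original name and metadata.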
