BMC Genomics. 2013 Jul 22;14:494.
doi: 10.1186/1471-2164-14-494.

Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE

Quang M Trinh et al. BMC Genomics.

Abstract

Background: Funded by the National Institutes of Health (NIH), the Model Organism ENCyclopedia of DNA Elements (modENCODE) project aims to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for two model organisms, C. elegans (worm) and D. melanogaster (fly). With just under 10 terabytes of data collected and released to the public, one of the challenges researchers face is extracting biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data have already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or to combine modENCODE data with other data sets. Unfortunately, this can be a time-consuming and logistically challenging proposition.

Results: In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies.

Conclusions: These resources provide a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to spend more of their time using modENCODE data and less time moving it around.

Figures

Figure 1. modENCODE DCC data flow from submission to release.
Figure 2. (a) modENCODE tools can be installed via Galaxy administrator interface by clicking on ‘Admin’ and ‘Search and browse tool sheds’ (indicated by red boxes). (b) modENCODE Galaxy after installations of modENCODE tools and their dependencies.
Figure 3. Data can be imported directly into Galaxy from our faceted browser.
Figure 4. Running the 2-replicate uniform processing/peak calling workflow.
Figure 5. Galaxy visualization of peak call output for chromosome II from the workflow for the two sample replicates.
