Methods Mol Biol. 2010;628:275-96.
doi: 10.1007/978-1-60327-367-1_15.

Web-based analysis of (Epi-) genome data using EpiGRAPH and Galaxy

Christoph Bock et al. Methods Mol Biol. 2010.

Abstract

Modern life sciences are becoming increasingly data intensive, posing a significant challenge for most researchers and shifting the bottleneck of scientific discovery from data generation to data analysis. As a result, progress in genome research is increasingly impeded by bioinformatic hurdles. A new generation of powerful and easy-to-use genome analysis tools has been developed to address this issue, enabling biologists to perform complex bioinformatic analyses online, without having to learn a programming language or to download and manually process large datasets. In this tutorial paper, we describe the use of EpiGRAPH (http://epigraph.mpi-inf.mpg.de/) and Galaxy (http://galaxyproject.org/) for genome and epigenome analysis, and we illustrate how these two web services work together to identify epigenetic modifications that are characteristic of highly polymorphic (SNP-rich) promoters. This paper is supplemented with video tutorials (http://tinyurl.com/yc5xkqq), which provide a step-by-step guide through each example analysis.

Figures

Fig. 1.
Workflow for web-based analysis of epigenome datasets. This figure outlines a workflow for epigenome data analysis using publicly available tools and web services. After preprocessing the data with software tools that address the specific properties of the experimental method used (box 1), the user uploads the newly generated dataset into a genome browser to facilitate visualization and hypothesis generation by manual inspection (box 2). Next, he or she processes the data with a genome calculator such as Galaxy to extract and prepare interesting regions for in-depth analysis (box 3). Finally, genome analysis tools such as EpiGRAPH can be used to test for significant associations with genome annotation data and to perform bioinformatic prediction (box 4). The results may suggest new experiments, driving the next iteration of the analysis cycle.
Fig. 2.
Submitting a custom dataset for analysis with EpiGRAPH. This screenshot displays EpiGRAPH’s attribute submission page, consisting of brief documentation of the attribute (top), a set of text fields in which the column semantics are specified (e.g., which column contains the chromosome name and which columns contain the start and end positions of each genomic region), and a large text area into which a tab-separated table of genomic regions can be pasted. Because of differing column widths, the columns of the pasted table are not visually aligned; this is common and causes no problems. Importantly, each row in the table must correspond to exactly one genomic region, and its location (chromosome name, start position and end position) must be specified relative to the genome assembly selected in the choice box below the EpiGRAPH logo on the right of the screen (“hg18” in this case).
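As a minimal sketch of the kind of input this page expects, the Python function below renders genomic regions as one tab-separated row per region. The column order (chromosome, start, end, class) and the names `Region` and `to_tab_separated` are illustrative assumptions: on the real page the column semantics are declared in the text fields, and whether a header line is accepted is not stated here, so the header is off by default. Coordinates must refer to the assembly selected on the page (for example hg18).

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Region:
    chrom: str  # chromosome name in the selected assembly, e.g. "chr21"
    start: int  # start position, relative to the selected assembly
    end: int    # end position, relative to the selected assembly
    label: int  # hypothetical class attribute, e.g. 1 = methylated, 0 = unmethylated


def to_tab_separated(regions, header=False):
    """Render exactly one genomic region per row as tab-separated text.

    The column order is an assumption; it must match whatever column
    semantics are entered on the submission page.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if header:  # off by default: header support is not documented here
        writer.writerow(["chrom", "start", "end", "class"])
    for r in regions:
        if r.end <= r.start:
            raise ValueError(f"end must be greater than start: {r}")
        writer.writerow([r.chrom, r.start, r.end, r.label])
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder coordinates for illustration only, not real data.
    print(to_tab_separated([Region("chr21", 100, 600, 1),
                            Region("chr21", 900, 1500, 0)]), end="")
```

The output can be pasted directly into the text area; validating `start < end` before submission catches the most common formatting slip early.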
Fig. 3.
Configuring and starting an EpiGRAPH analysis. This screenshot displays EpiGRAPH’s analysis specification page. Here, the user can select which class attribute to use (if more than one class attribute was provided during the attribute submission step), configure down-sampling, select prediction attributes, and enter a brief description of the analysis.
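The caption does not describe how EpiGRAPH's down-sampling works. Assuming its purpose is to balance the classes being compared, one plausible sketch is to randomly sub-sample every class to the size of the smallest class, using a fixed seed so the sample is reproducible. The function name `downsample_balanced` and this balancing rule are hypothetical, not EpiGRAPH's documented behavior.

```python
import random
from collections import defaultdict


def downsample_balanced(items, label_of, seed=0):
    """Randomly sub-sample every class to the size of the smallest class.

    Hypothetical illustration of class-balancing down-sampling; the exact
    scheme used by EpiGRAPH is not specified in the caption.
    """
    by_class = defaultdict(list)
    for item in items:
        by_class[label_of(item)].append(item)
    if not by_class:
        return []
    n = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], n))
    return sample


if __name__ == "__main__":
    # Toy input: 10 regions of class 1 and 3 of class 0 (illustrative only).
    regions = [(i, 1) for i in range(10)] + [(i, 0) for i in range(3)]
    balanced = downsample_balanced(regions, label_of=lambda r: r[1], seed=42)
    print(len(balanced))
```

Balancing the classes this way keeps a classifier from reaching high accuracy simply by predicting the majority class.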
Fig. 4.
Results of an EpiGRAPH analysis of DNA methylation at CpG islands. These screenshots display the results of an EpiGRAPH analysis comparing methylated CpG islands (class = 1) with unmethylated CpG islands (class = 0), based on a published dataset of DNA methylation on chromosome 21 (31). The results of the statistical analysis (Panel A) show that the “CG” sequence pattern is over-represented in unmethylated CpG islands, while the “CA” sequence pattern is over-represented in methylated CpG islands. Statistical testing was performed using the nonparametric Wilcoxon rank-sum test, and P-values were adjusted for multiple testing using the highly conservative Bonferroni method (sig bonf) as well as the false discovery rate method (sig fdr). An explanation of the attribute names is available from http://epigraph.mpi-inf.mpg.de/WebGRAPH/faces/Background.html#attributes. The machine learning analysis (Panel B) confirms that these and other differences are sufficient to predict with relatively high accuracy whether or not a CpG island is methylated. The values in the bottom table correspond to the average performance of a linear support vector machine that was trained and evaluated in ten repetitions of a tenfold cross-validation, summarized by the mean correlation (mean corr), prediction accuracy (mean acc), sensitivity (sens), and specificity (spec). Additional columns display the standard deviations observed among the repeated cross-validations with random partition assignment (corr sd and acc sd), the number of attribute variables in each attribute group (#vars), and the total number of genomic regions included in the analysis (#cases).
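The multiple-testing corrections and performance measures named in this caption can be sketched in a few lines of Python. Bonferroni multiplies each P-value by the number of tests (capped at 1); for the FDR column this sketch assumes the Benjamini-Hochberg procedure, which the caption does not name. The metrics function assumes 0/1 class labels, and reading "corr" as the Pearson correlation between predicted and true labels (for binary vectors this equals the phi coefficient) is likewise an assumption. In a repeated cross-validation, such values would be computed for each repetition and then summarized by their mean and standard deviation.

```python
import math


def bonferroni(pvalues):
    """Bonferroni: multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted P-values, returned in input order.

    Assumed here as the 'false discovery rate method' of the caption.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and phi correlation for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": (tp + tn) / len(pairs),
        "sens": tp / (tp + fn) if tp + fn else float("nan"),
        "spec": tn / (tn + fp) if tn + fp else float("nan"),
        # Pearson correlation of two 0/1 vectors equals the phi coefficient.
        "corr": (tp * tn - fp * fn) / denom if denom else float("nan"),
    }
```

Bonferroni controls the chance of any false positive and is therefore the stricter of the two, which matches the caption's description of it as highly conservative.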
Fig. 5.
Identification of highly polymorphic promoters using Galaxy. The Galaxy web interface consists of four areas: the upper bar, the tool frame (left column), the detail frame (middle column), and the history frame (right column). The upper bar contains user account controls as well as help and contact links. The tool frame on the left lists the analysis tools and data sources available to the user. The detail frame in the middle displays the interface of the currently selected tool. The history frame on the right shows loaded datasets and the results of analyses performed by the user. Pictured here are six history items representing two original datasets (1: Human Genes and 2: SNP) and the results of their manipulation. Every action by the user generates a new history item, which can then be used in subsequent analyses, downloaded, or visualized.
Fig. 6.
Documentation of an analysis using Galaxy’s history function. All actions performed within Galaxy are documented in the history frame, which contains uploaded data as well as calculated results. Original datasets are always preserved, and every subsequent analysis adds a new entry to the history frame. This screenshot illustrates how a user starts with an empty history, adds datasets containing the coordinates of human genes and of SNPs, converts the gene coordinates into promoter coordinates by selecting the region 500 base pairs upstream of each gene, computes the number of SNPs per promoter, sorts the promoters by SNP density, and finally selects the top 100 regions. In addition to documenting analyses, Galaxy’s history frame allows the user to share a history with colleagues.
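The steps recorded in this history (promoters taken as the 500 bp upstream of each gene, SNPs counted per promoter, promoters sorted by SNP density, top 100 kept) can be mirrored outside Galaxy in a short Python sketch. It assumes BED-style 0-based, half-open coordinates and strand-aware promoters (upstream of the gene start on the + strand, downstream of the gene end on the - strand). `Gene`, `promoter` and `top_polymorphic_promoters` are hypothetical names for illustration, not Galaxy tools.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # assumed 0-based, half-open [start, end), as in BED
    end: int
    strand: str  # "+" or "-"


def promoter(gene, upstream=500):
    """Return (chrom, start, end) of the `upstream` bp 5' of the gene."""
    if gene.strand == "+":
        return gene.chrom, max(0, gene.start - upstream), gene.start
    return gene.chrom, gene.end, gene.end + upstream


def top_polymorphic_promoters(genes, snps, upstream=500, top_n=100):
    """Count SNPs per promoter, sort by SNP density, keep the top_n rows.

    `snps` is an iterable of (chrom, position) pairs, 0-based.
    Each result row is (name, chrom, start, end, snp_count, density).
    """
    positions = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for plist in positions.values():
        plist.sort()  # sorted positions allow counting with binary search

    scored = []
    for g in genes:
        chrom, start, end = promoter(g, upstream)
        plist = positions.get(chrom, [])
        # Number of SNP positions p with start <= p < end.
        count = bisect_left(plist, end) - bisect_left(plist, start)
        length = end - start
        density = count / length if length else 0.0
        scored.append((g.name, chrom, start, end, count, density))

    scored.sort(key=lambda row: row[5], reverse=True)  # highest density first
    return scored[:top_n]
```

Because nearly all promoters have the same length (only those clipped at a chromosome start are shorter), sorting by density and sorting by raw SNP count give the same order in almost all cases.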

References

    1. Bernstein BE, Meissner A and Lander ES (2007) The mammalian epigenome. Cell, 128, 669–681.
    2. Chen K and Rajewsky N (2007) The evolution of gene regulation by transcription factors and microRNAs. Nat. Rev. Genet., 8, 93–103.
    3. Zhang MQ (2005) In: Pal SK (ed.), PReMI. Springer-Verlag, Berlin Heidelberg, Vol. 3776, pp. 31–38.
    4. Frigola J, Song J, Stirzaker C, Hinshelwood RA, Peinado MA and Clark SJ (2006) Epigenetic remodeling in colorectal cancer results in coordinate gene suppression across an entire chromosome band. Nat. Genet., 38, 540–549.
    5. Feinberg AP (2007) Phenotypic plasticity and the epigenetics of human disease. Nature, 447, 433–440.
