Bioinformatics. 2015 Jun 1;31(11):1724-8.
doi: 10.1093/bioinformatics/btv061. Epub 2015 Jan 30.

Omics Pipe: a community-based framework for reproducible multi-omics data analysis

Kathleen M Fisch et al. Bioinformatics.

Abstract

Motivation: Omics Pipe (http://sulab.scripps.edu/omicspipe) is a computational framework that automates multi-omics data analysis pipelines on high performance compute clusters and in the cloud. It supports best practice published pipelines for RNA-seq, miRNA-seq, Exome-seq, Whole-Genome sequencing, ChIP-seq analyses and automatic processing of data from The Cancer Genome Atlas (TCGA). Omics Pipe provides researchers with a tool for reproducible, open source and extensible next generation sequencing analysis. The goal of Omics Pipe is to democratize next-generation sequencing analysis by dramatically increasing the accessibility and reproducibility of best practice computational pipelines, which will enable researchers to generate biologically meaningful and interpretable results.

Results: Using Omics Pipe, we analyzed 100 TCGA breast invasive carcinoma paired tumor-normal datasets based on the latest UCSC hg19 RefSeq annotation. Omics Pipe automatically downloaded and processed the desired TCGA samples on a high throughput compute cluster to produce a results report for each sample. We aggregated the individual sample results and compared them to the analysis in the original publications. This comparison revealed high overlap between the analyses, as well as novel findings due to the use of updated annotations and methods.
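The overlap comparison described above can be illustrated with a minimal sketch. It assumes each analysis yields a plain list of differentially expressed gene identifiers; the function name `venn_counts` and the gene labels in the usage note are hypothetical and are not taken from Omics Pipe or the TCGA analysis.

```python
def venn_counts(original, reanalyzed):
    """Summarize the overlap between two gene lists (two-set Venn counts).

    Both arguments are iterables of gene identifiers; duplicates are ignored.
    """
    original, reanalyzed = set(original), set(reanalyzed)
    shared = original & reanalyzed
    union = original | reanalyzed
    return {
        "only_original": len(original - reanalyzed),
        "shared": len(shared),
        "only_reanalyzed": len(reanalyzed - original),
        # Jaccard index: shared genes as a fraction of all genes seen.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

For example, `venn_counts(["A", "B", "C"], ["B", "C", "D", "E"])` reports one gene unique to the first list, two shared genes and two genes unique to the second list. Genes found only in the reanalyzed list are the kind of candidates that could stem from updated annotations or methods.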

Availability and implementation: Source code for Omics Pipe is freely available on the web (https://bitbucket.org/sulab/omics_pipe). Omics Pipe is distributed as a standalone Python package for installation (https://pypi.python.org/pypi/omics_pipe) and as an Amazon Machine Image in Amazon Web Services Elastic Compute Cloud that contains all necessary third-party software dependencies and databases (https://pythonhosted.org/omics_pipe/AWS_installation.html).

Figures

Fig. 1.
Schematic diagram of Omics Pipe demonstrating the parallel execution of pipelined tasks and samples. Omics Pipe requires a parameter file in YAML format, and can be run on a local compute cluster or in the cloud. Each run of Omics Pipe is logged with the version and run information for reproducibility.
Fig. 2.
Pre-built best practice pipelines and the third party software tools supported by Omics Pipe. Users can easily create custom pipelines from the existing modules and they can create new modules supporting additional third party software tools.
Fig. 3.
Comparison of the number of genes annotated in two different UCSC RefSeq releases and the number of DE genes identified by different algorithms and annotations. (a) Venn diagram of the number of genes annotated in the UCSC RefSeq hg19 2011 Generic Annotation File and the UCSC RefSeq hg19 2013 annotation (Release 57). (b) Venn diagram of the comparison of the number of DE genes identified between raw counts generated with the TCGA UNC V2 RNA-seq Workflow using the UCSC RefSeq hg19 2011 Generic Annotation File and raw counts generated with the count-based pipeline in Omics Pipe using the UCSC RefSeq hg19 2013 annotation (Release 57).
Fig. 4.
Consensus clustering analysis of the TCGA breast invasive carcinoma paired tumor-normal samples performed with the reanalyzed count data (a–d) and the original raw counts downloaded from TCGA (e–h) for cluster sizes of k = 2, k = 3, k = 4 and k = 10. The heat map displays sample consensus.
Fig. 5.
Measurements of consensus for different cluster sizes (k) from the consensus clustering analysis on the reanalyzed (a–c) and original counts (d–f) from the TCGA paired tumor-normal breast invasive carcinoma samples. The empirical cumulative distribution (CDF) plots (a) and (d) indicate at which k the shape of the curve approaches the ideal step function. Plots (b) and (e) depict the area under the two CDF curves. Item consensus plots (c) and (f) demonstrate the mean consensus of each sample with all other samples in a particular cluster (represented by color).
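
The consensus measures named in the Fig. 4 and Fig. 5 captions can be sketched in a few lines. This is a generic illustration of consensus clustering diagnostics, not the code used in the paper: it assumes each resampling run is given as a dict mapping sample name to cluster label, and the function names and the Riemann-sum approximation of the CDF area are choices made here for illustration.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Pairwise consensus: the fraction of runs containing both samples
    in which the two samples received the same cluster label."""
    samples = sorted({s for run in runs for s in run})
    together = {pair: 0 for pair in combinations(samples, 2)}
    sampled = {pair: 0 for pair in combinations(samples, 2)}
    for run in runs:
        for a, b in combinations(sorted(run), 2):
            sampled[(a, b)] += 1
            if run[a] == run[b]:
                together[(a, b)] += 1
    # Pairs that were never drawn together have no defined consensus.
    return {p: together[p] / sampled[p] for p in sampled if sampled[p] > 0}


def empirical_cdf(values, grid):
    """Fraction of consensus values that are <= each grid point."""
    n = len(values)
    return [sum(v <= c for v in values) / n for c in grid]


def cdf_area(values, bins=100):
    """Approximate area under the empirical CDF on [0, 1] (right Riemann sum)."""
    grid = [i / bins for i in range(bins + 1)]
    cdf = empirical_cdf(values, grid)
    return sum(cdf[1:]) / bins
```

When every pair of samples is either always or never clustered together, all consensus values are 0 or 1 and the CDF is flat between those two points, which is the step shape the caption refers to. Comparing `cdf_area` across values of k gives a rough sense of where adding clusters stops improving consensus.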
