Bioinformatics. 2015 Jun 1;31(11):1724-8.

doi: 10.1093/bioinformatics/btv061. Epub 2015 Jan 30.

Omics Pipe: a community-based framework for reproducible multi-omics data analysis

Kathleen M Fisch¹, Tobias Meißner¹, Louis Gioia¹, Jean-Christophe Ducom¹, Tristan M Carland¹, Salvatore Loguercio¹, Andrew I Su¹

Affiliation

¹ Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA and Department of Human Biology, J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA.

PMID: 25637560
PMCID: PMC4443682
DOI: 10.1093/bioinformatics/btv061

Kathleen M Fisch et al. Bioinformatics. 2015.

Abstract

Motivation: Omics Pipe (http://sulab.scripps.edu/omicspipe) is a computational framework that automates multi-omics data analysis pipelines on high-performance compute clusters and in the cloud. It supports published best-practice pipelines for RNA-seq, miRNA-seq, exome-seq, whole-genome sequencing and ChIP-seq analyses, as well as automatic processing of data from The Cancer Genome Atlas (TCGA). Omics Pipe provides researchers with a tool for reproducible, open-source and extensible next-generation sequencing analysis. The goal of Omics Pipe is to democratize next-generation sequencing analysis by dramatically increasing the accessibility and reproducibility of best-practice computational pipelines, which will enable researchers to generate biologically meaningful and interpretable results.

Results: Using Omics Pipe, we analyzed 100 TCGA breast invasive carcinoma paired tumor-normal datasets based on the latest UCSC hg19 RefSeq annotation. Omics Pipe automatically downloaded and processed the desired TCGA samples on a high throughput compute cluster to produce a results report for each sample. We aggregated the individual sample results and compared them to the analysis in the original publications. This comparison revealed high overlap between the analyses, as well as novel findings due to the use of updated annotations and methods.

Availability and implementation: Source code for Omics Pipe is freely available on the web (https://bitbucket.org/sulab/omics_pipe). Omics Pipe is distributed as a standalone Python package for installation (https://pypi.python.org/pypi/omics_pipe) and as an Amazon Machine Image in Amazon Web Services Elastic Compute Cloud that contains all necessary third-party software dependencies and databases (https://pythonhosted.org/omics_pipe/AWS_installation.html).

Figures

**Fig. 1.**
Schematic diagram of Omics Pipe demonstrating the parallel execution of pipelined tasks and samples. Omics Pipe requires a parameter file in YAML format, and can be run on a local compute cluster or in the cloud. Each run of Omics Pipe is logged with the version and run information for reproducibility.

**Fig. 2.**
Pre-built best-practice pipelines and the third-party software tools supported by Omics Pipe. Users can easily create custom pipelines from the existing modules, and they can create new modules supporting additional third-party software tools.

**Fig. 3.**
Comparison of the number of genes annotated in two different UCSC RefSeq releases and the number of DE genes identified by different algorithms and annotations. **(a)** Venn diagram of the number of genes annotated in the UCSC RefSeq hg19 2011 Generic Annotation File and the UCSC RefSeq hg19 2013 annotation (Release 57). **(b)** Venn diagram of the comparison of the number of DE genes identified between raw counts generated with the TCGA UNC V2 RNA-seq Workflow using the UCSC RefSeq hg19 2011 Generic Annotation File and raw counts generated with the count-based pipeline in Omics Pipe using the UCSC RefSeq hg19 2013 annotation (Release 57).

**Fig. 4.**
Consensus clustering analysis of the TCGA breast invasive carcinoma paired tumor-normal samples performed with the reanalyzed count data (**a–d**) and the original raw counts downloaded from TCGA (**e–h**) for cluster sizes of k = 2, k = 3, k = 4 and k = 10. The heat map displays sample consensus.

**Fig. 5.**
Measurements of consensus for different cluster sizes (k) from the consensus clustering analysis on the reanalyzed (**a–c**) and original counts (**d–f**) from the TCGA paired tumor-normal breast invasive carcinoma samples. The empirical cumulative distribution (CDF) plots (a) and (d) indicate at which k the shape of the curve approaches the ideal step function. Plots (b) and (e) depict the area under the two CDF curves. Item consensus plots (c) and (f) demonstrate the mean consensus of each sample with all other samples in a particular cluster (represented by color).
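
The consensus CDF and the area under it, as described for Fig. 5, can be illustrated with a minimal Python sketch. It is not taken from Omics Pipe or from the paper's analysis code. It assumes a consensus matrix is given as a symmetric list of lists whose entries are the fraction of resampling runs in which two samples were clustered together, and the function names (`upper_triangle`, `empirical_cdf`, `area_under_cdf`) are hypothetical. With perfectly stable clusters, all pairwise consensus values are 0 or 1 and the CDF is a single step, which is the "ideal step function" the caption refers to.

```python
from itertools import combinations


def upper_triangle(consensus):
    """Return the pairwise consensus values for i < j (diagonal excluded)."""
    n = len(consensus)
    return [consensus[i][j] for i, j in combinations(range(n), 2)]


def empirical_cdf(values, c):
    """Fraction of pairwise consensus values that are <= c."""
    return sum(1 for v in values if v <= c) / len(values)


def area_under_cdf(values):
    """Integrate the empirical CDF over [0, 1].

    The CDF is a right-continuous step function, so it is constant on each
    interval between consecutive sorted consensus values.
    """
    points = sorted(set(values) | {0.0, 1.0})
    area = 0.0
    for left, right in zip(points, points[1:]):
        area += (right - left) * empirical_cdf(values, left)
    return area


# Hypothetical 4-sample example: two perfectly stable clusters {0, 1} and {2, 3}.
stable = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Hypothetical fully ambiguous case: every pair co-clusters half of the time.
ambiguous = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]

stable_area = area_under_cdf(upper_triangle(stable))
ambiguous_area = area_under_cdf(upper_triangle(ambiguous))
```

In this toy input the stable matrix gives a CDF that is flat between 0 and 1, while the ambiguous matrix puts all its mass at 0.5. Comparing such areas across values of k is one common way to judge where adding clusters stops improving consensus.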

Cited by

GenPipes: an open-source framework for distributed and scalable genomic analyses.
Bourgey M, Dali R, Eveleigh R, Chen KC, Letourneau L, Fillon J, Michaud M, Caron M, Sandoval J, Lefebvre F, Leveque G, Mercier E, Bujold D, Marquis P, Van PT, Anderson de Lima Morais D, Tremblay J, Shao X, Henrion E, Gonzalez E, Quirion PO, Caron B, Bourque G. Gigascience. 2019 Jun 1;8(6):giz037. doi: 10.1093/gigascience/giz037. PMID: 31185495.
The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report.
MetaSUB International Consortium. Microbiome. 2016 Jun 3;4(1):24. doi: 10.1186/s40168-016-0168-z. PMID: 27255532.
miARma-Seq: a comprehensive tool for miRNA, mRNA and circRNA analysis.
Andrés-León E, Núñez-Torres R, Rojas AM. Sci Rep. 2016 May 11;6:25749. doi: 10.1038/srep25749. PMID: 27167008.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Genes (Basel). 2019 Jan 28;10(2):87. doi: 10.3390/genes10020087. PMID: 30696086. Review.
OncoRep: an n-of-1 reporting tool to support genome-guided treatment for breast cancer patients using RNA-sequencing.
Meißner T, Fisch KM, Gioia L, Su AI. BMC Med Genomics. 2015 May 21;8:24. doi: 10.1186/s12920-015-0095-z. PMID: 25989980.

Grants and funding

CA92577/CA/NCI NIH HHS/United States
