Gigascience. 2019 Jun 1;8(6):giz037.
doi: 10.1093/gigascience/giz037.

GenPipes: an open-source framework for distributed and scalable genomic analyses

Mathieu Bourgey et al. Gigascience.

Abstract

Background: With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing.

Findings: Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated and scalable pipelines for various genomics applications, including RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. The software is available under a GPLv3 open source license and is continuously updated to follow recent advances in genomics and bioinformatics. The framework has already been configured on several servers, and a Docker image is also available to facilitate additional installations.

Conclusions: GenPipes offers genomics researchers a simple method to analyze different types of data, customizable to their needs and resources, as well as the flexibility to create their own workflows.

Keywords: bioinformatics; frameworks; genomics; pipeline; workflow; workflow management systems.

Figures

Figure 1: General workflow of GenPipes. Diagram showing how information flows from the user's command-line input through the 4 different objects (Pipeline, Step, Job, and Scheduler) to generate system-specific executable outputs. cmds: commands.
Figure 2: GenPipes’ properties and growth. A, Diagram showing GenPipes’ features, compatible computing platforms, and available pipelines. B, GenPipes’ available pipelines and maintained servers since the release of GenPipes in 2014. C, Bar plot showing the number of GenPipes runs per year since its release. RRBS: reduced-representation bisulfite sequencing; WGS: whole-genome sequencing.
Figure 3: GenPipes DNASeq pipeline diagram. Schematic representation of GenPipes’ dnaseq.py pipeline. Hexagons represent steps in the pipeline. White hexagons represent steps that process input from a single sample, while grey ones represent steps that process input from several samples. Arrows show step dependencies.
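
The object flow described in the Figure 1 caption, together with the per-sample and multi-sample steps of Figure 3, can be illustrated with a minimal Python sketch. This is not GenPipes' actual code or API: the class names only mirror the four objects named in the caption, and every method, the `align` and `merge` step functions, the sample names, and the `submit` command are hypothetical placeholders chosen for illustration. The idea shown is that a pipeline expands its steps into jobs with dependencies, and interchangeable schedulers render the same jobs into different system-specific scripts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """One unit of work: a shell command plus the jobs it must wait for."""
    name: str
    command: str
    depends_on: List["Job"] = field(default_factory=list)


@dataclass
class Step:
    """A named pipeline stage that expands into one or more jobs."""
    name: str
    make_jobs: Callable[[List[str], List[Job]], List[Job]]


class Pipeline:
    """Expands an ordered list of steps into a flat list of jobs."""

    def __init__(self, steps: List[Step], samples: List[str]):
        self.steps = steps
        self.samples = samples

    def jobs(self) -> List[Job]:
        all_jobs: List[Job] = []
        previous: List[Job] = []
        for step in self.steps:
            # Each step sees the jobs of the step before it (simplified).
            current = step.make_jobs(self.samples, previous)
            all_jobs.extend(current)
            previous = current
        return all_jobs


class BatchScheduler:
    """Renders jobs as a plain shell script that runs them in order."""

    def render(self, jobs: List[Job]) -> str:
        return "\n".join(["#!/bin/bash", "set -e"] + [job.command for job in jobs])


class QueueScheduler:
    """Hypothetical cluster scheduler; `submit` is a placeholder, not a real CLI."""

    def render(self, jobs: List[Job]) -> str:
        lines = []
        for job in jobs:
            deps = ",".join(dep.name for dep in job.depends_on)
            flag = f" --after {deps}" if deps else ""
            lines.append(f"submit --name {job.name}{flag} -- {job.command}")
        return "\n".join(lines)


def align(samples: List[str], previous: List[Job]) -> List[Job]:
    # Per-sample step: one job per sample (a "white hexagon" in Figure 3).
    return [Job(f"align.{s}", f"echo aligning {s}") for s in samples]


def merge(samples: List[str], previous: List[Job]) -> List[Job]:
    # Multi-sample step: one job that waits for every job of the previous step.
    command = "echo merging " + " ".join(samples)
    return [Job("merge", command, depends_on=list(previous))]


pipeline = Pipeline([Step("align", align), Step("merge", merge)], ["sampleA", "sampleB"])
jobs = pipeline.jobs()
batch_script = BatchScheduler().render(jobs)
queue_script = QueueScheduler().render(jobs)
print(batch_script)
print(queue_script)
```

Keeping the scheduler separate from the pipeline definition is what lets the same list of jobs target either a single machine or a cluster queue in this sketch; only the `render` implementation changes.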
