Genome Med. 2015 Jul 10;7(1):68.
doi: 10.1186/s13073-015-0191-x. eCollection 2015.

Cpipe: a shared variant detection pipeline designed for diagnostic settings

Simon P Sadedin et al. Genome Med.

Abstract

The benefits of implementing high throughput sequencing in the clinic are quickly becoming apparent. However, few freely available bioinformatics pipelines have been built from the ground up with clinical genomics in mind. Here we present Cpipe, a pipeline designed specifically for clinical genetic disease diagnostics. Cpipe was developed by the Melbourne Genomics Health Alliance, an Australian initiative to promote common approaches to genomics across healthcare institutions. As such, Cpipe has been designed to provide fast, effective and reproducible analysis, while also being highly flexible and customisable to meet the individual needs of diverse clinical settings. Cpipe is being shared with the clinical sequencing community as an open source project and is available at http://cpipeline.org.

Figures

Fig. 1
Batch directory structure used by Cpipe. Each analysis is conducted using a standardised directory structure that separates raw data, design files and generated results from each other. All computed results of the analysis are confined to the ‘analysis’ directory, while source data is kept quarantined in the ‘data’ directory. The analysis directory keeps separate directories for each stage of the analysis starting with initial quality control (fastqc), alignment (align), variant calling (variants) and final quality control (qc). The final analysis results are placed in the ‘results’ directory
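The layout described in this caption can be sketched in a few lines of Python. This is only an illustration, not Cpipe's own code: the function name `create_batch_layout` is hypothetical, the directory name `design` is assumed (the caption mentions design files but does not name their directory), and placing `results` under `analysis` is an assumption based on the statement that all computed results are confined to the analysis directory.

```python
from pathlib import Path

# Stage directories named in the caption; nesting 'results' under
# 'analysis' is an assumption, not something the caption states outright.
ANALYSIS_SUBDIRS = ["fastqc", "align", "variants", "qc", "results"]


def create_batch_layout(root, batch_name):
    """Create an empty batch directory tree and return its path."""
    batch = Path(root) / batch_name
    # Source data is kept apart from anything the analysis generates.
    (batch / "data").mkdir(parents=True, exist_ok=True)
    # 'design' is an assumed name for the directory holding design files.
    (batch / "design").mkdir(exist_ok=True)
    for stage in ANALYSIS_SUBDIRS:
        (batch / "analysis" / stage).mkdir(parents=True, exist_ok=True)
    return batch
```

Calling `create_batch_layout("/tmp", "batch_001")` would produce one self-contained tree per batch, so that deleting the `analysis` directory never touches the source data.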
Fig. 2
Simplified Cpipe analysis steps. Cpipe consists of a number of steps. The core of these are based on the best practice guidelines published by the Broad Institute, consisting of alignment using BWA mem, duplicate removal using Picard MarkDuplicates, local realignment and base quality score recalibration using GATK, and variant calling using GATK HaplotypeCaller. To support clinical requirements, many steps are added including quality control steps (BEDTools coverage and QC summary), additional annotation (Annovar and the Variant Effect Predictor, VEP) and enhanced reports (Annotated variants, Provenance PDF, QC Excel report and Gap Analysis)
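As a rough aid to reading this caption, the steps can be written down as an ordered list. The sketch below only records the step names and tools the caption mentions; the `Step` type and `describe` helper are hypothetical, no command lines are shown, and listing the clinical additions after the core steps is an assumption about ordering rather than something the caption specifies.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    name: str
    tool: str
    group: str  # "core" (Broad best practice) or "clinical" (added steps)


# Tool names are taken from the caption; the relative order of the
# clinical steps is assumed for illustration.
STEPS: List[Step] = [
    Step("alignment", "BWA mem", "core"),
    Step("duplicate removal", "Picard MarkDuplicates", "core"),
    Step("local realignment", "GATK", "core"),
    Step("base quality score recalibration", "GATK", "core"),
    Step("variant calling", "GATK HaplotypeCaller", "core"),
    Step("coverage calculation", "BEDTools coverage", "clinical"),
    Step("annotation", "Annovar", "clinical"),
    Step("annotation", "Variant Effect Predictor (VEP)", "clinical"),
]


def describe(steps: List[Step], group: Optional[str] = None) -> List[str]:
    """Return numbered, human-readable lines for the selected steps."""
    selected = [s for s in steps if group is None or s.group == group]
    return [f"{i}. {s.name}: {s.tool}" for i, s in enumerate(selected, start=1)]
```

For example, `describe(STEPS, "core")` lists only the five best-practice steps, which makes the clinical additions easy to see by comparison with `describe(STEPS)`.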
Fig. 3
Variant and Gene Priority Indexes. Curation of variants is aided by a prioritisation system that ranks variants according to (a) characteristics of the variant including frequency in population databases, conservation scores and the predicted impact on protein product, and (b) the strength of association of the gene to the phenotype under consideration
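A two-level ranking of this kind can be illustrated with a small sketch. Everything numeric here is hypothetical: the caption does not give Cpipe's scoring rules, so the impact scores, the 0.01 rarity threshold, the field names and the `variant_priority` function are assumptions chosen only to show how a variant-level index and a gene-level index could be combined into one ordering.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical impact scores; not Cpipe's actual scheme.
IMPACT_SCORE = {"synonymous": 0, "missense": 1, "truncating": 2}


@dataclass
class Variant:
    name: str
    population_freq: float  # allele frequency in population databases
    conserved: bool         # True if conservation scores are high
    impact: str             # predicted impact on the protein product
    gene_priority: int      # strength of gene-phenotype association


def variant_priority(v: Variant, rare_threshold: float = 0.01) -> int:
    """Score a variant from its own characteristics (higher = more urgent)."""
    score = IMPACT_SCORE.get(v.impact, 0)
    if score == 0:
        return 0  # no predicted protein impact: lowest priority
    if v.population_freq < rare_threshold:
        score += 1  # rare in population databases
    if v.conserved:
        score += 1  # falls at a conserved position
    return score


def rank(variants: List[Variant]) -> List[Variant]:
    """Order by variant priority first, then by gene priority."""
    return sorted(
        variants,
        key=lambda v: (variant_priority(v), v.gene_priority),
        reverse=True,
    )
```

Sorting on the tuple means the gene index only breaks ties between variants with the same variant index; a real system might instead present the two indexes side by side and leave the weighting to the curator.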
Fig. 4
Overview of Cpipe workflow. Cpipe accepts a flexible arrangement of exome or targeted capture samples. Each sample is assigned an Analysis Profile that determines the particular settings and gene list to analyse for that sample. Provenance and QC reports are produced as Excel and PDF files, while variant calls are delivered as both an Excel spreadsheet and a CSV file that can be imported into LOVD3. In addition to allele frequencies from population databases, variants are annotated with frequencies from an internal embedded database that automatically tracks local population variants and sequencing artefacts.
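The idea of an internal database that tracks locally observed variants can be sketched as follows. This is not Cpipe's implementation: the use of SQLite, the table layout, the `LocalVariantDB` class and its method names are all assumptions made for illustration. For simplicity the sketch reports the fraction of stored samples carrying a variant, which is not the same quantity as an allele frequency.

```python
import sqlite3
from typing import Iterable


class LocalVariantDB:
    """Hypothetical store of which variants were seen in which samples."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS observation ("
            "sample TEXT, variant TEXT, UNIQUE(sample, variant))"
        )

    def add_sample(self, sample: str, variants: Iterable[str]) -> None:
        """Record every variant called in one sample (duplicates ignored)."""
        self.conn.executemany(
            "INSERT OR IGNORE INTO observation VALUES (?, ?)",
            [(sample, v) for v in variants],
        )
        self.conn.commit()

    def sample_fraction(self, variant: str) -> float:
        """Fraction of stored samples in which the variant was observed."""
        total = self.conn.execute(
            "SELECT COUNT(DISTINCT sample) FROM observation"
        ).fetchone()[0]
        if total == 0:
            return 0.0
        seen = self.conn.execute(
            "SELECT COUNT(*) FROM observation WHERE variant = ?", (variant,)
        ).fetchone()[0]
        return seen / total
```

A variant that appears in nearly every local sample but is absent from population databases is a candidate sequencing artefact, which is the kind of signal such a store makes visible. Note that this sketch only counts samples with at least one recorded variant.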
