Cpipe: a shared variant detection pipeline designed for diagnostic settings

Simon P Sadedin¹, Harriet Dashnow², Paul A James³, Melanie Bahlo⁴, Denis C Bauer⁵, Andrew Lonie⁶, Sebastian Lunke⁷, Ivan Macciocca⁸, Jason P Ross⁹, Kirby R Siemering¹⁰, Zornitza Stark¹¹, Susan M White¹²; Melbourne Genomics Health Alliance; Graham Taylor¹³, Clara Gaff¹⁴, Alicia Oshlack¹, Natalie P Thorne¹⁵

Affiliations

¹ Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia.
² Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative, Faculty of Medicine Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010 Australia.
³ Genetic Medicine, Royal Melbourne Hospital, Parkville, VIC 3052 Australia.
⁴ Population Health and Immunity Division, The Walter and Eliza Hall Institute, Royal Parade, Parkville, VIC 3052 Australia ; Department of Mathematics and Statistics, The University of Melbourne, Melbourne, VIC 3010 Australia ; Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010 Australia.
⁵ CSIRO, Digital Productivity Flagship, 11 Julius Av, 2113, Sydney, Australia.
⁶ Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative, Faculty of Medicine Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010 Australia.
⁷ Genomic Medicine, Centre for Translational Pathology, Department of Pathology, The University of Melbourne, Melbourne, VIC 3010 Australia.
⁸ Victorian Clinical Genetics Service, Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Melbourne Genomics Health Alliance, Melbourne, Australia.
⁹ CSIRO Food and Nutrition Flagship, North Ryde, NSW 2113 Australia.
¹⁰ Australian Genome Research Facility, The Walter and Eliza Hall Institute, Royal Parade, Parkville, VIC 3050 Australia.
¹¹ Victorian Clinical Genetics Service, Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia.
¹² Victorian Clinical Genetics Service, Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Department of Paediatrics, The University of Melbourne, Melbourne, VIC 3010 Australia.
¹³ Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Genomic Medicine, Centre for Translational Pathology, Department of Pathology, The University of Melbourne, Melbourne, VIC 3010 Australia ; Victorian Clinical Genetics Service, Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia.
¹⁴ Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Victorian Clinical Genetics Service, Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Melbourne Genomics Health Alliance, Melbourne, Australia ; Department of Medicine, The University of Melbourne, Melbourne, VIC 3010 Australia.
¹⁵ Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, 3052 Australia ; Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010 Australia ; Melbourne Genomics Health Alliance, Melbourne, Australia ; Walter and Eliza Hall Institute, Parkville, VIC 3052 Australia.

PMID: 26217397
PMCID: PMC4515933
DOI: 10.1186/s13073-015-0191-x

Simon P Sadedin et al. Genome Med. 2015 Jul 10;7(1):68. doi: 10.1186/s13073-015-0191-x. eCollection 2015.

Abstract

The benefits of implementing high throughput sequencing in the clinic are quickly becoming apparent. However, few freely available bioinformatics pipelines have been built from the ground up with clinical genomics in mind. Here we present Cpipe, a pipeline designed specifically for clinical genetic disease diagnostics. Cpipe was developed by the Melbourne Genomics Health Alliance, an Australian initiative to promote common approaches to genomics across healthcare institutions. As such, Cpipe has been designed to provide fast, effective and reproducible analysis, while also being highly flexible and customisable to meet the individual needs of diverse clinical settings. Cpipe is being shared with the clinical sequencing community as an open source project and is available at http://cpipeline.org.

Figures

**Fig. 1**
Batch directory structure used by Cpipe. Each analysis is conducted using a standardised directory structure that separates raw data, design files and generated results from each other. All computed results of the analysis are confined to the ‘analysis’ directory, while source data is kept quarantined in the ‘data’ directory. The analysis directory keeps a separate directory for each stage of the analysis: initial quality control (fastqc), alignment (align), variant calling (variants) and final quality control (qc). The final analysis results are placed in the ‘results’ directory.
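The layout described in this caption can be sketched in a few lines of Python. This is only an illustration, not Cpipe's own code: the function name `create_batch_layout`, the `design` directory name and the placement of `results` under `analysis` are assumptions made here (the caption names `data`, `analysis`, the stage directories and `results`, but does not spell out the full tree).

```python
from pathlib import Path

# Stage directories named in the caption. Placing 'results' alongside
# them under 'analysis' is an assumption of this sketch.
ANALYSIS_STAGES = ["fastqc", "align", "variants", "qc", "results"]


def create_batch_layout(root, batch_name):
    """Create a batch directory that keeps source data apart from computed results."""
    batch = Path(root) / batch_name
    # Source data stays quarantined in its own directory.
    (batch / "data").mkdir(parents=True, exist_ok=True)
    # 'design' is a hypothetical name for where design files would live.
    (batch / "design").mkdir(exist_ok=True)
    # Everything the analysis computes lives under 'analysis'.
    for stage in ANALYSIS_STAGES:
        (batch / "analysis" / stage).mkdir(parents=True, exist_ok=True)
    return batch
```

Calling `create_batch_layout("batches", "batch_001")` would create `batches/batch_001/data`, `batches/batch_001/design` and one directory per stage under `batches/batch_001/analysis`.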

**Fig. 2**
Simplified Cpipe analysis steps. The core steps follow the best practice guidelines published by the Broad Institute: alignment using BWA mem, duplicate removal using Picard MarkDuplicates, local realignment and base quality score recalibration using GATK, and variant calling using GATK HaplotypeCaller. To support clinical requirements, further steps are added, including quality control steps (BEDTools coverage and QC summary), additional annotation (Annovar and the Variant Effect Predictor, VEP) and enhanced reports (Annotated variants, Provenance PDF, QC Excel report and Gap Analysis).
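As a rough aid to reading this caption, the steps can be written down as plain data. The sketch below only records the step and tool names given in the caption; it does not invoke any tool, and the names `CORE_STEPS`, `CLINICAL_ADDITIONS` and `describe_pipeline` are hypothetical, as is the ordering of the clinical additions.

```python
# Core steps, in the order the caption lists them: (step, tool).
CORE_STEPS = [
    ("alignment", "BWA mem"),
    ("duplicate removal", "Picard MarkDuplicates"),
    ("local realignment", "GATK"),
    ("base quality score recalibration", "GATK"),
    ("variant calling", "GATK HaplotypeCaller"),
]

# Additions made for clinical use, grouped as in the caption.
CLINICAL_ADDITIONS = {
    "quality control": ["BEDTools coverage", "QC summary"],
    "annotation": ["Annovar", "Variant Effect Predictor (VEP)"],
    "reports": [
        "Annotated variants",
        "Provenance PDF",
        "QC Excel report",
        "Gap Analysis",
    ],
}


def describe_pipeline():
    """Return a human-readable outline of the simplified pipeline."""
    lines = [
        f"{i}. {step} ({tool})"
        for i, (step, tool) in enumerate(CORE_STEPS, start=1)
    ]
    for group, items in CLINICAL_ADDITIONS.items():
        lines.append(f"+ {group}: {', '.join(items)}")
    return lines
```

Printing `"\n".join(describe_pipeline())` gives a numbered outline of the core steps followed by one line per group of clinical additions.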

**Fig. 3**
Variant and Gene Priority Indexes. Curation of variants is aided by a prioritisation system that ranks variants according to (a) characteristics of the variant, including frequency in population databases, conservation scores and the predicted impact on the protein product, and (b) the strength of association of the gene to the phenotype under consideration.
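A minimal sketch of ranking by two such indexes follows. It assumes each candidate already carries an integer variant priority and an integer gene priority, with higher meaning more important; how Cpipe derives those values, and how it combines them, is not stated in this caption. Sorting by gene priority first and variant priority second is a choice made here for illustration, and all names (`Candidate`, `rank_for_curation`, the placeholder variants and genes) are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    variant: str           # placeholder identifier for the variant
    gene: str              # placeholder gene name
    variant_priority: int  # assumed: higher = more likely to matter
    gene_priority: int     # assumed: higher = stronger gene-phenotype link


def rank_for_curation(candidates):
    """Order candidates so the highest-priority ones are reviewed first.

    Gene priority is compared first and variant priority breaks ties.
    This ordering is an assumption of the sketch.
    """
    return sorted(
        candidates,
        key=lambda c: (c.gene_priority, c.variant_priority),
        reverse=True,
    )


example = [
    Candidate("variant_A", "GENE1", variant_priority=2, gene_priority=1),
    Candidate("variant_B", "GENE2", variant_priority=1, gene_priority=3),
    Candidate("variant_C", "GENE2", variant_priority=4, gene_priority=3),
]
ranked = rank_for_curation(example)
```

With these placeholder values, the two `GENE2` candidates are ranked ahead of the `GENE1` candidate, and `variant_C` precedes `variant_B` because of its higher variant priority.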

**Fig. 4**
Overview of Cpipe workflow. Cpipe accepts a flexible arrangement of exome or targeted capture samples. Each sample is assigned an Analysis Profile that determines the particular settings and gene list to analyse for that sample. Provenance and QC reports are produced as Excel and PDF files, while variant calls are delivered as both an Excel spreadsheet and a CSV file that can be imported into LOVD3. In addition to allele frequencies from population databases, variants are annotated with frequencies from an internal embedded database that automatically tracks local population variants and sequencing artefacts.
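The idea of an internal database that tracks locally observed variants can be illustrated with a small in-memory stand-in. This is not Cpipe's embedded database: the class `LocalVariantDB`, its methods and the variant key format are hypothetical, and for simplicity the sketch reports the fraction of local samples carrying a variant rather than a true allele frequency.

```python
from collections import defaultdict


class LocalVariantDB:
    """In-memory stand-in for a database of locally observed variants."""

    def __init__(self):
        self._samples = set()
        # variant key -> set of sample ids in which it was observed
        self._seen = defaultdict(set)

    def add_sample(self, sample_id, variants):
        """Record every variant observed in one sample."""
        self._samples.add(sample_id)
        for variant in variants:
            self._seen[variant].add(sample_id)

    def frequency(self, variant):
        """Fraction of recorded samples that carry the variant."""
        if not self._samples:
            return 0.0
        return len(self._seen.get(variant, ())) / len(self._samples)


db = LocalVariantDB()
# Variant keys here are made-up placeholders, not real calls.
db.add_sample("sample_1", ["chr1:100:A>G", "chr2:200:C>T"])
db.add_sample("sample_2", ["chr1:100:A>G"])
```

A variant seen in nearly every local sample but rare in population databases is a candidate sequencing artefact or local population variant, which is the kind of signal the caption describes the internal database as tracking.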
