BMC Bioinformatics. 2017 Dec 15;18(1):555.
doi: 10.1186/s12859-017-1950-z.

Canary: an atomic pipeline for clinical amplicon assays

Kenneth D Doig et al. BMC Bioinformatics. 2017.

Abstract

Background: High-throughput sequencing requires bioinformatics pipelines to process large volumes of data into meaningful variants that can be translated into a clinical report. These pipelines often suffer from a number of shortcomings: they lack robustness and have many components written in multiple languages, each with its own resource requirements. Pipeline components must be linked together with a workflow system to process FASTQ files through to a VCF file of variants. Crafting these pipelines requires considerable bioinformatics and IT skills beyond the reach of many clinical laboratories.

Results: Here we present Canary, a single program that can be run on a laptop, which takes FASTQ files from amplicon assays through to an annotated VCF file ready for clinical analysis. Canary can be installed and run with a single command using Docker containerization or run as a single JAR file on a wide range of platforms. Although it is a single utility, Canary performs all the functions present in more complex and unwieldy pipelines. All variants identified by Canary are 3' shifted and represented in their most parsimonious form to provide a consistent nomenclature, irrespective of sequencing variation. Further, proximate in-phase variants are represented as a single HGVS 'delins' variant. This allows correct nomenclature and consequences to be ascribed to complex multi-nucleotide polymorphisms (MNPs), which are otherwise difficult to represent and interpret. Variants can also be annotated with hundreds of attributes sourced from MyVariant.info to give up-to-date details on pathogenicity, population statistics and in silico predictors.
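To make the normalisation idea concrete, the following is a minimal Python sketch of trimming a variant to its most parsimonious form and shifting a simple insertion or deletion as far 3' as the reference allows. It is an illustration only, not Canary's implementation: the function names `trim` and `normalise`, the toy reference sequence, the read counts and the depth are all hypothetical, coordinates are 0-based, and the gene is assumed to lie on the forward strand so that 3' means rightward.

```python
from collections import Counter


def trim(pos, ref, alt):
    """Strip bases shared by ref and alt (suffix first, then prefix)."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def normalise(reference, pos, ref, alt):
    """Return (pos, ref, alt) trimmed and shifted to the 3'-most position.

    An insertion is reported as ref == "" with alt inserted before pos;
    a deletion is reported as alt == "" with ref removed starting at pos.
    """
    pos, ref, alt = trim(pos, ref, alt)
    if not ref and alt:
        # Insertion: rotate the inserted bases while the next reference
        # base matches the first inserted base.
        while pos < len(reference) and alt[0] == reference[pos]:
            alt = alt[1:] + alt[0]
            pos += 1
    elif ref and not alt:
        # Deletion: slide the deleted window right while the base leaving
        # the window equals the base entering it.
        n = len(ref)
        while pos + n < len(reference) and reference[pos] == reference[pos + n]:
            pos += 1
        ref = reference[pos:pos + n]
    return pos, ref, alt


# Hypothetical toy data: one duplication of "CA" in a CA repeat, reported
# at three different positions by three groups of reads.
REFERENCE = "TTGCACACATGG"
calls = [((2, "G", "GCA"), 40), ((4, "A", "ACA"), 35), ((8, "A", "ACA"), 25)]
DEPTH = 400  # hypothetical total read depth at the locus

pooled = Counter()
for (pos, ref, alt), reads in calls:
    pooled[normalise(REFERENCE, pos, ref, alt)] += reads

vaf = {variant: n / DEPTH for variant, n in pooled.items()}
print(vaf)
```

In this toy case the three representations collapse to a single key, so their supporting reads are summed before the allele frequency is computed rather than being reported as three separate low-frequency variants.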

Conclusions: Canary has been used at the Peter MacCallum Cancer Centre in Melbourne for the last 2 years for the processing of clinical sequencing data. By encapsulating clinical features in a single, easily installed executable, Canary makes sequencing more accessible to all pathology laboratories. Canary is available for download as source or a Docker image at https://github.com/PapenfussLab/Canary under a GPL-3.0 License.

Keywords: Amplicon; Canary; Clinical diagnostics; PathOS; Pipelines; Targeted sequencing; Variant calling.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Canary read alignment. Overlapping amplicon reads are aligned to the reference genome in a two-step process. The overlapping read pairs, which are derived from the same DNA molecule, are aligned to each other to form a single merged consensus read, which is then aligned to a reference genome to identify variants.
Fig. 2
Normalised variants displayed in IGV. IGV display of Illumina MiSeq reads from a clinical patient, highlighting the variation in the representation of indels within BAM files. The same variant is represented differently in three sets of reads, which need to be merged to a single locus with the standardized HGVS nomenclature of NM_000314.4:c.21_22dup. Additionally, the reads contributing to the three read sets must be combined to calculate the correct variant allele frequency.
Fig. 3
Comparison of Canary with BWA, GATK and VarDict. Graph showing the number of true positive (TP) variants (expected = 46) for three pipelines run against six Acrometrix control samples containing known variants at a certified allele frequency (left-hand axis). The three pipelines were: Canary performing read alignment and variant calling (blue bars), BWA-MEM 2 performing read alignment and GATK HaplotypeCaller for variant calling (red bars), and BWA-MEM 2 performing read alignment and VarDict for variant calling (green bars). The mean variant allele frequency for each of the pipeline variants is shown as coloured diamonds and the control sample expected frequency is shown as black diamonds (right-hand axis). Raw data and statistics are available in Additional file 2: Table S1.
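
The first step described in the Fig. 1 caption, merging an overlapping read pair into one consensus read, can be sketched roughly as below. This is a simplified illustration under stated assumptions and not Canary's algorithm: `merge_pair`, its parameters `min_overlap` and `max_mismatch_rate`, and the quality-based tie-break are hypothetical choices, and the sketch only handles fragments longer than a single read (the end of read 1 overlapping the start of the reverse-complemented read 2).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a forward read and its reverse mate into one consensus read.

    r1, r2 are base strings as sequenced; q1, q2 are lists of Phred scores.
    Returns the merged sequence, or None if no acceptable overlap is found.
    """
    r2rc = reverse_complement(r2)
    q2rc = q2[::-1]
    # Try the longest candidate overlap first and accept the first one
    # whose mismatch count is within the tolerance.
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        start = len(r1) - olen
        mismatches = sum(a != b for a, b in zip(r1[start:], r2rc[:olen]))
        if mismatches > max_mismatch_rate * olen:
            continue
        merged = list(r1[:start])
        for i in range(olen):
            base1, base2 = r1[start + i], r2rc[i]
            # On disagreement keep the base with the higher quality score.
            if base1 == base2 or q1[start + i] >= q2rc[i]:
                merged.append(base1)
            else:
                merged.append(base2)
        merged.extend(r2rc[olen:])
        return "".join(merged)
    return None


# Hypothetical 40-base fragment sequenced with 28-base paired reads.
fragment = "ACGTTGCAAGGCTTACGGATCCATGCAATTGGCCAAGTCA"
read1 = fragment[:28]
read2 = reverse_complement(fragment[12:])
quals = [30] * 28

print(merge_pair(read1, quals, read2, quals) == fragment)
```

Because both reads come from the same DNA molecule, a disagreement inside the overlap is more likely a sequencing error than a real variant, which is why the sketch resolves it by base quality before any alignment to the reference takes place.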
