Comparative Study
Hum Mutat. 2016 Dec;37(12):1263-1271.
doi: 10.1002/humu.23114. Epub 2016 Sep 26.

From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

Steve Laurie et al. Hum Mutat. 2016 Dec.

Abstract

As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next-generation sequencing as standard practice in research and diagnostics. However, computing cost-performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available.

Keywords: NA12878; NGS; alignment; benchmark; bioinformatics; computing speed; variant calling; whole exome sequencing; whole genome sequencing.

Figures

Figure 1
Venn diagrams illustrating concordance of variant identification for BWA‐MEM alignments. Separate Venn diagrams show the number and percentage of concordant calls for a particular variant type by pipeline for the NIST reliably callable regions, the NIST non‐reliably callable but mappable regions, and the NIST non‐reliably callable regions. SNVs, single nucleotide variants; Dels, deletions; Ins, insertions.
Figure 2
Venn diagrams illustrating concordance of variant identification for GEM3 alignments. Separate Venn diagrams show the number and percentage of concordant calls for a particular variant type by pipeline for the NIST reliably callable regions, the NIST non‐reliably callable but mappable regions, and the NIST non‐reliably callable regions. SNVs, single nucleotide variants; Dels, deletions; Ins, insertions.
Figure 3
Venn diagrams illustrating concordance of WGS and WES variant identification. Separate Venn diagrams show the number and percentage of concordant calls for a particular variant type for the GEM3 pipelines for variants identified in the intersection of the NIST reliably callable region and exome capture regions (∼34.7 MB). SNVs, single nucleotide variants.
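
The counts and percentages reported in such Venn diagrams can be approximated with a short sketch like the one below. It is not the authors' code: the function name `venn_regions`, the representation of a variant as a `(chrom, pos, ref, alt)` tuple, and the toy call sets are all assumptions made for illustration. Percentages are taken relative to the union of all calls, which may differ from the denominator used in the paper.

```python
from collections import Counter


def venn_regions(call_sets):
    """Count variants in each Venn region for several call sets.

    call_sets maps a pipeline name to a set of hashable variant keys.
    Returns {frozenset_of_pipeline_names: (count, percent_of_union)}.
    """
    # Record which pipelines reported each variant.
    membership = {}
    for name, variants in call_sets.items():
        for variant in variants:
            membership.setdefault(variant, set()).add(name)

    # Each distinct combination of pipelines is one Venn region.
    counts = Counter(frozenset(names) for names in membership.values())
    total = len(membership)  # size of the union of all call sets
    return {region: (n, 100.0 * n / total) for region, n in counts.items()}


# Hypothetical toy data, not taken from the study.
calls = {
    "FreeBayes": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")},
    "HaplotypeCaller": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "SAMtools": {("chr1", 100, "A", "G"), ("chr3", 10, "T", "C")},
}

regions = venn_regions(calls)
for region, (n, pct) in sorted(regions.items(), key=lambda kv: -len(kv[0])):
    print(sorted(region), n, f"{pct:.1f}%")
```

In practice, InDels would likely need normalization (for example left-alignment) before exact-key matching of this kind gives meaningful concordance.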
