Nucleic Acids Res. 2016 Jun 20;44(11):e108.
doi: 10.1093/nar/gkw227. Epub 2016 Apr 7.

VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

Zhongwu Lai et al. Nucleic Acids Res. 2016.

Abstract

Accurate variant calling in next-generation sequencing (NGS) is critical to understanding cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNVs, MNVs, InDels, and complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly with sequencing depth, enabling the ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon-aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing, which is often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss-of-heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate the application of NGS in clinical cancer research.

Figures

Figure 1.
VarDict uses soft-clipped reads for local realignment to comprehensively estimate allele frequency (AF). This example shows the 15-bp deletion mutation in EGFR exon 19 in the PC-9 lung cancer cell line, as shown in IGV. Top track is the coverage for each base pair. Each thin gray line represents a sequence read. Black lines in the middle indicate gapped alignments due to the 15-bp deletion. The colored portion shows soft-clipped reads that cannot be aligned due to short overhangs. The bottom track shows reference sequence and amino acids for EGFR exon 19.
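The effect described in the Figure 1 caption can be illustrated with simple arithmetic: if soft-clipped reads that actually carry the deletion are left out of the variant count, the allele frequency is underestimated. The sketch below is not VarDict's implementation; the function name and all read counts are hypothetical and chosen only to show the calculation.

```python
def allele_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a locus that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


# Hypothetical counts at a deletion locus (not taken from the paper).
gapped_reads = 30          # reads aligned with the deletion as a gap
rescued_soft_clipped = 20  # soft-clipped reads assigned to the deletion after local realignment
depth = 100                # total reads covering the locus

af_gapped_only = allele_frequency(gapped_reads, depth)
af_with_realignment = allele_frequency(gapped_reads + rescued_soft_clipped, depth)

print(f"AF from gapped alignments only: {af_gapped_only:.2f}")
print(f"AF including realigned soft-clipped reads: {af_with_realignment:.2f}")
```

With these illustrative numbers, counting only gapped alignments would report a lower AF than counting the realigned soft-clipped reads as well.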
Figure 2.
VarDict calls a large 124-bp deletion in NA12878. The deletion has clear support from both the soft-clipped reads (colored reads) at both breakpoints and the apparent drop in coverage illustrated in the top track. The apparent consensus of the clipped sequences indicates the existence of a relatively large InDel. Dark colored short reads are supplementary alignments from split reads, where individual reads are split into two segments that are aligned at the edges flanking the deletion. The deletion variant is further supported by the existence of an entry in dbSNP (rs67488720). This deletion was detected by VarDict, but not by GATK, VarScan, or FreeBayes.
Figure 3.
VarDict calls a new class of complex variants. The clear drop in coverage (top track) indicates a deletion within a complex variant called in NA12878 at chr12:51740388. Only the end portion of the soft-clipped sequences, indicated by red arrows, can be aligned to the other side of the deletion breakpoint, suggesting that an additional proximal InDel may be present, contributing to a complex composite variant. VarDict calls a single homozygous complex variant comprising a 29-bp deletion followed by a 13-bp insertion (CTGGACCATATCCACTTACCATAAAGGAC>ACACCAGGAAGCG). This is further supported by a recent entry in dbSNP (rs386762976). Many clustered dbSNP entries within the gap from dbSNP138 (bottom track) are likely due to misinterpretation of misalignments.
Figure 4.
Complex variants can impact clinical interpretation. For this example of an EGFR exon 19 deletion from a lung cancer patient (15), BWA produced different alignments for reads of different lengths, some with two deletions and one insertion, while others are soft-clipped. The misalignment can be incorrectly interpreted as an insertion of a single C base followed by an out-of-frame deletion. VarDict correctly calls a single complex mutation comprising a 26-bp deletion and a 5-bp insertion (TTAAGAGAAGCAACATCTCCGAAAGC>GCCAA), which explains all alignments, including soft-clipped reads. The mutation will thus be annotated as an in-frame deletion of exon 19, which would be clinically actionable for EGFR inhibitor therapy.
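The in-frame conclusion in the Figure 4 caption follows from the net length change of the complex variant: 26 bp deleted and 5 bp inserted gives a net loss of 21 bp, which is a multiple of 3. A minimal sketch of that check is shown below. The function names are hypothetical, and the check assumes the whole variant lies within coding sequence; it ignores splice sites and any stop codon introduced by the inserted bases.

```python
def net_length_change(ref: str, alt: str) -> int:
    """Length of the alternate allele minus length of the reference allele."""
    return len(alt) - len(ref)


def preserves_reading_frame(ref: str, alt: str) -> bool:
    """True when the net length change is a multiple of 3 (coding sequence assumed)."""
    return net_length_change(ref, alt) % 3 == 0


# Complex variant from the Figure 4 caption: 26-bp deletion plus 5-bp insertion.
REF = "TTAAGAGAAGCAACATCTCCGAAAGC"
ALT = "GCCAA"

print(len(REF), len(ALT), net_length_change(REF, ALT))
print("in-frame:", preserves_reading_frame(REF, ALT))

# A single-base insertion considered on its own shifts the reading frame.
print("1-bp insertion in-frame:", preserves_reading_frame("", "C"))
```

Treating the event as one complex variant rather than as separate insertion and deletion calls is what makes the net change, and therefore the frame, unambiguous.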
Figure 5.
The distribution of allele frequencies for structural variants estimated by VarDict. About 2407 high-confidence large deletions from NA12878 were called by VarDict. The x-axis shows the AF estimated by VarDict, while the y-axis shows the number of variants for a given AF. Two expected peaks at 50 and 100% are visible, consistent with the germline origin of the reference NA12878 sample.
Figure 6.
Receiver operating characteristic (ROC) curve for comparison of variant callers on DREAM synthetic dataset #4. The ROC curve is drawn using quality scores for calls of somatic SNVs and InDels provided by FreeBayes, VarScan, MuTect and VarDict. MuTect does not report a quality score, so depth was used in its place. A total of 21 913 synthetic somatic SNV and InDel mutations were evaluated. VarDict outperforms the other callers with higher sensitivity and specificity. Variants were called and filtered using the default settings of each caller. It is worth noting that MuTect does not call InDels.
Figure 7.
The comparison of VarDict and Firehose calls for KRAS, EGFR, BRAF, PIK3CA and MET in 230 TCGA LUAD patients. Each column represents a patient. Each gene has two rows, with the top showing calls from VarDict and the bottom showing calls from Firehose. Different colors indicate different mutation types. A patient without a matching call in the Firehose track has a mutation called only by VarDict. As expected, all but one of the KRAS mutations are missense, while EGFR has known in-frame InDels. The patient with a truncating mutation of KRAS also carries an activating G12V mutation, suggesting the heterogeneous nature of the sample. Dark gray indicates mutations that are deemed VUS (variant of unknown significance). Trunc: Truncation; FS: Frameshift.
Figure 8.
Common artifacts from PCR-based target enrichment. (A) Amplicon-biased variants. Four overlapping PCR amplicons were designed against EGFR exon 12. The red arrow highlights a variant with an AF of 24% that is predicted to be C499Y, which has an entry in COSMIC. However, it is detectable only in amplicon 3 and will be flagged by VarDict and filtered. (B) Mis-paired primers amplified a region within EGFR exon 20. The read pairs highlighted by a red rectangle cannot be mapped to any of the amplicons (5-7) below. The two mismatches at the left are actually from an ERBB2 primer; ERBB2 has high sequence similarity to EGFR, resulting in primer mis-pairing. VarDict will filter out all those reads, so no variant will be called from the mismatches in the mis-paired primer. Numbers (1-7) at the bottom indicate PCR amplicons, with the thick middle portion representing inserts and the thin edges representing PCR primers.
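The amplicon-bias idea in panel (A) of Figure 8 can be sketched as a consistency check across overlapping amplicons: a real variant should appear in every amplicon that covers the position with adequate depth, whereas a PCR artifact tends to appear in only one. The code below is an illustrative sketch, not VarDict's actual filter. The function name, the thresholds (`min_depth`, `min_af`) and all read counts are hypothetical.

```python
from typing import Dict, Tuple


def is_amplicon_biased(
    counts: Dict[str, Tuple[int, int]],
    min_depth: int = 20,    # hypothetical minimum coverage for an amplicon to be informative
    min_af: float = 0.05,   # hypothetical AF above which an amplicon "supports" the variant
) -> bool:
    """Flag a variant seen in some, but not all, well-covered overlapping amplicons.

    counts maps an amplicon name to (variant_reads, total_reads) at the variant position.
    """
    afs = [alt / total for alt, total in counts.values() if total >= min_depth]
    if len(afs) < 2:
        return False  # cannot compare amplicons with fewer than two informative ones
    supporting = sum(af >= min_af for af in afs)
    return 0 < supporting < len(afs)


# Hypothetical counts: the variant appears at 24% AF in one amplicon only.
biased_counts = {
    "amplicon_2": (0, 400),
    "amplicon_3": (96, 400),
    "amplicon_4": (1, 380),
}
# Hypothetical counts: the variant is supported consistently by both amplicons.
consistent_counts = {
    "amplicon_2": (100, 400),
    "amplicon_3": (96, 400),
}

print("biased example flagged:", is_amplicon_biased(biased_counts))
print("consistent example flagged:", is_amplicon_biased(consistent_counts))
```

A fixed AF cutoff is the simplest possible criterion; a production filter would more plausibly compare the per-amplicon frequencies statistically.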
