PLoS One. 2017 Oct 11;12(10):e0186175.
doi: 10.1371/journal.pone.0186175. eCollection 2017.

Comprehensive benchmarking of SNV callers for highly admixed tumor data

Regina Bohnert et al. PLoS One. 2017.

Abstract

Precision medicine attempts to individualize cancer therapy by matching tumor-specific genetic changes with effective targeted therapies. A crucial first step in this process is the reliable identification of cancer-relevant variants, which is considerably complicated by the impurity and heterogeneity of clinical tumor samples. We compared the impact of admixture of non-cancerous cells and low somatic allele frequencies on the sensitivity and precision of 19 state-of-the-art SNV callers. We studied both whole exome and targeted gene panel data and up to 13 distinct parameter configurations for each tool. We found vast differences among callers. Based on our comprehensive analyses, we recommend joint tumor-normal calling with MuTect, EBCall or Strelka for whole exome somatic variant calling, and HaplotypeCaller or FreeBayes for whole exome germline calling. For targeted gene panel data on a single tumor sample, LoFreqStar performed best. We further found that tumor impurity and admixture had a negative impact on precision and, in particular, on sensitivity in whole exome experiments. At the admixture levels of 60% to 90% sometimes seen in pathological biopsies, sensitivity dropped significantly, even for variants originally present in the tumor at 100% allele frequency. Sensitivity to low-frequency SNVs improved with targeted panel data, but whole exome data allowed more efficient identification of germline variants. Effective somatic variant calling requires high-quality pathological samples with minimal admixture, a consciously selected sequencing strategy, and the appropriate variant calling tool with settings optimized for the chosen type of data.
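The abstract's point that admixture hurts sensitivity even for variants at 100% tumor allele frequency can be illustrated with a toy model. This is a minimal sketch, not any caller's method. It makes several assumptions:

- Non-cancerous cells lack the variant.
- All cells are copy-number neutral and contribute DNA in proportion to their abundance.
- Variant-supporting reads follow a binomial distribution.

The depth (30x) and the 5-read detection threshold are hypothetical values chosen only for illustration.

```python
from math import comb


def expected_vaf(tumor_af: float, admixture: float) -> float:
    """Expected variant allele fraction (VAF) in an admixed sample.

    `admixture` is the fraction of non-cancerous cells, assumed to lack the
    variant; cells are assumed copy-number neutral and to contribute DNA in
    proportion to their abundance.
    """
    return (1.0 - admixture) * tumor_af


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` alt reads) under Binomial(depth, vaf) sampling."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Hypothetical settings (30x depth, 5-read minimum), for illustration only.
for admixture in (0.0, 0.6, 0.9):
    vaf = expected_vaf(tumor_af=1.0, admixture=admixture)
    p = detection_probability(depth=30, vaf=vaf, min_alt_reads=5)
    print(f"admixture={admixture:.0%}  expected VAF={vaf:.2f}  P(detect)={p:.3f}")
```

The model only shows the direction of the effect: a variant at 100% allele frequency in the tumor is expected at roughly 10% VAF at 90% admixture, and the chance of sampling enough supporting reads falls with it. Real callers apply their own statistical models, priors and filters. Their sensitivity can therefore drop at admixture levels where this simple threshold model still predicts detection.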

Conflict of interest statement

Competing Interests: All authors are affiliated with Molecular Health GmbH. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Schematic overview of “gold standard” variants in the simulated data set.
From the outermost to the innermost circle, the circles show chromosomes, genomic regions covered in the exome experiments (dark blue), genomic regions covered by the targeted gene panel (light blue), density of germline and somatic SNVs combined (dark green; maximum of scale at 3,000), density of somatic SNVs (green; maximum at 30), density of germline SNVs (light green; maximum at 3,000), density of germline and somatic indels (dark orange; maximum at 300), density of somatic indels (orange; maximum at 30), and density of germline indels (light orange; maximum at 300). Variant densities were computed in 1 Mb bins.
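The per-bin variant densities in Fig 1 amount to counting variants in fixed 1 Mb windows. The following is a minimal sketch of such a count. It assumes variants are given as (chromosome, 1-based position) pairs, as in VCF. The function name and input layout are illustrative, not taken from the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple

BIN_SIZE = 1_000_000  # 1 Mb, the bin width given in the Fig 1 caption


def variant_density(variants: Iterable[Tuple[str, int]], bin_size: int = BIN_SIZE) -> Counter:
    """Count variants per (chromosome, bin index).

    Positions are assumed to be 1-based, so bin 0 covers positions
    1..bin_size, bin 1 covers bin_size+1..2*bin_size, and so on.
    """
    counts: Counter = Counter()
    for chrom, pos in variants:
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


example = [("chr1", 1), ("chr1", 999_999), ("chr1", 1_000_001), ("chr2", 5_000_000)]
print(variant_density(example))
```

Running the same count separately on germline, somatic and combined call sets would give one density track per ring, matching the layout the caption describes.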
Fig 2. Histograms of true allele frequencies in each tumor sample.
Note how increasing admixture increases the prevalence of low-frequency variants.
Fig 3. Benchmarking results for germline SNVs.
Sensitivity versus precision for (A) exome and (B) targeted gene panel data.
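Figs 3 to 5 report sensitivity and precision using the standard definitions:

- sensitivity = TP / (TP + FN), the fraction of gold-standard variants that a caller recovers;
- precision = TP / (TP + FP), the fraction of calls that are true.

The sketch below computes both from two call sets. It assumes exact matching on (chromosome, position, reference allele, alternate allele). This record does not describe the benchmark's actual matching rules, so treat the matching step as an assumption.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def sensitivity_precision(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for a call set against a gold standard.

    Sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    A call is a true positive only if it matches a gold-standard variant
    exactly (an assumed matching rule).
    """
    tp = len(calls & truth)  # called and in the gold standard
    fp = len(calls - truth)  # called but not in the gold standard
    fn = len(truth - calls)  # in the gold standard but missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 300, "G", "A"), ("chr3", 400, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 350, "A", "T")}
print(sensitivity_precision(calls, truth))
```

In the example, the caller recovers two of four true variants and makes one false call. Each configuration of a tool then contributes one (sensitivity, precision) point to plots like Fig 3.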
Fig 4. Benchmarking results for somatic SNVs on exome data.
(A, C) Sensitivity and (B, D) precision for somatic SNVs on (A, B) paired tumor-control exome data and (C, D) single-tumor exome data.
Fig 5. Benchmarking results for somatic SNVs on targeted gene panel data.
(A, C) Sensitivity and (B, D) precision for somatic SNVs on (A, B) paired tumor-control targeted gene panel data and (C, D) single-tumor targeted gene panel data.
Fig 6. Sensitivity of LoFreqStar, VarDict and VarScan on the GiaB reference samples averaged over the four replicates.
Because the samples are 1:7 mixtures, allele frequencies take only the discrete values shown.
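The discrete allele frequencies in Fig 6 follow from mixing samples in fixed proportions. This record does not describe the mixture design in detail. The sketch below therefore rests on assumptions:

- Two diploid samples are mixed 1:7 by DNA amount.
- Each sample's genotype contributes an allele fraction of 0, 1/2 or 1.

Under these assumptions, it enumerates every non-zero allele frequency the mixture can produce.

```python
from fractions import Fraction
from itertools import product
from typing import List

# Assumed diploid genotype allele fractions: hom-ref, het, hom-alt.
GENOTYPE_AFS = (Fraction(0), Fraction(1, 2), Fraction(1))


def mixture_afs(parts_a: int = 1, parts_b: int = 7) -> List[Fraction]:
    """All non-zero allele frequencies for a parts_a:parts_b mixture of two
    diploid samples, assuming DNA contributions proportional to the parts."""
    total = parts_a + parts_b
    afs = set()
    for g_a, g_b in product(GENOTYPE_AFS, repeat=2):
        af = (parts_a * g_a + parts_b * g_b) / total  # weighted average of genotype AFs
        if af > 0:
            afs.add(af)
    return sorted(afs)


print([str(af) for af in mixture_afs()])
```

Under these assumptions, the smallest non-zero frequency is 1/16, for a heterozygous variant present only in the minor sample. Every other value is likewise a multiple of 1/16, which is why the x-axis of Fig 6 is discrete.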
