BMC Genomics. 2014 Mar 28;15:244.
doi: 10.1186/1471-2164-15-244.

Comparison of somatic mutation calling methods in amplicon and whole exome sequence data

Huilei Xu et al. BMC Genomics.

Abstract

Background: High-throughput sequencing is rapidly becoming common practice in clinical diagnosis and cancer research. Many algorithms have been developed for somatic single nucleotide variant (SNV) detection in matched tumor-normal DNA sequencing. Although numerous studies have compared the performance of various algorithms on exome data, there has not yet been a systematic evaluation using PCR-enriched amplicon data with a range of variant allele fractions. The recently developed gold standard variant set for the reference individual NA12878 by the NIST-led "Genome in a Bottle" Consortium (NIST-GIAB) provides a good resource to evaluate admixtures with various SNV fractions.
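The link between a mixing fraction and the resulting SNV allele fraction can be made concrete with a short calculation. This is an illustrative sketch under simplifying assumptions (diploid genomes, equal DNA contribution per genome, and a background sample that is homozygous reference at every NA12878-unique site); the names `expected_vaf` and `ALT_COPIES` are hypothetical.

```python
# Alternate-allele copies carried by NA12878 per diploid genome (assumption: diploid).
ALT_COPIES = {"het": 1, "hom_alt": 2}


def expected_vaf(mix_fraction: float, genotype: str) -> float:
    """Expected variant allele fraction (VAF) of an NA12878-unique SNV in a mixture.

    Assumes diploid genomes, equal DNA contribution per genome, and a
    background sample that is homozygous reference at the site.
    """
    if not 0.0 <= mix_fraction <= 1.0:
        raise ValueError("mix_fraction must lie in [0, 1]")
    if genotype not in ALT_COPIES:
        raise ValueError(f"unknown genotype: {genotype!r}")
    # NA12878 contributes ALT_COPIES of 2 alleles, diluted by its share of the DNA.
    return mix_fraction * ALT_COPIES[genotype] / 2


for mix in (0.08, 0.16, 0.36, 1.00):
    print(f"mix {mix:.0%}: expected het VAF {expected_vaf(mix, 'het'):.0%}")
```

Under these assumptions, mixing fractions of 8%, 16%, 36%, and 100% give expected heterozygous allele fractions of 4%, 8%, 18%, and 50%, which matches the C04 to C50 labels used in the Figure 2 caption.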

Results: Using the NIST-GIAB gold standard, we compared the performance of five popular somatic SNV calling algorithms (GATK UnifiedGenotyper followed by simple subtraction, MuTect, Strelka, SomaticSniper and VarScan2) for matched tumor-normal amplicon and exome sequencing data.
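The first approach listed, calling variants in the tumor and normal separately and then subtracting, reduces to a set difference. A minimal sketch, assuming both call sets have already been parsed into (chromosome, position, ref, alt) records; the `Variant` type and `subtract_normal` helper are hypothetical names, not GATK APIs.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_normal(tumor_calls: Iterable[Variant],
                    normal_calls: Iterable[Variant]) -> Set[Variant]:
    """Return tumor calls that have no identical call in the matched normal."""
    normal = set(normal_calls)
    # A call is kept as "somatic" only if the exact allele was not called in the normal.
    return {v for v in tumor_calls if v not in normal}


tumor = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
normal = [Variant("chr2", 500, "C", "T")]
print(subtract_normal(tumor, normal))
```

Because the subtraction trusts the normal call set completely, a germline variant that the normal call set misses (for example at a low-coverage site) passes through as a false somatic call; the dedicated somatic callers in the comparison are designed to use tumor and normal reads together instead.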

Conclusions: We demonstrated that the five commonly used somatic SNV calling methods are applicable to both targeted amplicon and exome sequencing data. However, the sensitivities of these methods vary based on the allelic fraction of the mutation in the tumor sample. Our analysis can assist researchers in choosing a somatic SNV calling method suitable for their specific needs.

PubMed Disclaimer

Figures

Figure 1
Distribution of variant allele fraction of NA12878 SNV sites. Allele-fraction distribution of NA12878-unique SNV sites in (A) one replicate of the experimental amplicon sequencing dilution series over the CCP region of interest and (B) the in silico exome sequencing dilution series over the exome region of interest. The x-axis shows the variant allele fraction and the y-axis shows the number of sites. Homozygous alternate alleles in NA12878 are shown in red, and heterozygous alternate alleles are shown in blue.
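The allele fraction of a site is the share of its reads that support the alternate allele. A sketch of that computation and of an equal-width histogram like the one summarized in Figure 1, assuming per-site reference and alternate read counts are already available and ignoring reads that carry any third allele; the function names and the default bin count are illustrative choices.

```python
from collections import Counter
from typing import Iterable, Tuple


def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Variant allele fraction: alternate reads over (ref + alt) reads."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no informative reads")
    return alt_reads / depth


def vaf_histogram(sites: Iterable[Tuple[int, int]], n_bins: int = 20) -> Counter:
    """Count sites per equal-width VAF bin on [0, 1].

    Bin i covers [i / n_bins, (i + 1) / n_bins); a VAF of exactly 1.0
    (for example a homozygous alternate site) goes into the last bin.
    """
    counts: Counter = Counter()
    for ref_reads, alt_reads in sites:
        vaf = allele_fraction(ref_reads, alt_reads)
        counts[min(int(vaf * n_bins), n_bins - 1)] += 1
    return counts


print(vaf_histogram([(90, 10), (50, 50), (0, 30)], n_bins=10))
```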
Figure 2
Sensitivity of somatic SNV calling methods. The x-axis shows the calling methods and the y-axis shows sensitivity for (A) amplicon sequencing data and (B) exome sequencing data. In the amplicon sequencing data, C04, C08, C18, and C50 denote mixing concentrations of 8%, 16%, 36%, and 100%; in the in silico exome sequencing mixtures, they denote median allele fractions of 4%, 8%, 18%, and 50% for NA12878-unique heterozygous SNVs over the region of interest. For the amplicon sequencing data, columns show the mean and error bars show the standard deviation of the triplicate.
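Sensitivity here is the fraction of gold-standard NA12878-unique SNVs that a method recovers, TP / (TP + FN). A minimal sketch, assuming calls and truth sites are both keyed by (chromosome, position) and already restricted to the region of interest. The caption does not say whether the triplicate standard deviation is the sample or population form, so using the sample form (`statistics.stdev`) is an assumption.

```python
from statistics import mean, stdev
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def sensitivity(calls: Iterable[Site], truth: Set[Site]) -> float:
    """TP / (TP + FN): share of gold-standard sites present in the call set."""
    if not truth:
        raise ValueError("truth set is empty")
    true_positives = len(truth.intersection(calls))
    return true_positives / len(truth)


def replicate_summary(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across replicates (needs >= 2 values)."""
    return mean(values), stdev(values)


truth = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 75)}
calls = [("chr1", 100), ("chr2", 50), ("chr5", 10)]
print(sensitivity(calls, truth))  # 0.5
```

Calls outside the truth set do not lower sensitivity; they are counted separately as false positives.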
Figure 3
Specificity of somatic SNV calling methods. The x-axis represents the methods and the y-axis represents false positives per Mb over the region of interest, for (A) the amplicon sequencing dilution series triplicate (mean ± standard deviation) and (B) the exome sequencing 100% NA12878 sample.
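Normalizing false positives by the size of the region of interest lets call sets over target regions of different sizes be compared. A sketch, assuming every call outside the gold-standard set counts as a false positive and that calls are already restricted to the region of interest; the helper name is hypothetical.

```python
from typing import Hashable, Iterable, Set


def false_positives_per_mb(calls: Iterable[Hashable], truth: Set[Hashable],
                           roi_bp: int) -> float:
    """Number of calls absent from the truth set, per megabase of target region."""
    if roi_bp <= 0:
        raise ValueError("roi_bp must be positive")
    false_positives = len(set(calls) - truth)
    return false_positives / (roi_bp / 1_000_000)


calls = ["s1", "s2", "s3", "s4", "s5"]
truth = {"s1", "s2"}
print(false_positives_per_mb(calls, truth, roi_bp=1_500_000))  # 2.0
```

For example, 3 false positives over a 1.5 Mb target give 3 / 1.5 = 2 false positives per Mb.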
Figure 4
ROC-like curves summarizing sensitivity and specificity of MuTect and Strelka. Sensitivity versus false positive rate (FPR, per Mb) across a range of MuTect LOD and Strelka QSS_NT thresholds, computed from one replicate of the amplicon sequencing dilution series. The original thresholds of each final model are marked with black circles (these correspond to the outputs in Figure 2).
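Each point on an ROC-like curve of this kind comes from re-filtering one call set at a single score threshold and recomputing both metrics. A sketch, assuming each candidate call carries one numeric score (standing in for MuTect's LOD or Strelka's QSS_NT) and that a call passes when its score is at least the threshold; the function and its arguments are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def roc_like_curve(scored_calls: Dict[Hashable, float],
                   truth: Set[Hashable],
                   roi_bp: int,
                   thresholds: Iterable[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, false positives per Mb), sorted by threshold."""
    if not truth:
        raise ValueError("truth set is empty")
    roi_mb = roi_bp / 1_000_000
    points = []
    for t in sorted(thresholds):
        # Keep only calls whose score passes this threshold (assumed >= semantics).
        kept = {site for site, score in scored_calls.items() if score >= t}
        sens = len(kept & truth) / len(truth)
        fp_per_mb = len(kept - truth) / roi_mb
        points.append((t, sens, fp_per_mb))
    return points


scores = {"a": 10.0, "b": 5.0, "c": 3.0, "x": 8.0, "y": 2.0}
truth = {"a", "b", "c"}
for point in roc_like_curve(scores, truth, roi_bp=1_000_000, thresholds=[0, 4, 9]):
    print(point)
```

Re-filtering can only remove calls that a caller already emitted, so thresholds looser than a tool's own internal cutoffs cannot recover sites it never reported.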
