Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Sep;39(9):1141-1150.
doi: 10.1038/s41587-021-00994-5. Epub 2021 Sep 9.

Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing

Wenming Xiao #  1 Luyao Ren #  2 Zhong Chen  3 Li Tai Fang  4 Yongmei Zhao  5 Justin Lack  5 Meijian Guan  6 Bin Zhu  7 Erich Jaeger  8 Liz Kerrigan  9 Thomas M Blomquist  10 Tiffany Hung  11 Marc Sultan  12 Kenneth Idler  13 Charles Lu  13 Andreas Scherer  14   15 Rebecca Kusko  16 Malcolm Moos  17 Chunlin Xiao  18 Stephen T Sherry  18 Ogan D Abaan  8   19 Wanqiu Chen  3 Xin Chen  3 Jessica Nordlund  15   20 Ulrika Liljedahl  15   21 Roberta Maestro  15   21 Maurizio Polano  15   21 Jiri Drabek  15   22 Petr Vojta  15   22 Sulev Kõks  15   23   24 Ene Reimann  15   25 Bindu Swapna Madala  26 Timothy Mercer  26 Chris Miller  13 Howard Jacob  13 Tiffany Truong  8 Ali Moshrefi  8 Aparna Natarajan  8 Ana Granat  8 Gary P Schroth  8 Rasika Kalamegham  11 Eric Peters  11 Virginie Petitjean  12 Ashley Walton  5 Tsai-Wei Shen  5 Keyur Talsania  5 Cristobal Juan Vera  5 Kurt Langenbach  9 Maryellen de Mars  9 Jennifer A Hipp  10 James C Willey  10 Jing Wang  27 Jyoti Shetty  28 Yuliya Kriga  28 Arati Raziuddin  28 Bao Tran  28 Yuanting Zheng  2 Ying Yu  2 Margaret Cam  29 Parthav Jailwala  29 Cu Nguyen  30 Daoud Meerzaman  30 Qingrong Chen  30 Chunhua Yan  30 Ben Ernest  31 Urvashi Mehra  31 Roderick V Jensen  32 Wendell Jones  33 Jian-Liang Li  34 Brian N Papas  34 Mehdi Pirooznia  35 Yun-Ching Chen  35 Fayaz Seifuddin  35 Zhipan Li  36 Xuelu Liu  37 Wolfgang Resch  37 Jingya Wang  38 Leihong Wu  39 Gokhan Yavas  39 Corey Miles  39 Baitang Ning  39 Weida Tong  39 Christopher E Mason  40 Eric Donaldson  41 Samir Lababidi  42 Louis M Staudt  43 Zivana Tezak  44 Huixiao Hong  39 Charles Wang  45 Leming Shi  46
Affiliations

Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing

Wenming Xiao et al. Nat Biotechnol. 2021 Sep.

Abstract

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.

PubMed Disclaimer

Conflict of interest statement

Competing interests

L.F. was an employees of Roche Sequencing Solutions Inc. L.K., K.L. and M.M. are employees of ATCC, which provided cell lines and derivative materials. E.J., O.D.A., T.T., A.M., A.N., A.G. and G.P.S. are employees of Illumina Inc. V.P. and M.S. are employees of Novartis Institutes for Biomedical Research. T.H., E.P and R. Kalamegham are employees of Genentech (a member of the Roche group). Z.L. is an employee of Sentieon Inc. R.K. is an employee of Immuneering Corp. C.E.M. is a cofounder of Onegevity Health. All other authors claim no competing interests.

Figures

Extended Data Fig. 1 |
Extended Data Fig. 1 |. Study design to capture “wet lab” factors affecting sequencing quality.
DNA was extracted from either fresh cells or FFPE processed cells (formalin fixation time of 1, 2, 6, or 24 hours). Both fresh DNA and FFPE DNA were profiled on WGS and WES platforms. For fresh DNA, six centers (Fudan University (FD), Illumina (IL), Novartis (NV), European Infrastructure for Translational Medicine (EA), National Cancer Institute (NC), and Loma Linda University (LL)) performed WGS and WES in parallel following manufacturer recommended protocols with limited deviation. Three of the six sequencing centers (FD, IL, and NV) generated library preparation in triplicate. For FFPE samples, each fixation time point had six blocks that were sequenced at two different centers (IL and GeneWiz (GZ)). Three library preparation protocols (TruSeq PCR-free, TruSeq-Nano, and Nextera Flex) were used with four different quantities of DNA input (1, 10, 100, and 250 ng) and sequenced by IL and LL. DNAs from HCC1395 and HCC1395BL were pooled at various ratios to create mixtures of 75%, 50%, 20%, 10%, and 5%. All libraries from these experiments were sequenced in triplicate on the HiSeq series by Genentech (GT). In addition, nine libraries using the TruSeq PCR-free preparation were run on a NovaSeq for WGS analysis by IL. Sample naming convention (example: WGS_FD_N_1): First field was used for sequencing study: Whole genome sequencing (WGS), Whole exome sequencing (WES), WGS on FFPE sample (FFG), WES on FFPE sample (FFX), WGS on library preparation protocol (LBP), WGS on tumor purity (SPP); Second field was used for sequencing centers, EA, FD, IL, LL, NC, NV, GT, and GZ or sequencing technologies, HiSeq (HS) and NovaSeq (NS); Third field was used for tumor (T) or normal (N); The last field was used for the number of repeats. *WGS performed only on Mixture (tumor purity) samples. ** WGS and WES performed only on FFPE samples.
Extended Data Fig. 2 |
Extended Data Fig. 2 |. Read mapping quality statistics.
(a) Percentage of reads mapped to target regions (SureSelect V6 + UTR) and G/C content for WES runs on fresh or FFPE DNA. (b) Read quality from three WGS library preparation kits (TruSeq PCRfree, TruSeq-Nano, and Nextera Flex) on fresh or FFPE DNA. (c) Distribution of GIV scores in WGS and WES runs. For detailed statistics regarding the boxplot, please refer to Supplementary Table 5.
Extended Data Fig. 3 |
Extended Data Fig. 3 |. Overall read quality distribution for all WES and WGS runs.
(a) Median insert fragment size of WES and WGS run on fresh and FFPE DNA. (b) G/C read content for Wes and WGS runs. (c) Overall read redundancy for WES and WGS runs. Some outliers were observed in WGS on fresh DNA, which were from runs of TruSeq-Nano with 1 ng of DNA input. (d) Overall percentage of reads mapped to target regions for WES runs for fresh and FFPE DNA. For detailed statistics regarding the boxplot, please refer to Supplementary Table 6.
Extended Data Fig. 4 |
Extended Data Fig. 4 |. Mutation calling repeatability and O_Score distribution.
(a) Distribution of O_Score of three callers (MuTect2, Strelka2, and SomaticSniper) for twelve WGS and WES runs on BWA alignments. For detailed statistics regarding the boxplot, please refer to Supplementary Table 7. (b) “Tornado” plot of reproducibility between twelve WGS runs on the HiSeq series (2500, 4000, and X10) and nine WGS runs on the NovaSeq (S6000). SNVs/indels were called by Strelka2 on BWA alignments.
Extended Data Fig. 5 |
Extended Data Fig. 5 |. Source of variance in reproducibility measured by O_Score.
Actual by predicted plot of WGS (a) and WES (b). A total of 8 variables (WGS) or 13 variables (WES), including 2-degree interactions, were included in the fixed effect linear model. 36 samples were used to derive statistics for both WES and WGS. The central blue line is the mean. The shaded region represents the 95% confidence interval.
Extended Data Fig. 6 |
Extended Data Fig. 6 |. Effect of post alignment processing on precision and recall of WES and WGS run on FFPE DNA.
(a) precision and recall of mutation calls by Strelka2 on BWA alignments. A single library of FFPE DNA (FFX) and three libraries of fresh DNA (EA_1, FD_1, and NV_1) were run on a WES platform. Resulting reads were either processed by the BFC tool or by Trimmomatic. processed FASTQ files were then aligned by BWA and called by Strelka2. precision and recall were derived by matching calling results with the truth set. (b) precision and recall of mutation calls by three callers, Mutect2 (blue), Strelka2 (green), and SomaticSniper (red), on BWA alignments without or with GATK post alignment process (indel realignment & BQSR).
Extended Data Fig. 7 |
Extended Data Fig. 7 |. Jaccard index scores to measure reproducibility of SNVs called by three callers.
Box plot of Jaccard scores of inter-center, intra-center, and overall pair of SNV call sets from two WGS or WES runs. SNVs were divided into three groups; Repeatable: SNVs defined in the truth set of the reference call set; Gray zone: SNVs not defined as “truth” in the reference call set; Non-Repeatable: SNVs were not in the reference call set. For detailed statistics regarding the boxplot, please refer to Supplementary Table 8.
Extended Data Fig. 8 |
Extended Data Fig. 8 |. Sources of variation in Jaccard index.
(a) Summary of factor effects. Twenty-five factors, including five original factors, ten 2-way interactions, and ten 3-way interactions were evaluated in the model. Both P values (derived from F-test) and their LogWorth (−log10 (P value)) are included in the summary plot. The factors are ordered by their LogWorth values. (b) Least square means of caller*pair_group*platform interaction. The height of the markers represents the adjusted least square means, and the bars represent confidence intervals of the means. (c) Least square means SNV_subset*pair_group*platform interaction. The height of the markers represents the adjusted least square means, and the bars represent confidence intervals of the means. 3168 samples were used to derive these statistics. (d) Student’s t-test for platform*pair_group interaction with SNV calls from three callers, MuTect2, Strelka2, and SomaticSniper. The left two panels compare Jaccard indices between intra-center and inter-center for WGS and WES, respectively. The right two panels compare Jaccard indices between WGS and WES for inter-center and intra-center pairs, respectively. Prob > |t| is the two-tailed test P value, and Prob > t is the one-tailed test P value.
Extended Data Fig. 9 |
Extended Data Fig. 9 |. WGS vs. WES platform-specific mutations and allele frequency calling accuracy.
Cumulative VAF plot of precision (a), recall (b), and F-Score (c) for three callers (MuTect2, Strelka2, and SomaticSniper) on WES and WGS runs.
Extended Data Fig. 10 |
Extended Data Fig. 10 |. Mutation allele frequency and coverage depth in WES and WGS sample.
Scatter plot of allele frequency and coverage depth by three callers, MuTect2, Strelka2, and SomaticSniper in one example WES sample (a) or WGS sample (b). (c) Boxplot of read depth on called mutations in WES or WGS. For detailed statistics regarding the boxplot, please refer to Supplementary table 9.
Fig. 1 |
Fig. 1 |. Study design and read quality.
a, Study design used to capture nonanalytical and analytical factors affecting cancer mutation detection. DNA was extracted from either fresh cells or FFPE-processed cells and fragmented at three intended sizes. Libraries with various levels of DNA input (either from random shotgun or exome capture) were generated with three different library preparation kits and run on WGS and WES in parallel following recommended protocols (Methods). Twelve replicates were performed at six sequencing centers: three centers (FD, IL and NV) prepared WGS and WES libraries in triplicate; three centers (EA, LL and NC) prepared a single WGS and WES library (3×3 + 3); and 144 libraries were sequenced on either a HiSeq or NovaSeq instrument. Two prealignments (BFC and Trimmomatic), three alignments (BWA, Bowtie and NovoAlign) and two postalignments (GATK and no-GATK) were evaluated. A total of 1,015 mutation call sets were generated. Numbers in parentheses represent possible combinations at that level. Further details on the experiment design are given in Extended Data Fig. 1. b, Read yields (blue), mapping statistics (red) and genome coverage (yellow line) from 12 repeated WGS runs. c, GIV of G > T/C > A and T > /A > C mutation pairs in WES and WGS runs. Six centers used a range of time spans (80–300 s) for DNA shearing. As a result, average insert DNA fragment size ranged from 161 to 274 bp. d, Distribution of GIV score for FFPE DNA with four different fixation times (1, 2, 6 and 24 h) analyzed with WES or WGS: FFX and FFPE on WES platform and FFG and FFPE on WGS platform. Box-and-whisker plots shows the first and third quartiles as well as median values. The upper and lower whiskers extend from the hinge to the largest or smallest value no further than 1.5 × interquartile range from the hinge. For detailed statistics regarding minima, maxima, center, bounds of box and whiskers, and percentiles related to this figure, please refer to Supplementary Table 4.
Fig. 2 |
Fig. 2 |. Mutation calling reproducibility.
a, In mutation-calling reproducibility, SNV overlaps across 108 VCF results from 12 repeated WGS and WES runs analyzed with three aligners (BWA, Bowtie2 and NovoAlign) and three callers (MuTect2, Strelka2 and SomaticSniper). b, SNV overlaps across 36 VCF results from 12 repeated WGS and WES runs as analyzed by MuTect2, Strelka2 and SomaticSniper from only BWA alignments. c, SNV overlaps in each of the 12 repeated WES and WGS runs, analyzed by Strelka2 from BWA alignments. The y axis shows that the probability of SNVs that were missed is equal to, or less than, the percentage of the 12 call sets depicted on the x axis. d, Effect summary of the model for WES or WGS. Effect tests were performed to evaluate the importance of each independent variable in a fixed-effect linear model fitted for WES or WGS. F-statistics and corresponding P values were calculated for variables. Both P values and LogWorth (−log10 (P values)) are plotted. The lower-order effects are identified with a caret. The sample size used to derive statistics was 36 for both WES and WGS.
Fig. 3 |
Fig. 3 |. Nonanalytical factors affecting mutation calling.
a, Caller performance on three library preparation protocols with different DNA Inputs: 1, 10, 100, 250 and 1,000 ng. WGS sequencing on TruSeq and TruSeq-Nano libraries was performed at LL, while WGS sequencing on Nextera libraries was performed at IL. All sequencing experiments were performed with HiSeq 4000 and analyzed using three aligners (BWA, Bowtie2 and NovoAlign) and three callers (MuTect2, Strelka2 and SomaticSniper). b, Performance of MuTect2, Strelka2 and SomaticSniper on WGS with fresh DNA or FFPE DNA (24 h) from a BWA alignment.
Fig. 4 |
Fig. 4 |. Bioinformatics for enhanced calling.
a, Distribution of mutation types called with Strelka2 on BWA alignments of four WES runs preprocessed by Trimmomatic or BFC. WES run on FFPE DNA (FFPE) or fresh DNA (EA_1, FD_1 and NV_1). Numbers of SNVs called from each process are shown at the top. Mutations shared across BFC datasets (overlap.BFC) and Trimmomatic datasets (overlap.trimm) are shown on the left. C > A and T > C artifacts were observed in the Trimmomatic and BFC datasets, respectively; both artifacts were minimized with repeats. b, Performance of mutation calling by Strelka2 on three alignments (Bowtie2, BWA and NovoAlign). Insert is a violin plot of mapping quality (MAPQ) scores from three alignments for an example WGS run. In total, 81 billion, 118 billion and 140 billion data points were used in violin plots for Bowtie2, BWA and NovoAlign, respectively. c, Effect of postalignment processing (indel realignment + BQSR) on mutation calling by MuTect2, Strelka2 and SomaticSniper). d, Effect of tumor purity (20 versus 50%) on five callers (Lancet, MuTect2, Strelka2, TNscope and SomaticSniper) with read coverage of 10×, 30×, 50×, 80×, 100×, 200× and 300×.
Fig. 5 |
Fig. 5 |. Biological repeats versus analytical repeats.
Precision (a) and recall (b) of overlapping SNVs/indels that were supported by biological repeats (library repeats) or analytical repeats (two different callers). Each row or column represents calling results from a WES or WGS run called by one of the three callers from a BWA alignment. All 12 repeats of WES and WGS from six sequencing centers (FD, IL, NV, EA, LL and NC) were included.

References

    1. Glasziou P, Meats E, Heneghan C & Shepperd S What is missing from descriptions of treatment in trials and reviews? Brit. Med. J 336, 1472–1474 (2008). - PMC - PubMed
    1. Vasilevsky NA et al. On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1, e148 (2013). - PMC - PubMed
    1. Begley CG & Ellis LM Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012). - PubMed
    1. Alioto TS et al. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat. Commun 6, 10001 (2015). - PMC - PubMed
    1. Griffith M et al. Genome Modeling System: a knowledge management platform for genomics. PLoS Comput. Biol 11, e1004274 (2015). - PMC - PubMed

Publication types