Nat Biotechnol. 2021 Sep;39(9):1141-1150.

doi: 10.1038/s41587-021-00994-5. Epub 2021 Sep 9.

Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing

Wenming Xiao^#¹, Luyao Ren^#², Zhong Chen³, Li Tai Fang⁴, Yongmei Zhao⁵, Justin Lack⁵, Meijian Guan⁶, Bin Zhu⁷, Erich Jaeger⁸, Liz Kerrigan⁹, Thomas M Blomquist¹⁰, Tiffany Hung¹¹, Marc Sultan¹², Kenneth Idler¹³, Charles Lu¹³, Andreas Scherer^{14,15}, Rebecca Kusko¹⁶, Malcolm Moos¹⁷, Chunlin Xiao¹⁸, Stephen T Sherry¹⁸, Ogan D Abaan^{8,19}, Wanqiu Chen³, Xin Chen³, Jessica Nordlund^{15,20}, Ulrika Liljedahl^{15,21}, Roberta Maestro^{15,21}, Maurizio Polano^{15,21}, Jiri Drabek^{15,22}, Petr Vojta^{15,22}, Sulev Kõks^{15,23,24}, Ene Reimann^{15,25}, Bindu Swapna Madala²⁶, Timothy Mercer²⁶, Chris Miller¹³, Howard Jacob¹³, Tiffany Truong⁸, Ali Moshrefi⁸, Aparna Natarajan⁸, Ana Granat⁸, Gary P Schroth⁸, Rasika Kalamegham¹¹, Eric Peters¹¹, Virginie Petitjean¹², Ashley Walton⁵, Tsai-Wei Shen⁵, Keyur Talsania⁵, Cristobal Juan Vera⁵, Kurt Langenbach⁹, Maryellen de Mars⁹, Jennifer A Hipp¹⁰, James C Willey¹⁰, Jing Wang²⁷, Jyoti Shetty²⁸, Yuliya Kriga²⁸, Arati Raziuddin²⁸, Bao Tran²⁸, Yuanting Zheng², Ying Yu², Margaret Cam²⁹, Parthav Jailwala²⁹, Cu Nguyen³⁰, Daoud Meerzaman³⁰, Qingrong Chen³⁰, Chunhua Yan³⁰, Ben Ernest³¹, Urvashi Mehra³¹, Roderick V Jensen³², Wendell Jones³³, Jian-Liang Li³⁴, Brian N Papas³⁴, Mehdi Pirooznia³⁵, Yun-Ching Chen³⁵, Fayaz Seifuddin³⁵, Zhipan Li³⁶, Xuelu Liu³⁷, Wolfgang Resch³⁷, Jingya Wang³⁸, Leihong Wu³⁹, Gokhan Yavas³⁹, Corey Miles³⁹, Baitang Ning³⁹, Weida Tong³⁹, Christopher E Mason⁴⁰, Eric Donaldson⁴¹, Samir Lababidi⁴², Louis M Staudt⁴³, Zivana Tezak⁴⁴, Huixiao Hong³⁹, Charles Wang⁴⁵, Leming Shi⁴⁶

Affiliations

¹ The Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD, USA. wenming.xiao@fda.hhs.gov.
² State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China.
³ Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA.
⁴ Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA.
⁵ Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
⁶ SAS Institute Inc., Cary, NC, USA.
⁷ Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA.
⁸ Illumina Inc., Foster City, CA, USA.
⁹ ATCC, Manassas, VA, USA.
¹⁰ Departments of Medicine and Pathology, University of Toledo Medical Center, Toledo, OH, USA.
¹¹ Genentech, South San Francisco, CA, USA.
¹² Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland.
¹³ Computational Genomics, Genomics Research Center, AbbVie, North Chicago, IL, USA.
¹⁴ Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
¹⁵ European Infrastructure for Translational Medicine, Amsterdam, the Netherlands.
¹⁶ Immuneering Corporation, Cambridge, MA, USA.
¹⁷ The Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
¹⁸ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
¹⁹ Seven Bridges Genomics Inc., Cambridge, MA, USA.
²⁰ Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
²¹ Centro di Riferimento Oncologico di Aviano IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy.
²² IMTM, Faculty of Medicine and Dentistry, Palacky University Olomouc, Olomouc, Czech Republic.
²³ Perron Institute for Neurological and Translational Science, Nedlands, Perth, Western Australia, Australia.
²⁴ Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Murdoch, Perth, Western Australia, Australia.
²⁵ Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia.
²⁶ Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, New South Wales, Australia.
²⁷ National Institute of Metrology, Beijing, China.
²⁸ Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
²⁹ CCR Collaborative Bioinformatics Resource, Office of Science and Technology Resources, Center for Cancer Research, Bethesda, MD, USA.
³⁰ Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA.
³¹ Digicon, McLean, VA, USA.
³² Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA.
³³ Q2 Solutions-EA Genomics, Morrisville, NC, USA.
³⁴ Integrative Bioinformatics, National Institute of Environmental Health Sciences, Durham, NC, USA.
³⁵ Bioinformatics and Computational Biology Core, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA.
³⁶ Sentieon Inc., Mountain View, CA, USA.
³⁷ Center for Information Technology, National Institutes of Health, Bethesda, MD, USA.
³⁸ AstraZeneca, Gaithersburg, MD, USA.
³⁹ National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA.
⁴⁰ Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA.
⁴¹ The Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
⁴² Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA.
⁴³ Lymphoid Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
⁴⁴ The Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD, USA.
⁴⁵ Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA. oxwang@gmail.com.
⁴⁶ State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China. lemingshi@fudan.edu.cn.

^# Contributed equally.

PMID: 34504346
PMCID: PMC8506910
DOI: 10.1038/s41587-021-00994-5

Wenming Xiao et al. Nat Biotechnol. 2021 Sep.


Abstract

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.

Conflict of interest statement

Competing interests

L.F. was an employee of Roche Sequencing Solutions Inc. L.K., K.L. and M.M. are employees of ATCC, which provided cell lines and derivative materials. E.J., O.D.A., T.T., A.M., A.N., A.G. and G.P.S. are employees of Illumina Inc. V.P. and M.S. are employees of Novartis Institutes for Biomedical Research. T.H., E.P. and R. Kalamegham are employees of Genentech (a member of the Roche group). Z.L. is an employee of Sentieon Inc. R.K. is an employee of Immuneering Corp. C.E.M. is a cofounder of Onegevity Health. All other authors claim no competing interests.

Figures

**Extended Data Fig. 1 |. Study design to capture “wet lab” factors affecting sequencing quality.**
DNA was extracted from either fresh cells or FFPE processed cells (formalin fixation time of 1, 2, 6, or 24 hours). Both fresh DNA and FFPE DNA were profiled on WGS and WES platforms. For fresh DNA, six centers (Fudan University (FD), Illumina (IL), Novartis (NV), European Infrastructure for Translational Medicine (EA), National Cancer Institute (NC), and Loma Linda University (LL)) performed WGS and WES in parallel following manufacturer-recommended protocols with limited deviation. Three of the six sequencing centers (FD, IL, and NV) generated library preparation in triplicate. For FFPE samples, each fixation time point had six blocks that were sequenced at two different centers (IL and GeneWiz (GZ)). Three library preparation protocols (TruSeq PCR-free, TruSeq-Nano, and Nextera Flex) were used with four different quantities of DNA input (1, 10, 100, and 250 ng) and sequenced by IL and LL. DNAs from HCC1395 and HCC1395BL were pooled at various ratios to create mixtures of 75%, 50%, 20%, 10%, and 5%. All libraries from these experiments were sequenced in triplicate on the HiSeq series by Genentech (GT). In addition, nine libraries using the TruSeq PCR-free preparation were run on a NovaSeq for WGS analysis by IL. Sample naming convention (example: WGS_FD_N_1): First field was used for sequencing study: Whole genome sequencing (WGS), Whole exome sequencing (WES), WGS on FFPE sample (FFG), WES on FFPE sample (FFX), WGS on library preparation protocol (LBP), WGS on tumor purity (SPP); Second field was used for sequencing centers, EA, FD, IL, LL, NC, NV, GT, and GZ or sequencing technologies, HiSeq (HS) and NovaSeq (NS); Third field was used for tumor (T) or normal (N); The last field was used for the number of repeats. *WGS performed only on Mixture (tumor purity) samples. **WGS and WES performed only on FFPE samples.
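The four-field naming convention described in this caption can be illustrated with a small parser. This is only a sketch: the field codes are copied from the caption, while the names `SampleName` and `parse_sample_name`, and the assumption that the last field is always an integer, are hypothetical.

```python
from dataclasses import dataclass

# Field codes copied from the caption; the container names are hypothetical.
STUDIES = {
    "WGS": "whole-genome sequencing",
    "WES": "whole-exome sequencing",
    "FFG": "WGS on FFPE sample",
    "FFX": "WES on FFPE sample",
    "LBP": "WGS on library preparation protocol",
    "SPP": "WGS on tumor purity",
}
# Sequencing centers plus the two technology codes (HS = HiSeq, NS = NovaSeq).
CENTERS_OR_TECH = {"EA", "FD", "IL", "LL", "NC", "NV", "GT", "GZ", "HS", "NS"}
SAMPLE_TYPES = {"T": "tumor", "N": "normal"}


@dataclass(frozen=True)
class SampleName:
    study: str        # first field, e.g. "WGS"
    source: str       # second field: center or technology code
    sample_type: str  # "tumor" or "normal"
    repeat: int       # last field, assumed here to be an integer


def parse_sample_name(name: str) -> SampleName:
    """Split a name such as 'WGS_FD_N_1' into its four fields and validate them."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {len(parts)}: {name!r}")
    study, source, sample_type, repeat = parts
    if study not in STUDIES:
        raise ValueError(f"unknown study code: {study!r}")
    if source not in CENTERS_OR_TECH:
        raise ValueError(f"unknown center or technology code: {source!r}")
    if sample_type not in SAMPLE_TYPES:
        raise ValueError(f"unknown sample type: {sample_type!r}")
    return SampleName(study, source, SAMPLE_TYPES[sample_type], int(repeat))


print(parse_sample_name("WGS_FD_N_1"))
```

For the caption's own example, `WGS_FD_N_1` decodes to a whole-genome run from Fudan University on the normal sample, repeat 1.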

**Extended Data Fig. 2 |. Read mapping quality statistics.**
**(a)** Percentage of reads mapped to target regions (SureSelect V6 + UTR) and G/C content for WES runs on fresh or FFPE DNA. **(b)** Read quality from three WGS library preparation kits (TruSeq PCRfree, TruSeq-Nano, and Nextera Flex) on fresh or FFPE DNA. **(c)** Distribution of GIV scores in WGS and WES runs. For detailed statistics regarding the boxplot, please refer to Supplementary Table 5.

**Extended Data Fig. 3 |. Overall read quality distribution for all WES and WGS runs.**
**(a)** Median insert fragment size of WES and WGS run on fresh and FFPE DNA. **(b)** G/C read content for WES and WGS runs. **(c)** Overall read redundancy for WES and WGS runs. Some outliers were observed in WGS on fresh DNA, which were from runs of TruSeq-Nano with 1 ng of DNA input. **(d)** Overall percentage of reads mapped to target regions for WES runs for fresh and FFPE DNA. For detailed statistics regarding the boxplot, please refer to Supplementary Table 6.

**Extended Data Fig. 4 |. Mutation calling repeatability and O_Score distribution.**
**(a)** Distribution of O_Score of three callers (MuTect2, Strelka2, and SomaticSniper) for twelve WGS and WES runs on BWA alignments. For detailed statistics regarding the boxplot, please refer to Supplementary Table 7. **(b)** “Tornado” plot of reproducibility between twelve WGS runs on the HiSeq series (2500, 4000, and X10) and nine WGS runs on the NovaSeq (S6000). SNVs/indels were called by Strelka2 on BWA alignments.

**Extended Data Fig. 5 |. Source of variance in reproducibility measured by O_Score.**
Actual by predicted plot of WGS **(a)** and WES **(b)**. A total of 8 variables (WGS) or 13 variables (WES), including 2-degree interactions, were included in the fixed effect linear model. 36 samples were used to derive statistics for both WES and WGS. The central blue line is the mean. The shaded region represents the 95% confidence interval.

**Extended Data Fig. 6 |. Effect of post alignment processing on precision and recall of WES and WGS run on FFPE DNA.**
**(a)** Precision and recall of mutation calls by Strelka2 on BWA alignments. A single library of FFPE DNA (FFX) and three libraries of fresh DNA (EA_1, FD_1, and NV_1) were run on a WES platform. Resulting reads were either processed by the BFC tool or by Trimmomatic. Processed FASTQ files were then aligned by BWA and called by Strelka2. Precision and recall were derived by matching calling results with the truth set. **(b)** Precision and recall of mutation calls by three callers, MuTect2 (blue), Strelka2 (green), and SomaticSniper (red), on BWA alignments without or with GATK post alignment process (indel realignment & BQSR).

**Extended Data Fig. 7 |. Jaccard index scores to measure reproducibility of SNVs called by three callers.**
Box plot of Jaccard scores of inter-center, intra-center, and overall pair of SNV call sets from two WGS or WES runs. SNVs were divided into three groups; Repeatable: SNVs defined in the truth set of the reference call set; Gray zone: SNVs not defined as “truth” in the reference call set; Non-Repeatable: SNVs were not in the reference call set. For detailed statistics regarding the boxplot, please refer to Supplementary Table 8.
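As a reminder of what a Jaccard score between two SNV call sets measures, the following minimal sketch computes it for two sets of variants keyed by (chromosome, position, reference allele, alternate allele). The function name, the variant key layout, and the toy variants are illustrative assumptions and do not come from the study's pipelines or data.

```python
from typing import Hashable, Iterable


def jaccard_index(calls_a: Iterable[Hashable], calls_b: Iterable[Hashable]) -> float:
    """Size of the intersection divided by the size of the union of two call sets."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty call sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Toy call sets (made-up variants, for illustration only).
run_1 = {("chr1", 1000, "G", "T"), ("chr2", 2000, "C", "A"), ("chr3", 3000, "A", "G")}
run_2 = {("chr2", 2000, "C", "A"), ("chr3", 3000, "A", "G"), ("chr4", 4000, "T", "C")}

print(jaccard_index(run_1, run_2))  # 2 shared / 4 distinct = 0.5
```

A score of 1 means the two runs called exactly the same SNVs; a score of 0 means they share none.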

**Extended Data Fig. 8 |. Sources of variation in Jaccard index.**
**(a)** Summary of factor effects. Twenty-five factors, including five original factors, ten 2-way interactions, and ten 3-way interactions were evaluated in the model. Both P values (derived from F-test) and their LogWorth (−log10 (P value)) are included in the summary plot. The factors are ordered by their LogWorth values. **(b)** Least square means of caller*pair_group*platform interaction. The height of the markers represents the adjusted least square means, and the bars represent confidence intervals of the means. **(c)** Least square means of SNV_subset*pair_group*platform interaction. The height of the markers represents the adjusted least square means, and the bars represent confidence intervals of the means. 3168 samples were used to derive these statistics. **(d)** Student’s t-test for platform*pair_group interaction with SNV calls from three callers, MuTect2, Strelka2, and SomaticSniper. The left two panels compare Jaccard indices between intra-center and inter-center for WGS and WES, respectively. The right two panels compare Jaccard indices between WGS and WES for inter-center and intra-center pairs, respectively. Prob > |t| is the two-tailed test P value, and Prob > t is the one-tailed test P value.
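The count of twenty-five model terms and the LogWorth transform in panel (a) can be checked with a few lines of arithmetic. The sketch below uses hypothetical function names and an arbitrary example P value. It assumes that all 2-way and 3-way combinations of the five original factors were included, which matches the counts in the caption.

```python
import math


def count_model_terms(n_factors: int, max_order: int = 3) -> dict:
    """Number of terms of each order: main effects, 2-way, 3-way interactions, and so on."""
    return {order: math.comb(n_factors, order) for order in range(1, max_order + 1)}


def logworth(p_value: float) -> float:
    """LogWorth as defined in the caption: -log10 of the P value."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P value must be in (0, 1]")
    return -math.log10(p_value)


terms = count_model_terms(5)
print(terms, sum(terms.values()))  # 5 main effects + 10 two-way + 10 three-way = 25
print(logworth(0.01))              # arbitrary example P value
```

Ordering factors by LogWorth puts the smallest P values first, since a smaller P value gives a larger LogWorth.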

**Extended Data Fig. 9 |. WGS vs. WES platform-specific mutations and allele frequency calling accuracy.**
Cumulative VAF plot of precision **(a)**, recall **(b)**, and F-Score **(c)** for three callers (MuTect2, Strelka2, and SomaticSniper) on WES and WGS runs.
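Precision, recall and F-score can each be computed by comparing a call set against a truth set, as in the sketch below. The function name and the toy variant identifiers are hypothetical, and the F-score is assumed to be F1 (the harmonic mean of precision and recall), which the caption does not state.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(called: Iterable[Hashable], truth: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Compare a call set with a truth set and return (precision, recall, F1)."""
    called_set, truth_set = set(called), set(truth)
    true_positives = len(called_set & truth_set)
    # Empty denominators are mapped to 0.0 here; this is a convention chosen for the sketch.
    precision = true_positives / len(called_set) if called_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy identifiers standing in for variants (not data from the study).
called = {"v1", "v2", "v3", "v4"}
truth = {"v2", "v3", "v4", "v5", "v6"}
precision, recall, f1 = precision_recall_f1(called, truth)
print(precision, recall, round(f1, 4))  # 3/4 called are true, 3/5 true are called
```

A cumulative plot over variant allele frequency (VAF) would repeat this calculation on the subset of variants at or below each VAF cutoff.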

**Extended Data Fig. 10 |. Mutation allele frequency and coverage depth in WES and WGS sample.**
Scatter plot of allele frequency and coverage depth by three callers, MuTect2, Strelka2, and SomaticSniper in one example WES sample **(a)** or WGS sample **(b)**. **(c)** Boxplot of read depth on called mutations in WES or WGS. For detailed statistics regarding the boxplot, please refer to Supplementary Table 9.

**Fig. 1 |. Study design and read quality.**
a, Study design used to capture nonanalytical and analytical factors affecting cancer mutation detection. DNA was extracted from either fresh cells or FFPE-processed cells and fragmented at three intended sizes. Libraries with various levels of DNA input (either from random shotgun or exome capture) were generated with three different library preparation kits and run on WGS and WES in parallel following recommended protocols (Methods). Twelve replicates were performed at six sequencing centers: three centers (FD, IL and NV) prepared WGS and WES libraries in triplicate; three centers (EA, LL and NC) prepared a single WGS and WES library (3×3 + 3); and 144 libraries were sequenced on either a HiSeq or NovaSeq instrument. Two prealignments (BFC and Trimmomatic), three alignments (BWA, Bowtie2 and NovoAlign) and two postalignments (GATK and no-GATK) were evaluated. A total of 1,015 mutation call sets were generated. Numbers in parentheses represent possible combinations at that level. Further details on the experiment design are given in Extended Data Fig. 1. b, Read yields (blue), mapping statistics (red) and genome coverage (yellow line) from 12 repeated WGS runs. c, GIV of G > T/C > A and T > G/A > C mutation pairs in WES and WGS runs. Six centers used a range of time spans (80–300 s) for DNA shearing. As a result, average insert DNA fragment size ranged from 161 to 274 bp. d, Distribution of GIV score for FFPE DNA with four different fixation times (1, 2, 6 and 24 h) analyzed with WES or WGS: FFX and FFPE on WES platform and FFG and FFPE on WGS platform. Box-and-whisker plots show the first and third quartiles as well as median values. The upper and lower whiskers extend from the hinge to the largest or smallest value no further than 1.5 × interquartile range from the hinge. For detailed statistics regarding minima, maxima, center, bounds of box and whiskers, and percentiles related to this figure, please refer to Supplementary Table 4.
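The whisker rule stated at the end of this caption (whiskers reach the most extreme value within 1.5 × interquartile range of the hinge) can be sketched as follows. The function name and the input values are hypothetical. The quartile method (`method="inclusive"` from Python's `statistics` module) is an assumption that may differ slightly from the plotting software used for the figure.

```python
import statistics


def box_whisker_summary(values):
    """Quartiles, 1.5 x IQR whisker ends and outliers for one group of values."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Quartile interpolation method is an assumption; plotting tools differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observed values inside the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Arbitrary numbers chosen so that one value falls outside the upper fence.
summary = box_whisker_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this example the upper whisker ends at 9 and the value 100 is reported as an outlier, because it lies more than 1.5 × interquartile range above the third quartile.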

**Fig. 2 |. Mutation calling reproducibility.**
a, In mutation-calling reproducibility, SNV overlaps across 108 VCF results from 12 repeated WGS and WES runs analyzed with three aligners (BWA, Bowtie2 and NovoAlign) and three callers (MuTect2, Strelka2 and SomaticSniper). b, SNV overlaps across 36 VCF results from 12 repeated WGS and WES runs as analyzed by MuTect2, Strelka2 and SomaticSniper from only BWA alignments. c, SNV overlaps in each of the 12 repeated WES and WGS runs, analyzed by Strelka2 from BWA alignments. The y axis shows that the probability of SNVs that were missed is equal to, or less than, the percentage of the 12 call sets depicted on the x axis. d, Effect summary of the model for WES or WGS. Effect tests were performed to evaluate the importance of each independent variable in a fixed-effect linear model fitted for WES or WGS. F-statistics and corresponding P values were calculated for variables. Both P values and LogWorth (−log₁₀ (P values)) are plotted. The lower-order effects are identified with a caret. The sample size used to derive statistics was 36 for both WES and WGS.
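The counts of 108 and 36 VCF results in panels a and b follow from crossing the 12 replicate runs with the aligners and callers. The sketch below enumerates those combinations. The replicate layout (FD, IL and NV in triplicate, EA, LL and NC once) is taken from the Fig. 1 caption. The function names are hypothetical, and treating the counts as per-platform totals is an assumption.

```python
from itertools import product

# Replicate layout from the Fig. 1 caption: three centers in triplicate, three with a single library.
TRIPLICATE_CENTERS = ["FD", "IL", "NV"]
SINGLE_CENTERS = ["EA", "LL", "NC"]
ALIGNERS = ["BWA", "Bowtie2", "NovoAlign"]
CALLERS = ["MuTect2", "Strelka2", "SomaticSniper"]


def replicate_runs():
    """Labels for the 12 replicate runs (3 x 3 + 3)."""
    runs = [f"{center}_{i}" for center in TRIPLICATE_CENTERS for i in (1, 2, 3)]
    runs += [f"{center}_1" for center in SINGLE_CENTERS]
    return runs


def call_sets(runs, aligners, callers):
    """Every (run, aligner, caller) combination, one per resulting VCF."""
    return list(product(runs, aligners, callers))


runs = replicate_runs()
all_sets = call_sets(runs, ALIGNERS, CALLERS)  # panel a: three aligners x three callers
bwa_sets = call_sets(runs, ["BWA"], CALLERS)   # panel b: BWA alignments only
print(len(runs), len(all_sets), len(bwa_sets))  # 12 108 36
```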

**Fig. 3 |. Nonanalytical factors affecting mutation calling.**
a, Caller performance on three library preparation protocols with different DNA inputs: 1, 10, 100, 250 and 1,000 ng. WGS sequencing on TruSeq and TruSeq-Nano libraries was performed at LL, while WGS sequencing on Nextera libraries was performed at IL. All sequencing experiments were performed with HiSeq 4000 and analyzed using three aligners (BWA, Bowtie2 and NovoAlign) and three callers (MuTect2, Strelka2 and SomaticSniper). b, Performance of MuTect2, Strelka2 and SomaticSniper on WGS with fresh DNA or FFPE DNA (24 h) from a BWA alignment.

**Fig. 4 |. Bioinformatics for enhanced calling.**
a, Distribution of mutation types called with Strelka2 on BWA alignments of four WES runs preprocessed by Trimmomatic or BFC. WES run on FFPE DNA (FFPE) or fresh DNA (EA_1, FD_1 and NV_1). Numbers of SNVs called from each process are shown at the top. Mutations shared across BFC datasets (overlap.BFC) and Trimmomatic datasets (overlap.trimm) are shown on the left. C > A and T > C artifacts were observed in the Trimmomatic and BFC datasets, respectively; both artifacts were minimized with repeats. b, Performance of mutation calling by Strelka2 on three alignments (Bowtie2, BWA and NovoAlign). Insert is a violin plot of mapping quality (MAPQ) scores from three alignments for an example WGS run. In total, 81 billion, 118 billion and 140 billion data points were used in violin plots for Bowtie2, BWA and NovoAlign, respectively. c, Effect of postalignment processing (indel realignment + BQSR) on mutation calling by MuTect2, Strelka2 and SomaticSniper. d, Effect of tumor purity (20 versus 50%) on five callers (Lancet, MuTect2, Strelka2, TNscope and SomaticSniper) with read coverage of 10×, 30×, 50×, 80×, 100×, 200× and 300×.

**Fig. 5 |. Biological repeats versus analytical repeats.**
Precision (a) and recall (b) of overlapping SNVs/indels that were supported by biological repeats (library repeats) or analytical repeats (two different callers). Each row or column represents calling results from a WES or WGS run called by one of the three callers from a BWA alignment. All 12 repeats of WES and WGS from six sequencing centers (FD, IL, NV, EA, LL and NC) were included.
