Environ Mol Mutagen. 2025 Aug;66(6-7):311-326.
doi: 10.1002/em.70020. Epub 2025 Jun 30.

Transferability, Reproducibility and Sensitivity of Mutation Quantification by Duplex Sequencing


Shaofei Zhang et al. Environ Mol Mutagen. 2025 Aug.

Abstract

Duplex Sequencing (DS) is an ultra-accurate, error-corrected next-generation sequencing (ecNGS) technology for mutation analysis. A working group (WG) within the Health and Environmental Sciences Institute's Genetic Toxicology Technical Committee is investigating the suitability of ecNGS for regulatory mutagenicity testing, using DS as a model. Initial steps toward regulatory acceptance require demonstrating technical reproducibility across DS-experienced and DS-inexperienced laboratories and establishing the method's sensitivity relative to conventional tests. The WG therefore conducted a 'reconstruction experiment' to evaluate the transferability, reproducibility, and sensitivity of DS. TwinStrand Biosciences first applied DS to establish the mutation frequency (MF) in DNA extracted from the livers of an untreated Sprague Dawley rat and of rats treated with either 100 mg/kg/day benzo[a]pyrene (B[a]P) for ten days or 40 mg/kg/day N-ethyl-N-nitrosourea (ENU) for three days. Based on the MFs measured in these original samples, mixtures were then constructed from the B[a]P- and ENU-treated samples to create "MF standards" with target MFs 1.2-, 1.5-, and 2-fold greater than that of the untreated control. Aliquots of these standards were distributed to seven laboratories in North America and Europe, and each laboratory, as well as TwinStrand, prepared DS libraries. All eight laboratories met the library preparation and assay performance metrics, yielding high-quality sequencing data with MFs in the expected range for each MF standard. The measured MFs and mutation spectra were nearly identical across laboratories, and a 2-fold increase in MF relative to the untreated control was readily identified in all laboratories. These results confirm the high reproducibility and sensitivity of DS for mutagenicity assessment.

Keywords: ecNGS; mutation frequency; mutational spectra; power analysis; reconstruction experiment.


Figures

FIGURE 1
Study site locations and sample descriptions. (A) Locations of eight study sites spanning four countries and two continents. (B) Pure and mixture samples used in the study and their previously measured or expected mutation frequencies (MFmin). Expected MFmin values targeted by the DNA mixtures are italicized. Colored boxes at the left of the table indicate the sample color coding used in subsequent figures. See Methods for details regarding preparation of mixtures.
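The caption defers mixture preparation to the Methods, which are not reproduced here. As a minimal sketch of how a target fold change translates into a mixing ratio, assume each mixture combines a treated sample with the untreated control and that MF mixes linearly by DNA mass (equal informative duplex bases per ng of each input), so that MF_mix = f × MF_treated + (1 − f) × MF_untreated. The function names and the MF values in the usage lines are hypothetical, not the study's measurements.

```python
def mixing_fraction(mf_untreated: float, mf_treated: float, fold: float) -> float:
    """Mass fraction f of treated DNA giving MF_mix = fold * MF_untreated.

    Assumes (hypothetically) linear mixing:
    MF_mix = f * MF_treated + (1 - f) * MF_untreated.
    """
    if mf_treated <= mf_untreated:
        raise ValueError("treated MF must exceed untreated MF")
    max_fold = mf_treated / mf_untreated
    if not 1.0 <= fold <= max_fold:
        raise ValueError(f"fold must lie in [1, {max_fold:.2f}] to be reachable")
    return (fold - 1.0) * mf_untreated / (mf_treated - mf_untreated)


def expected_mixture_mf(mf_untreated: float, mf_treated: float, fraction: float) -> float:
    """Expected MF of a mixture under the same linear-mixing assumption."""
    return fraction * mf_treated + (1.0 - fraction) * mf_untreated


if __name__ == "__main__":
    # Illustrative values only; not the study's measured MFs.
    mf_u, mf_t = 5e-8, 5e-7
    for k in (1.2, 1.5, 2.0):
        f = mixing_fraction(mf_u, mf_t, k)
        print(f"{k}x target: treated fraction {f:.4f}, "
              f"expected MF {expected_mixture_mf(mf_u, mf_t, f):.3g}")
```

Under this model the reachable fold changes are capped by the ratio MF_treated / MF_untreated, which is why the function rejects larger targets.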
FIGURE 2
Assay performance metrics across eight laboratories. Per‐library informative duplex bases (A), mean duplex depth (B), and peak tag family size (C) are plotted as points, grouped by lab along the x‐axis and color‐coded by sample type. Each plot is divided into sub‐panels for libraries prepared with 500 ng (left) or 1000 ng (right) of input mass.
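The three metrics in panels A to C can be computed from per-position duplex depths and a tag-family-size histogram. The sketch below uses hypothetical working definitions that the caption does not spell out: informative duplex bases as the sum of duplex depth over targeted positions with no-calls already removed, mean duplex depth as that sum divided by the number of targeted positions, and peak tag family size as the most frequent family size. All function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Sequence


def informative_duplex_bases(depths: Sequence[int]) -> int:
    """Total duplex bases; assumes no-call positions were already excluded."""
    return sum(depths)


def mean_duplex_depth(depths: Sequence[int]) -> float:
    """Average duplex depth across targeted positions (hypothetical definition)."""
    if not depths:
        raise ValueError("no targeted positions")
    return sum(depths) / len(depths)


def peak_family_size(family_sizes: Iterable[int]) -> int:
    """Most frequent tag family size; ties resolve to the smaller size."""
    counts = Counter(family_sizes)
    if not counts:
        raise ValueError("no families")
    return min(counts, key=lambda size: (-counts[size], size))
```

A production pipeline may weight family sizes by read count or restrict depth to specific target intervals; those choices would change the numbers but not the structure of the calculation.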
FIGURE 3
High interlaboratory reproducibility demonstrated in DS analyses of shared DNA samples. MFmin was calculated as the unique mutation count divided by the total duplex bases (excluding no‐calls). Error bars represent 95% binomial proportion Wilson score intervals calculated from the mutation frequency data.
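The MFmin definition and the Wilson score interval in this caption can be written out directly. For k mutations in n duplex bases with p̂ = k/n and standard normal quantile z, the Wilson interval is (p̂ + z²/2n ± z·sqrt(p̂(1 − p̂)/n + z²/4n²)) / (1 + z²/n). The sketch below uses only Python's standard library; the function names and any counts passed to them are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def mutation_frequency(unique_mutations: int, duplex_bases: int) -> float:
    """MF = unique mutation count / total duplex bases (no-calls excluded upstream)."""
    if duplex_bases <= 0:
        raise ValueError("duplex_bases must be positive")
    return unique_mutations / duplex_bases


def wilson_interval(successes: int, trials: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    # Clamp to [0, 1] to guard against floating-point round-off at the edges.
    return max(0.0, center - half), min(1.0, center + half)
```

Unlike the simple Wald interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the mutation count is small, which suits rare-event proportions such as MF.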
FIGURE 4
Highly reproducible results across labs for both B[a]P and ENU mixtures as measured by mutation frequencies. (A) MFs for each sample estimated using a generalized linear model. Error bars represent the standard error. (B) Fold change in MFs from control for each mixture or pure sample across labs. Error bars represent the standard error. Asterisks indicate significance at the 0.05 level.
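The caption does not specify the GLM family or covariates. As one simple stand-in, a Poisson GLM with a log link, an offset of log(duplex bases), and a single treatment indicator has a closed-form fit: each group's rate is its pooled mutation count divided by its pooled duplex bases, and the Wald variance of the log fold change is 1/y_control + 1/y_treated. The sketch below implements that special case; it ignores between-library overdispersion, so it is not presented as the authors' model, and all names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist
from typing import NamedTuple, Sequence


class FoldChange(NamedTuple):
    fold_change: float
    se_log: float   # standard error of log(fold change)
    ci_low: float
    ci_high: float
    p_value: float  # two-sided Wald test of fold change = 1


def poisson_fold_change(ctrl_counts: Sequence[int], ctrl_bases: Sequence[int],
                        trt_counts: Sequence[int], trt_bases: Sequence[int],
                        alpha: float = 0.05) -> FoldChange:
    """Closed-form Poisson GLM (log link, log-bases offset, one treatment indicator)."""
    y0, y1 = sum(ctrl_counts), sum(trt_counts)
    if y0 == 0 or y1 == 0:
        raise ValueError("both groups need at least one mutation")
    # MLE of each group's rate is pooled mutations / pooled duplex bases.
    log_fc = log(y1 / sum(trt_bases)) - log(y0 / sum(ctrl_bases))
    se = sqrt(1 / y0 + 1 / y1)
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p = 2 * (1 - norm.cdf(abs(log_fc) / se))
    return FoldChange(exp(log_fc), se,
                      exp(log_fc - z_crit * se), exp(log_fc + z_crit * se), p)
```

If replicate libraries vary more than Poisson sampling predicts, a quasi-Poisson or negative binomial model would widen these intervals.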
FIGURE 5
Group and replicate analysis of mutation frequencies. (A) MFs plotted by sample type, treating libraries prepared by different labs as technical replicates. Each dot represents one library, with horizontal lines representing the group mean. Asterisks indicate FDR‐adjusted p‐values less than 1 × 10⁻² (*), 1 × 10⁻³ (**), and 1 × 10⁻⁴ (***). Replicate libraries of the untreated control and the 1.2× mixtures were randomly selected to make smaller groups, and statistical testing was performed on the sampled groups. Independent pairwise comparisons were performed for the untreated control and the B[a]P 1.2× mixture (B) and for the untreated control and the ENU 1.2× mixture (C). For each group size, indicated on the x‐axis, 100 independent samplings were performed, and the p‐value for each comparison is plotted on the y‐axis. Points indicate individual sampling iterations, and “violin” shapes represent the density distribution of the resulting p‐values for each group size. The number above each set of points indicates how many of the 100 sampling iterations yielded p < 0.05.
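Panel A reports FDR-adjusted p-values. The caption does not say which adjustment was applied; the Benjamini–Hochberg step-up procedure is a standard choice and is sketched here, together with a helper that maps adjusted values to the caption's asterisk thresholds. Both function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_stars(q: float) -> str:
    """Map an adjusted p-value to the asterisk thresholds used in this caption."""
    if q < 1e-4:
        return "***"
    if q < 1e-3:
        return "**"
    if q < 1e-2:
        return "*"
    return ""
```

The running minimum enforces that a smaller raw p-value never receives a larger adjusted value, and the initial value of 1.0 caps adjusted values at 1.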
FIGURE 6
Unsupervised hierarchical clustering by SBS spectra. For each library, the normalized proportions of simple base substitution types (pyrimidine notation) are plotted as a stacked bar (middle panel). The dendrogram (top) reflects the results of unsupervised hierarchical clustering. The sample identity for each library is indicated along the x‐axis by color‐coded boxes.
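Pyrimidine notation collapses the 12 possible base substitutions into six classes by reporting each change from the pyrimidine (C or T) of the mutated base pair, so G>T is counted as C>A. The sketch below, with hypothetical function names, computes the per-library normalized proportions that would serve as input to clustering. The caption does not specify the distance metric or linkage, so the clustering step itself is left to a library such as SciPy's `scipy.cluster.hierarchy.linkage`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
SBS_TYPES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def to_pyrimidine_notation(ref: str, alt: str) -> str:
    """Express a substitution relative to the pyrimidine of the base pair."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT" or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: report the complementary strand
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{ref}>{alt}"


def sbs_proportions(mutations: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized proportions of the six substitution classes for one library."""
    counts = Counter(to_pyrimidine_notation(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations supplied")
    return {t: counts[t] / total for t in SBS_TYPES}
```

Because every class appears in the output dictionary, including those with zero counts, each library yields a fixed-length six-element vector suitable for stacking into a matrix before clustering.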
FIGURE 7
Replicate analysis of subtype mutation frequencies. Similar to the replicate sampling analysis performed for total MF, replicates were randomly selected into smaller groups and pairwise testing was performed to determine if a significant increase in subtype MF could be detected. For each subtype in each sample pairing, 100 iterations of sampling and statistical testing were performed, represented by individual points. Larger shapes show the density distribution of p‐values for each group size and numbers represent the number of iterations that yielded a p‐value < 0.05. For each mutagen, the most dominant mutation subtypes are shown: C>A (A) and C>G (B) mutations for the B[a]P 1.2× mixture and T>A (C) and T>C (D) mutations for the ENU 1.2× mixture.
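The replicate-subsampling procedure in this caption, also used in Figure 5B and C, can be sketched as a loop: draw a fixed number of libraries from each group without replacement, test for an increase, repeat 100 times, and count how many iterations give p < 0.05. The caption does not name the statistical test, so this sketch substitutes a one-sided exact permutation test on group means. With equal groups of size n, the smallest attainable permutation p-value is 1/C(2n, n), so groups of two (minimum 1/6) can never reach 0.05. All names and the default seed are hypothetical.

```python
import random
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def permutation_p_value(control: Sequence[float], treated: Sequence[float]) -> float:
    """One-sided exact permutation test that treated values exceed control values.

    With group sizes fixed, the difference in means increases with the
    treated-group sum, so we count relabelings whose treated sum is at least
    the observed one.
    """
    pooled = list(control) + list(treated)
    n_t = len(treated)
    observed = sum(treated)
    tol = 1e-12 * sum(abs(v) for v in pooled)  # absorb floating-point ties
    hits = sum(
        1
        for idx in combinations(range(len(pooled)), n_t)
        if sum(pooled[i] for i in idx) >= observed - tol
    )
    return hits / comb(len(pooled), n_t)


def subsample_p_values(control_pool: Sequence[float], treated_pool: Sequence[float],
                       group_size: int, iterations: int = 100,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[List[float], int]:
    """Repeatedly subsample both groups; return all p-values and the count below alpha."""
    rng = random.Random(seed)
    p_values = [
        permutation_p_value(rng.sample(list(control_pool), group_size),
                            rng.sample(list(treated_pool), group_size))
        for _ in range(iterations)
    ]
    return p_values, sum(p < alpha for p in p_values)
```

The inputs would be per-library MFs (total or per subtype) for the control and treated groups. A parametric test such as a t-test can reach small p-values with fewer replicates, which is one reason the choice of test matters for the group size at which significance first appears.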
FIGURE 8
Comparison of mutation spectra between treatment groups and across labs. (A) The normalized proportions of simple base substitution (SNV) types (pyrimidine notation) and non‐SNV variant types are plotted as a stacked bar for each library. The treatment for each library is indicated along the x‐axis by color‐coded boxes. Samples are grouped by lab, as denoted at the top of the plot. Asterisks indicate significant differences between a sample and its within‐lab control. Daggers represent comparisons of a treatment group between labs; samples with the same dagger are not significantly different from one another. All comparisons were made at a significance level of 0.05. (B) The normalized proportions of SNV subtypes (pyrimidine notation) in their trinucleotide context are plotted in a heatmap for each library. The treatment for each library is indicated along the y‐axis by color‐coded boxes. Samples are faceted by lab (right), with the total mutation count for all samples in each lab. Trinucleotide subtypes are faceted by SNV subtype, with the total mutation count for each subtype across all samples. Asterisks indicate significant differences between a sample and its within‐lab control. Daggers represent comparisons of a treatment group between labs; samples with the same dagger are not significantly different from one another. All comparisons were made at a significance level of 0.05.
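Panel B uses the 96-channel representation: six pyrimidine-referenced substitution types, each split by the 5′ and 3′ flanking bases (6 × 4 × 4 = 96). A sketch of the mapping follows, assuming each mutation is supplied as the three reference bases centered on the mutated position (read 5′→3′ on the reference strand) plus the alternate base; purine-centered contexts are reverse-complemented. Labels such as A[C>T]G follow a common convention in mutational-signature work, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = "ACGT"
SBS_TYPES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
# 6 substitution types x 4 five-prime bases x 4 three-prime bases = 96 channels.
CHANNELS = tuple(f"{five}[{sub}]{three}"
                 for sub in SBS_TYPES for five in _BASES for three in _BASES)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def trinucleotide_channel(context: str, alt: str) -> str:
    """Map a reference trinucleotide (mutated base in the middle) and alt base to a channel."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in _BASES for b in context):
        raise ValueError(f"invalid context: {context!r}")
    if len(alt) != 1 or alt not in _BASES or alt == context[1]:
        raise ValueError(f"invalid alt base: {alt!r}")
    if context[1] in "AG":  # purine reference: report on the opposite strand
        context, alt = reverse_complement(context), alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def trinucleotide_spectrum(mutations: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized 96-channel spectrum for one library."""
    counts = Counter(trinucleotide_channel(ctx, alt) for ctx, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations supplied")
    return {ch: counts[ch] / total for ch in CHANNELS}
```

Reverse-complementing the whole trinucleotide (not just the central base) keeps the flanking bases in 5′→3′ order on the pyrimidine strand, so a G>A change at CGT lands in the same A[C>T]G channel as its complementary event.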
FIGURE 9
Power analysis of the minimum detectable fold change for different sample sizes. The calculation was based on an MF of 6.7 × 10⁻⁸ with total numbers of informative duplex bases of 1.71 × 10⁹ (red), 1.34 × 10⁹ (orange), 1.03 × 10⁹ (green), 6.84 × 10⁸ (blue), and 3.42 × 10⁸ (purple). The gray line marks a minimum detectable fold change of 1.5. Power analysis was conducted assuming either no additional binomial variation or one of three sample variances (sigma = 0.05, 0.10, and 0.15), with a desired power of 80%.
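The caption does not give the power formula. One way to reproduce the general shape of such curves, offered only as an approximation under stated assumptions, is a two-sample, two-sided z-test on log MF in which each animal's log MF has Poisson sampling variance 1/(MF × duplex bases) plus an extra between-animal variance sigma². Treating the caption's sigma as a standard deviation on the log scale is itself an assumption. The minimum detectable fold change f is the root of log f = (z₁₋α/₂ + z_power) × sqrt([(1/λ + sigma²) + (1/(fλ) + sigma²)] / n), where λ = MF × duplex bases and n is the number of animals per group. The function name and model are hypothetical, not the authors' calculation.

```python
from math import log, sqrt
from statistics import NormalDist


def min_detectable_fold_change(mf: float, duplex_bases: float, n_per_group: int,
                               sigma: float = 0.0, alpha: float = 0.05,
                               power: float = 0.80) -> float:
    """Smallest fold change in MF detectable with the given power (approximate model).

    Hypothetical model: per-animal log MF has variance 1/(expected mutation count)
    (Poisson sampling) plus sigma**2 (extra between-animal variation); groups are
    compared with a two-sided z-test on mean log MF.
    """
    if mf <= 0 or duplex_bases <= 0 or n_per_group < 1 or sigma < 0:
        raise ValueError("invalid inputs")
    norm = NormalDist()
    z_total = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    lam = mf * duplex_bases  # expected mutations per control animal

    def gap(f: float) -> float:
        # Becomes positive once log(f) exceeds the detection threshold at fold change f.
        var_ctrl = (1 / lam + sigma ** 2) / n_per_group
        var_trt = (1 / (f * lam) + sigma ** 2) / n_per_group
        return log(f) - z_total * sqrt(var_ctrl + var_trt)

    lo, hi = 1.0, 2.0
    while gap(hi) < 0:  # expand the bracket until it contains the root
        lo, hi = hi, hi * 2
        if hi > 1e6:
            raise ValueError("no detectable fold change below 1e6")
    for _ in range(100):  # bisection; gap is increasing in f
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # MF and duplex-base totals taken from the caption above.
    for bases in (1.71e9, 1.34e9, 1.03e9, 6.84e8, 3.42e8):
        curve = [min_detectable_fold_change(6.7e-8, bases, n) for n in range(3, 11)]
        print(f"{bases:.3g} bases:", ", ".join(f"{x:.2f}" for x in curve))
```

In this model, more duplex bases shrink only the Poisson term, so once sigma dominates, additional sequencing depth yields diminishing gains and adding animals becomes the more effective lever.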
