Review
Regul Toxicol Pharmacol. 2026 Jan:164:105985.
doi: 10.1016/j.yrtph.2025.105985. Epub 2025 Nov 8.

Application of error-corrected sequencing technologies for in vivo regulatory mutagenicity assessment


Carole L Yauk et al. Regul Toxicol Pharmacol. 2026 Jan.

Abstract

Error-corrected sequencing (ECS) is a transformative method for in vivo mutagenicity assessment, enabling direct, highly sensitive measurement of mutation frequency and spectrum. ECS addresses key limitations of the transgenic rodent (TGR) assay, including lack of integration into standard toxicity studies, restricted model availability, and limited alignment with the 3R principles. To support regulatory acceptance, an expert working group of the International Workshops on Genotoxicity Testing (IWGT) reviewed ECS technologies and developed consensus recommendations for the inclusion of ECS in Organisation for Economic Co-operation and Development (OECD) test guidelines. The working group agreed that ECS produces results that are concordant with validated TGR assays, that it can be incorporated into standard ≥28-day repeat-dose toxicity studies, and that data interpretation should be based on overall mutation frequency compared with concurrent vehicle controls. The working group emphasized harmonized data reporting aligned with OECD principles and endorsed study designs that enable quantitative risk assessment. Overall, the working group agreed that ECS offers a significant advancement over current mutagenicity assays by enabling the use of diverse models beyond the conventional TGR systems described in OECD test guideline 488. The working group fully supports the application of ECS to generate in vivo mutagenicity data for regulatory submissions and recommends its inclusion in future OECD test guidelines.

Keywords: Duplex sequencing; Hawk-seq; HiFi-seq; Jade-seq; Mutation; PECC sequencing; SMM-Seq.
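
The abstract's recommendation to interpret data as overall mutation frequency relative to concurrent vehicle controls can be illustrated with a minimal sketch. It assumes that mutation frequency is expressed as mutations per error-corrected base pair sequenced, and that groups are summarized by the mean of per-animal frequencies; the function names and the example counts are hypothetical and do not come from the article. Statistical testing, which a real study would need, is deliberately left out.

```python
from statistics import mean


def mutation_frequency(mutations, bases_sequenced):
    """Mutations per error-corrected base pair for one sample (assumed definition)."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return mutations / bases_sequenced


def group_mean_mf(samples):
    """Mean of per-animal mutation frequencies; samples is a list of (mutations, bases)."""
    return mean(mutation_frequency(m, b) for m, b in samples)


def fold_change_vs_control(treated, controls):
    """Ratio of the treated group mean to the concurrent vehicle control mean."""
    return group_mean_mf(treated) / group_mean_mf(controls)


# Hypothetical counts, for illustration only: (mutations, error-corrected bases) per animal.
vehicle = [(5, 1_000_000_000), (5, 1_000_000_000)]
dosed = [(10, 1_000_000_000), (20, 1_000_000_000)]
print(fold_change_vs_control(dosed, vehicle))
```

Averaging per-animal frequencies treats the animal as the experimental unit; pooling all counts before dividing is another possible design and would weight animals by sequencing depth.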


Conflict of interest statement

Declaration of competing interest: Jesse J. Salk is a founder, former employee and minority equity holder of TwinStrand Biosciences Inc. He is a named author on Duplex Sequencing-related patents owned by TwinStrand. He is a named author on Duplex Sequencing patents owned by the University of Washington and licensed to TwinStrand, for which he receives royalties. Devon Fitzgerald is a former employee and equity holder of TwinStrand. She is a named inventor on a pending patent related to Duplex Sequencing for which she is not expected to gain financial benefits. Jake Higgins is an equity holder and former employee of TwinStrand. Shoji Matsumura is an employee of Kao Corporation, which has applied for the patent for Hawk-Seq™ and Jade-Seq™. Naveed Honarvar is an Editorial Board member of this journal. All other authors declare no conflict of interest.

Figures

Figure 1. Methods for error-corrected Next Generation Sequencing.
(A) Duplex Sequencing (DS). Adapters containing DS tags are ligated onto the ends of double-stranded DNA fragments to uniquely label each strand of the original molecule (i-ii) such that both strands can be tracked throughout amplification and sequencing (iii-iv). Tagged source DNA molecules (ii) are amplified by PCR and the resulting sequencing reads are grouped by unique tag and strand (iii). Top and bottom strands are compared to eliminate errors generated during PCR and sequencing, resulting in a duplex consensus sequence (iv).
(B) Hypothesis alignment with weak overlap sequencing (Hawk-Seq). (i) The protocol involves preparing sequencing libraries using a standard Illumina PCR-based method with minor modifications to PCR conditions. After paired-end sequencing, read pairs originating from the same double-stranded DNA (dsDNA) fragment are identified based on their mapping positions on the reference genome. (ii) Consensus sequences are generated if at least one read pair from each strand of the dsDNA fragment is obtained, and variants are detected from these consensus sequences. Hawk-Seq does not require external molecular barcodes.
(C) Justifies Analyte Dna sEquence (Jade-Seq). (i) Sequencing errors can be caused by the end-repair process during library preparation. (ii) The method removes these errors by digesting single-stranded regions in DNA fragments using S1 Nuclease.
(D) Paired-end complementary consensus sequencing (PECC-Seq). Mutations are identified by matching variants in forward and reverse reads of DNA fragments sequenced on an Illumina platform. (i) Genomic DNA is sheared into ~150-bp double-stranded fragments, and a PCR-free library is prepared. (ii) After paired-end sequencing, data are aligned to a reference genome, and forward and reverse strands are identified by mapping coordinates. (iii) Each DNA fragment generates four reads (two paired reads per strand). Variants are classified as mutations if present in all four reads, ensuring high-confidence detection.
(E) Single-Molecule Mutation Sequencing (SMM-seq). After DNA fragmentation with specific nucleases (i), the method employs hairpin adapters (ii) and rolling circle amplification (RCA) (iii) to linearly amplify both strands of each DNA duplex while preserving the physical linkage between the original strands and their copies (iv). True variants are identified by consensus sequences (v). This approach enhances analytical efficiency by minimizing the risk of strand separation during amplification.
(F) HiFi sequencing. The methodology is built on Single Molecule Real-Time (SMRT) technology. (i) High-molecular-weight genomic DNA is randomly sheared into 5–10 kbp double-stranded fragments, and SMRTbell adapters are ligated to both ends, converting the fragments into circularized single-stranded templates. A sequencing primer is annealed to the circularized template, and a DNA polymerase is loaded onto the DNA/primer complex. The polymerase extends the primer along the circular template in the presence of fluorescently labeled nucleotide triphosphates. (ii) Forward and reverse consensus sequences are generated and aligned to the reference genome, and variants are identified when present in both consensus sequences.
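
The two-step logic described for Duplex Sequencing in panel (A), grouping reads by tag and strand and then comparing the top and bottom strand consensuses, can be sketched as follows. This is a simplified illustration and not the published pipeline: it assumes reads are already aligned, of equal length, and oriented so that bottom-strand reads can be compared base by base with top-strand reads. The function names, the `min_agreement` threshold, and the example tags and reads are all hypothetical.

```python
from collections import Counter, defaultdict


def strand_consensus(reads, min_agreement=0.6):
    """Per-position majority call across reads from one strand; 'N' if agreement is too low."""
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(tagged_reads, min_agreement=0.6):
    """Build one duplex consensus per tag from (tag, strand, sequence) records.

    strand is "top" or "bottom". Tags lacking reads from either strand are skipped,
    and positions where the two strand consensuses disagree are masked with 'N'.
    """
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for tag, strand, seq in tagged_reads:
        families[tag][strand].append(seq)

    result = {}
    for tag, family in families.items():
        if not family["top"] or not family["bottom"]:
            continue  # both strands of the original molecule are required
        top = strand_consensus(family["top"], min_agreement)
        bottom = strand_consensus(family["bottom"], min_agreement)
        result[tag] = "".join(
            t if t == b and t != "N" else "N" for t, b in zip(top, bottom)
        )
    return result


# Hypothetical reads: tag "TCA" carries an error on the bottom strand only,
# tag "CCG" has no bottom-strand reads and is therefore dropped.
example_reads = [
    ("GGA", "top", "TTAC"), ("GGA", "bottom", "TTAC"),
    ("TCA", "top", "GATC"), ("TCA", "top", "GATC"),
    ("TCA", "bottom", "GTTC"), ("TCA", "bottom", "GTTC"),
    ("CCG", "top", "ACGT"),
]
print(duplex_consensus(example_reads))
```

Masking a disagreeing position rather than picking one strand reflects the idea in the caption that an error arising during PCR or sequencing appears on only one strand, whereas a true mutation is present on both.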

