[Preprint]. 2025 Aug 14:2025.08.11.669689.
doi: 10.1101/2025.08.11.669689.

Paired plus-minus sequencing is an ultra-high throughput and accurate method for dual strand sequencing of DNA molecules

Alexandre Pellan Cheng et al. bioRxiv. 2025.

Abstract

Distinguishing real biological variation in the form of single-nucleotide variants (SNVs) from errors is a major challenge for genome sequencing technologies. This is particularly true in settings where SNVs are at low frequency, such as cancer detection through liquid biopsy or human somatic mosaicism. State-of-the-art molecular denoising approaches for DNA sequencing rely on duplex sequencing, where both strands of a single DNA molecule are sequenced to discern true variants from errors arising from single-stranded DNA damage. However, such duplex approaches typically require massive over-sequencing to overcome low capture rates of duplex molecules. To address these challenges, we introduce paired plus-minus sequencing (ppmSeq) technology, in which both DNA strands are partitioned and clonally amplified on sequencing beads through emulsion PCR. In this reaction, both strands of a double-stranded DNA molecule contribute to a single sequencing read, allowing for a duplex yield that scales linearly with sequencing coverage across a wide range of inputs (1.8-98 ng). We benchmarked ppmSeq against current duplex sequencing technologies, demonstrating superior duplex recovery with ppmSeq, with a rate of 44%±5.5% (compared to ~5-11% for leading duplex technologies). Using both genomic and cell-free DNA, we established error rates for ppmSeq, which had residual SNV detection error rates as low as 7.98×10⁻⁸ for gDNA (using an end-repair protocol with dideoxy nucleotides) and 3.5×10⁻⁷±7.5×10⁻⁸ for cell-free DNA. To test the capabilities of ppmSeq for error-corrected whole-genome sequencing (WGS) for clinical application, we assessed circulating tumor DNA (ctDNA) detection for disease monitoring in cancer patients. We demonstrated that ppmSeq enables powerful tumor-informed ctDNA detection at concentrations of 10⁻⁵ across most cancers, and up to 10⁻⁷ in cancers with high mutation burden. We then leveraged genome-wide trinucleotide mutation patterns characteristic of urothelial (APOBEC3-related and platinum exposure-related signatures) and lung (tobacco-exposure-related signatures) cancers to perform tumor-naive ctDNA detection, showing that ppmSeq can identify a disease-specific signal in plasma cell-free DNA without a matched tumor, and that this signal correlates with imaging-based disease metrics. Altogether, ppmSeq provides an error-corrected, cost-efficient and scalable approach for high-fidelity WGS that can be harnessed for challenging clinical applications and emerging frontiers in human somatic genetics where high accuracy is required for mutation identification.
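The duplex recovery and error-rate figures quoted in the abstract can be turned into rough sequencing budgets. The following is a minimal back-of-the-envelope sketch, not the authors' analysis. It assumes that the duplex recovery rate can be read as the fraction of total coverage that ends up as duplex coverage, and it uses a round, illustrative count of 3×10⁹ examined bases. The function names are hypothetical.

```python
def required_total_coverage(target_duplex_coverage: float, duplex_rate: float) -> float:
    """Total coverage needed to reach a target duplex coverage.

    Assumption (ours): duplex coverage = duplex_rate * total coverage.
    """
    if not 0 < duplex_rate <= 1:
        raise ValueError("duplex_rate must be in (0, 1]")
    return target_duplex_coverage / duplex_rate


def expected_false_positives(error_rate: float, bases_examined: float) -> float:
    """Expected number of erroneous SNV calls for a given residual error rate."""
    return error_rate * bases_examined


# Rates taken from the abstract; labels are only for display.
rates = {
    "ppmSeq (44%)": 0.44,
    "other duplex methods, upper bound (11%)": 0.11,
    "other duplex methods, lower bound (5%)": 0.05,
}
for label, rate in rates.items():
    total = required_total_coverage(10, rate)
    print(f"{label}: about {total:.1f}x total coverage for 10x duplex coverage")

# Illustrative only: 3e9 examined bases is a round number, not a value from the paper.
print(expected_false_positives(7.98e-8, 3e9))
```

Under these assumptions, a higher duplex recovery rate translates directly into less over-sequencing for the same duplex depth.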

Figures

Figure 1. Paired plus-minus sequencing (ppmSeq) enables high-fidelity duplex sequencing and linearly scalable double-stranded (ds)DNA sequencing recovery.
A. ppmSeq workflow. Genomic DNA (gDNA; sheared) or cell-free DNA (cfDNA; unsheared) undergoes end-repair and adapter ligation. The adapter contains a known sequence of 6 non-complementary base pairs (here, As on each strand). Following library preparation, adapter-ligated molecules undergo clonal amplification through emulsion PCR, creating distinguishable Watson and Crick strands via the different adapter sequences. Scenario 1: Equal representation of Watson and Crick strands. During adapter sequencing, the mismatched sequences allow positive identification of dsDNA (termed mixed) reads. Damaged bases (illustrated as squares) can cause polymerases to incorporate the wrong base during emulsion PCR, leading to mismatched sequences during template sequencing. The conflicting sequencing signal at the mismatch position results in that base being encoded with a low base quality. True variants (illustrated as circles) are amplified from both native strands and do not cause a conflicting sequencing signal, resulting in a high base quality encoding. Scenario 2: Single-strand dropout. Damaged bases (illustrated as squares) are amplified throughout the single-stranded clones and are indistinguishable from true variants (illustrated as circles). Here, both true variants and false positives are encoded with high base quality scores and cannot be readily distinguished. However, the sequencing signal at the adapter positions will also be consistent (uniform A or T reads), allowing for the identification of these reads and their computational filtering. Bottom: Sequencing signal at the adapter positions for cell-free DNA sample UPN-035. Single-stranded reads can be identified through the consistent A or T sequencing signal, and dsDNA is identified through mixed A:T signal.
B. Assessment of total genome coverage [genome equivalents (GE)] across different total input amounts (ng) of cell-free DNA (cfDNA) sequenced using ppmSeq. There is a high correlation between input and coverage.
C. Comparison of mixed read yields (representing duplex reads) obtained from ppmSeq libraries (unsheared cell-free DNA or sheared genomic DNA) with reported duplex yields from published unique molecular identifier technologies (NanoSeq and CODEC). Chain-terminating dideoxy bases (ddBTPs; used in NanoSeq) are incorporated into nicked dsDNA to prevent amplification and improve duplex error rates for the gDNA ppmSeq libraries. Duplex recovery is assessed by the final number of dsDNA bases sequenced (in Gb) and total dsDNA coverage (in genome equivalents) across total bases sequenced. ppmSeq provides the highest yield of dsDNA recovery compared to available methods for cell-free DNA and gDNA (with ddBTPs).
D. Residual SNV rate from ppmSeq libraries (mixed reads only) compared to SNV rate from published duplex sequencing protocols for sperm gDNA libraries, showing the fraction of duplex reads recovered for each technology. For ppmSeq libraries, the gDNA sample was sequenced to 100x, and variants that occurred in a single read were considered errors. Squares represent samples that were not prepared with ddBTPs during end-repair, thus enabling the amplification of nicked and overhanging DNA. Circles represent samples that were prepared with ddBTPs.
E. Residual SNV rates of cell-free DNA libraries (n = 5 for each method, mixed reads only) generated with standard whole-genome sequencing (Illumina NovaSeq 6000 and Ultima UG 100), ppmSeq (Ultima) and duplex (Ultima, Cheng et al. Nature Methods, 2025). SNV rates were measured on the scale of the entire genome, where any singly occurring variant is considered an error. Samples were not matched for each method, and sequencing statistics are available in Supplementary Table 1.
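The adapter-based read classification described in panel A and the residual SNV rate used in panels D and E can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline. It assumes each read comes with the fraction of sequencing signal called as A at each of the six adapter positions. The thresholds `low` and `high` and all function names are our own placeholders.

```python
from statistics import mean


def classify_read(a_fraction_per_position, low=0.2, high=0.8):
    """Classify a read from its adapter signal.

    a_fraction_per_position: fraction of signal called as A (versus T) at each
    adapter position. This representation and the thresholds are assumptions.
    """
    avg = mean(a_fraction_per_position)
    if avg >= high:
        return "single-strand (A)"  # uniform A signal: only one strand amplified
    if avg <= low:
        return "single-strand (T)"  # uniform T signal: only the other strand amplified
    return "mixed"                  # mixed A:T signal: both strands contributed


def residual_snv_rate(single_read_variants, mixed_bases):
    """Residual SNV rate: variants seen in a single read, per mixed-read base."""
    if mixed_bases <= 0:
        raise ValueError("mixed_bases must be positive")
    return single_read_variants / mixed_bases


reads = {
    "read_1": [0.52, 0.47, 0.55, 0.49, 0.50, 0.51],
    "read_2": [0.97, 1.00, 0.95, 1.00, 0.98, 0.99],
    "read_3": [0.02, 0.01, 0.03, 0.00, 0.02, 0.01],
}
for name, signal in reads.items():
    print(name, classify_read(signal))
```

Only reads classified as mixed would then contribute to the residual SNV rate, mirroring the "mixed reads only" restriction stated in the caption.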
Figure 2. In silico mixing studies for tumor-informed ctDNA detection demonstrate superior performance of ppmSeq compared to standard whole-genome sequencing on a UG 100 sequencer.
A. In silico mixing workflow. Briefly, patient cfDNA is computationally mixed into unmatched cfDNA from cancer-free controls in known proportions to generate a range of expected tumor fractions (TF). Clinical patient information, cancer types and tumor fractions are available in Supplementary Tables 6 and 7.
B. Benchmarking tumor fraction detection with ppmSeq using an in vitro mixture of Genome in a Bottle (GIAB) genomic DNA. The HG001-unique and HG005-unique homozygous variants were each subsampled 30 times, with repeats, to a signature of 30,000 SNVs. The distribution of GIAB-supporting reads is shown in boxplots; the area under the ROC curve for each sample against the HG002 sample (expected fraction = 0) is shown in heatmaps. Results per replicate are presented in Supplementary Figure 2B.
C. In silico mixing study comparing matched libraries that underwent both standard Ultima sequencing (amplified library using KAPA HYPER PREP kit (KK8505)) and Ultima ppmSeq from a cohort consisting of cancer patients (n = 6) with matched plasma, tumor and normal tissue sequencing at 30x (top row) and 300x (bottom row) total coverage. To generate cancer-specific mutation profiles for each patient, tumor and normal samples were sequenced [average of 100x (72-132x) and 80x (64-146x), respectively]. Plasma samples were sequenced to an average of 70x (28-103x) for ppmSeq and 139x (110-166x) for standard Ultima. Patient-specific mutational profiles were computationally mixed into the n = 5 remaining unmatched cfDNA profiles from cross-patients, and reads (mixed and non-mixed) matching the mutational profile were tabulated. For each simulated condition, we drew 100,000 samples and computed a z-score as the difference between the matched and background distributions. The average results across controls yielded a calibrated measure of ctDNA detectability, informing the threshold at which true tumor-derived signals can be confidently distinguished from artifacts. We performed this procedure 10 times per condition, and the distribution of these replicates is summarized in the boxplots (n = 50 replicates per boxplot). Area under the receiver operating characteristic curve (AUC) values demonstrate detection performance at the different admixed TFs versus negative controls (TF = 0) as measured by z-score. Mixed and non-mixed reads were used for this analysis.
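The z-score and AUC summaries in panel C can be illustrated with a small sketch. The caption does not give the exact z-score formula, so the standardized difference below (observed count minus background mean, divided by background standard deviation) is an assumption, as are the function names and the toy numbers.

```python
from statistics import mean, stdev


def z_score(observed, background):
    """Standardized difference between an observed count and a background sample.

    Assumed form: (observed - mean(background)) / stdev(background).
    """
    sd = stdev(background)
    if sd == 0:
        raise ValueError("background has zero variance")
    return (observed - mean(background)) / sd


def auc(positives, negatives):
    """Area under the ROC curve, computed as the probability that a positive
    scores higher than a negative (ties count as one half)."""
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy numbers, not data from the study.
background_counts = [3, 5, 4, 6, 2]      # profile-matching reads in unmatched controls
print(z_score(12, background_counts))    # matched plasma sample with 12 supporting reads

admixed_z = [2.1, 3.0, 0.4]              # z-scores at some admixed tumor fraction
negative_z = [0.1, -0.5, 0.4]            # z-scores at TF = 0
print(auc(admixed_z, negative_z))
```

An AUC near 1 means admixed samples are almost always ranked above the TF = 0 controls, while an AUC near 0.5 means the two groups cannot be separated by z-score.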
Figure 3. ppmSeq empowers tumor-naive ctDNA detection.
A. Cohort description for tumor-naive ctDNA detection comprising 20 patients with urothelial cancer (stage II–IV). Twenty cfDNA samples were sequenced with ppmSeq and matched Ultima duplex whole-genome sequencing. Of these 20 patients, 11 had previously received neoadjuvant platinum chemotherapy, and 13 had tumor tissue available for sequencing.
B. Comparison of the trinucleotide frequencies obtained from matched duplex whole-genome sequencing with different levels of denoising (duplex: duplex consensus variant calling; unique molecular identifier: strand-agnostic unique molecular identifier (UMI) consensus variant calling; or standard: UMI-agnostic denoising) and ppmSeq (mixed reads only). Trinucleotide frequencies of each variant-calling method were compared to those of the matched tumor mutational pattern using cosine similarities. SBS2 and SBS13 are prominent in urothelial cancers and represent APOBEC3-associated signatures reflecting cytidine deaminase activity, where C>T transitions can occur as a function of uracil generation by cytidine deaminase activity (SBS2), as well as C>G and C>A mutations that can arise as a result of polymerase errors following uracil excision (SBS13).
C. Cosine similarities of trinucleotide frequencies in cfDNA compared to those from matched tumor across sequencing strategies (duplex: duplex consensus variant calling; single strand: strand-agnostic unique molecular identifier (UMI) consensus variant calling; standard: UMI-agnostic denoising) and ppmSeq (mixed reads only) for 4 patients with high tumor burden (defined as having a tumor fraction >5% as measured through ichorCNA) and matched tumor sequencing.
D. Comparison of tumor-naive APOBEC3 scores generated from cfDNA (de novo ppmSeq; mixed reads only) versus tumor-informed APOBEC3 scores (standard whole-genome sequencing of tumor) in the n = 13 patients for whom tumor sequencing was available. Tumor-informed ctDNA-based tumor fractions were measured by counting variants found in the ctDNA that matched the tumor sequencing profile, and by dividing by the total number of reads overlapping tumor-specific mutations. The tumor-informed APOBEC3 score is obtained by multiplying the tumor fraction by the relative contribution of APOBEC3 mutations (SBS2 + SBS13) measured from the tumor mutational profile. There is a strong correlation between the expected, tumor-informed APOBEC3 contributions in the plasma and the de novo APOBEC3 score. The de novo APOBEC3 score is obtained by fitting the trinucleotide profiles of somatic variants to a curated catalog of cancer-associated SBS signatures (see Methods).
E. Plasma mutational scores from APOBEC3 variants (SBS2 + SBS13) in individuals with urothelial cancer (stage II–IV; n = 20) and healthy individuals (n = 15). Plasma signatures were fit to a custom reference of signatures comprising SBS2, SBS13, SBS31 and SBS35 (COSMIC v3.3), clonal hematopoiesis and one derived from a subset of cancer-free controls (Methods). For E-I, only mixed reads were used to measure plasma mutational scores.
F. Plasma mutational scores from platinum therapy mutagenesis (SBS31) in individuals with urothelial cancer (stage II–IV; n = 11 after platinum therapy, n = 9 before) and healthy individuals (n = 15). Plasma signatures were fit to a custom reference of signatures comprising SBS2, SBS13, SBS31 and SBS35 (COSMIC v3.3), clonal hematopoiesis and one derived from a subset of cancer-free controls (Methods).
G. Plasma mutational scores from tobacco variants (SBS4) in individuals with lung cancer (n = 8) and healthy individuals (n = 15). Plasma signatures were fit to a custom reference of signatures comprising SBS4 (COSMIC v3.3), clonal hematopoiesis and one derived from a subset of cancer-free controls (Methods).
H. Serial plasma monitoring of patient UPN-028 with ppmSeq and no matched tumor corresponds to changes seen on imaging. Pre-treatment and post-treatment (day 39) CT imaging of a lung cancer patient with decreased ctDNA (tobacco signature obtained from cfDNA sequencing with ppmSeq) in response to immune checkpoint inhibition therapy.
I. Serial plasma monitoring of patient UPN-030 with ppmSeq and no matched tumor corresponds to changes seen on imaging. Pre-treatment and post-treatment (day 35) CT imaging of hilar nodes, and day 78 and day 127 CT imaging of the lung, in a lung cancer patient reflect ctDNA levels (tobacco signature obtained from cfDNA sequencing with ppmSeq) in response to immune checkpoint inhibition therapy.
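The cosine similarity comparison in panels B and C and the tumor-informed APOBEC3 score in panel D follow directly from the definitions in the caption, and can be sketched as below. The function names and toy inputs are hypothetical. Real trinucleotide profiles have 96 channels, whereas the example uses short vectors for readability.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two trinucleotide frequency vectors."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("profiles must be non-zero")
    return dot / (norm_u * norm_v)


def tumor_fraction(matching_variant_reads, reads_over_tumor_sites):
    """Tumor-informed tumor fraction: reads carrying tumor-matched variants divided
    by all reads overlapping tumor-specific mutation sites (as in panel D)."""
    return matching_variant_reads / reads_over_tumor_sites


def tumor_informed_apobec3_score(tf, sbs2_contribution, sbs13_contribution):
    """Tumor fraction times the relative APOBEC3 contribution (SBS2 + SBS13)
    measured from the tumor mutational profile."""
    return tf * (sbs2_contribution + sbs13_contribution)


# Toy inputs, not data from the study.
plasma_profile = [0.10, 0.40, 0.30, 0.20]
tumor_profile = [0.12, 0.38, 0.28, 0.22]
print(cosine_similarity(plasma_profile, tumor_profile))

tf = tumor_fraction(50, 10_000)
print(tumor_informed_apobec3_score(tf, sbs2_contribution=0.3, sbs13_contribution=0.2))
```

The de novo score in panel D relies on signature fitting described in the paper's Methods, which is not reproduced here.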
