Nat Methods. 2025 May;22(5):973-981.
doi: 10.1038/s41592-025-02648-9. Epub 2025 Apr 11.

Error-corrected flow-based sequencing at whole-genome scale and its application to circulating cell-free DNA profiling

Alexandre Pellan Cheng et al. Nat Methods. 2025 May.

Abstract

Differentiating sequencing errors from true variants is a central genomics challenge, calling for error suppression strategies that balance costs and sensitivity. For example, circulating cell-free DNA (ccfDNA) sequencing for cancer monitoring is limited by sparsity of circulating tumor DNA, abundance of genomic material in samples and preanalytical error rates. Whole-genome sequencing (WGS) can overcome the low abundance of ccfDNA by integrating signals across the mutation landscape, but higher costs limit its wide adoption. Here, we applied deep (~120×) lower-cost WGS (Ultima Genomics) for tumor-informed circulating tumor DNA detection within the part-per-million range. We further leveraged lower-cost sequencing by developing duplex error-corrected WGS of ccfDNA, achieving 7.7 × 10⁻⁷ error rates, allowing us to assess disease burden in individuals with melanoma and urothelial cancer without matched tumor sequencing. This error-corrected WGS approach will have broad applicability across genomics, allowing for accurate calling of low-abundance variants at efficient cost and enabling deeper mapping of somatic mosaicism as an emerging central aspect of aging and disease.

Conflict of interest statement

Competing interests: A.P.C. and D.A.L. have filed a provisional patent regarding certain aspects of this manuscript. D.A.L. and A.J.W. have also filed two additional patent applications regarding work presented in this manuscript. A.P.C. is listed as an inventor on submitted patents pertaining to cell-free DNA (US patent applications 63/237,367, 63/056,249, 63/015,095 and 16/500,929) and receives consulting fees from Eurofins Viracor and has received conference travel support from Ultima Genomics. I.R. and A.J. are employees and shareholders of Ultima Genomics. D.L. is a shareholder of Ultima Genomics. G.I. has received consulting fees from Daiichi Sankyo. J.D.W. is a consultant for Apricity, Ascentage Pharma, Bicara Therapeutics, Bristol Myers Squibb, Daiichi Sankyo, Dragonfly, Imvaq, Larkspur, Psioxus, Takeda, Tizona, Trishula Therapeutics, Immunocore – Data Safety board and Scancell; reports grant and research support from Bristol Myers Squibb and Enterome; has equity in Apricity, Arsenal IO/Cell Carta, Ascentage, Imvaq, Linneaus, Georgiamune, Takeda, Tizona Pharmaceuticals and Xenimmune; and is an inventor on the following patents: Xenogeneic DNA Vaccines; Newcastle Disease viruses for Cancer Therapy; Myeloid-derived suppressor cell (MDSC) assay; Prediction of Responsiveness to Treatment with Immunomodulatory Therapeutics and Method of Monitoring Abscopal Effects during such Treatment; Anti-PD1 Antibody; Anti-CTLA4 antibodies; Anti-GITR antibodies and methods of use thereof; CD40 binding molecules and uses thereof. A. Saxena receives research funding from AstraZeneca, has served on Advisory Boards for G1 Therapeutics, Boehringer Ingelheim, Novocure, InxMed, Bristol Myers Squibb and Galvanize Therapeutics, and as a consultant for Galvanize Therapeutics. M.A.P. has received consulting fees from Bristol Myers Squibb, Merck, Novartis, Eisai, Pfizer, Lyvgen and Chugai and has received institutional support from RGenix, Merck Infinity, Bristol Myers Squibb, Merck and Novartis. M.K.C. has received consulting fees from Bristol Myers Squibb, Merck, InCyte, Moderna, ImmunoCore and AstraZeneca and receives institutional support from Bristol Myers Squibb. S.T. is funded by Cancer Research UK (grant reference number A29911); the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC10988), the UK Medical Research Council (FC10988) and the Wellcome Trust (FC10988); the National Institute for Health Research Biomedical Research Centre at the Royal Marsden Hospital and Institute of Cancer Research (grant reference number A109), the Royal Marsden Cancer Charity, The Rosetrees Trust (grant reference number A2204), Ventana Medical Systems (grant reference numbers 10467 and 10530), the National Institute of Health (U01 CA247439) and Melanoma Research Alliance (686061). S.T. has received speaking fees from Roche, AstraZeneca, Novartis and Ipsen. S.T. has the following patents filed: Indel mutations as a therapeutic target and predictive biomarker PCTGB2018/051892 and PCTGB2018/051893. G.B. has sponsored research agreements through her institution with Olink Proteomics, Teiko Bio, InterVenn Biosciences and Palleon Pharmaceuticals; served on advisory boards for Iovance, Merck, Nektar Therapeutics, Novartis and Ankyra Therapeutics; consulted for Merck, InterVenn Biosciences and Ankyra Therapeutics and holds equity in Ankyra Therapeutics. B.M.F. is on the advisory boards for Astrin Bioscience, Natera, Guardant, Janssen, Gilead, Merck, Immunomedics and QED Therapeutics, is a consultant for QED Therapeutics, Astra Biosciences and BostonGene and obtains patent royalties from Immunomedics and Gilead, honoraria from Urotoday and Axiom Healthcare Strategies and research support from Eli Lilly. B.M.F. reports support from the NIH, DoD-CDMRP, Starr Cancer Consortium and the P-1000 Consortium. D.A.L. is on the Scientific Advisory Board of Mission Bio, Pangea, Alethiomics and Veracyte, and has received prior research funding support from Illumina, Ultima Genomics, Celgene, 10x Genomics and Oxford Nanopore Technologies. The remaining authors declare no competing interests.

Figures

Extended Data Figure 1: Ultima and Illumina sequencing datasets of human-mapped reads in mouse PDX datasets (n=3).
A Homopolymer size estimation of bases between two PCR duplicates (all samples combined) in Ultima datasets. B Homopolymer size estimation of bases between a read and the aligned reference (all samples combined) in Ultima datasets. C Homopolymer size estimation of bases between two PCR duplicates (all samples combined) in Illumina datasets. D Homopolymer size estimation of bases between a read and the aligned reference (all samples combined) in Illumina datasets. E Indel calling accuracy by PCR duplicate family sizes in Ultima datasets. F Indel calling accuracy of Illumina sequencing reads (for single family reads). G Frequency of homopolymer sizes across the human genome. For boxplots in (E) and (F), the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR. Accuracy in (E) and (F) is defined as the number of correct homopolymer assignments in individual sequencing reads divided by the occurrences of that homopolymer size in the human genome in all sequenced reads.
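The accuracy definition used for panels (E) and (F) can be illustrated with a short sketch. It assumes that each observation is a pair of the reference homopolymer length and the length called in a single read, and that the denominator counts every read that covers a homopolymer of that size; the function name and input layout are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def homopolymer_accuracy(observations):
    """Per-size accuracy from (reference_length, called_length) pairs.

    Each pair is assumed to describe one homopolymer in one read.
    """
    totals = Counter()   # reads covering a homopolymer of each size
    correct = Counter()  # reads whose called length matches the reference
    for reference_length, called_length in observations:
        totals[reference_length] += 1
        if called_length == reference_length:
            correct[reference_length] += 1
    return {size: correct[size] / totals[size] for size in sorted(totals)}


# Illustrative input only: two 1-mers, two 3-mers and four 5-mers.
example = [(1, 1), (1, 1), (3, 3), (3, 2), (5, 5), (5, 4), (5, 6), (5, 5)]
accuracy_by_size = homopolymer_accuracy(example)
```

With the illustrative input, every 1-mer is called correctly while half of the 3-mers and 5-mers are, which mirrors the general expectation that longer homopolymers are harder to size.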
Extended Data Figure 2: Flow-based sequencing provides predictable error-robust motifs.
A Single-nucleotide variant analysis of matched Ultima and Illumina sequencing datasets across 96 trinucleotide contexts. Cycle shift motifs (described in B) are indicated by plus signs. B Left: Example sequencing of a TGC trinucleotide in flowspace. Given a flow order of T>G>C>A, one full flow cycle of each nucleotide should provide a 1>1>1>0 signal. Top, right: Example of how a T[G>A]C alt disrupts the cycles in flow space basecalling. Two sequencing cycles are required to fully resolve a TAC sequencing motif. We refer to these types of motifs as cycle shift motifs. Bottom, right: Example of how a T[G>C]C variant does not affect the cycles of flow space basecalling. C Error rates in Ultima and Illumina sequencing datasets for trinucleotide variants that alter the flowspace sequencing cycle. P-values were measured using a two-sided Wilcoxon test. Error bars in (A) represent the standard error of the mean. For boxplots in (C), the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR.
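The flowspace example in panel (B) can be reproduced with a minimal sketch. It assumes that the trinucleotide is read in isolation, that sequencing starts at the first flow of a cycle, and that a "cycle shift" simply means the alternate allele needs a different number of flow cycles than the reference; the function names are hypothetical and this is a simplification, not the authors' basecalling model.

```python
def to_flowspace(sequence, flow_order="TGCA"):
    """Return one signal per flow: the homopolymer length incorporated in that flow."""
    if not set(sequence) <= set(flow_order):
        raise ValueError("sequence contains a base missing from the flow order")
    signals = []
    position = 0
    flow_index = 0
    while position < len(sequence):
        base = flow_order[flow_index % len(flow_order)]
        run = 0
        # Incorporate every consecutive copy of the base offered by this flow.
        while position < len(sequence) and sequence[position] == base:
            run += 1
            position += 1
        signals.append(run)
        flow_index += 1
    # Pad with zeros so the result covers whole cycles of the flow order.
    while len(signals) % len(flow_order):
        signals.append(0)
    return signals


def cycles_needed(sequence, flow_order="TGCA"):
    """Number of complete flow cycles required to read the sequence."""
    return len(to_flowspace(sequence, flow_order)) // len(flow_order)


def is_cycle_shift(reference, alternate, flow_order="TGCA"):
    """True when the alternate allele changes the number of cycles needed."""
    return cycles_needed(reference, flow_order) != cycles_needed(alternate, flow_order)
```

Under these assumptions, `to_flowspace("TGC")` gives the 1>1>1>0 signal described in the legend, `TAC` needs two cycles, and `TCC` still fits in one cycle because the two C bases are incorporated in the same flow.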
Extended Data Figure 3: Tradeoffs between deep-targeted sequencing and modest whole-genome sequencing for ctDNA detection.
A Mutational burden (number of SNVs) of 22 cancer types retrieved from the Pan Cancer Analysis of Whole Genomes consortium. The numbers along the x-axis represent the number of tumors analyzed per cancer type. B Median ctDNA detection opportunities using a whole-genome approach with 10x sequencing coverage, a 10-target panel at 10,000x coverage and a 1-target panel at 10,000x coverage. The pink shaded area represents tumor types for which targeting only a few sites may offer benefit over whole-genome sequencing. The blue shaded area represents tumor types for which a whole-genome approach will offer more opportunities to detect ctDNA over targeted panels. The lower and upper ends of the boxplots in (A) represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR.
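The tradeoff in panel (B) can be made concrete with a back-of-the-envelope sketch. The legend does not define "detection opportunities", so the code assumes the simplest reading: the number of mutated sites interrogated multiplied by the sequencing coverage at each site. Function names and the comparison rule are hypothetical.

```python
def detection_opportunities(n_sites, coverage):
    """Assumed definition: mutated sites interrogated times reads per site."""
    return n_sites * coverage


def wgs_beats_panel(mutational_burden, wgs_coverage=10,
                    panel_sites=10, panel_coverage=10_000):
    """Compare a whole-genome approach with a targeted panel for one tumor.

    The panel cannot target more mutated sites than the tumor carries.
    """
    wgs = detection_opportunities(mutational_burden, wgs_coverage)
    panel = detection_opportunities(min(panel_sites, mutational_burden),
                                    panel_coverage)
    return wgs > panel
```

Under this assumed definition, a 10-target panel at 10,000x offers 100,000 opportunities, so 10x WGS only pulls ahead once a tumor carries more than 10,000 SNVs; against a 1-target panel the crossover drops to 1,000 SNVs.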
Extended Data Figure 4: Circulating tumor DNA cost and coverage analysis between Illumina and Ultima sequencing in a matched sample.
Areas under the curve (AUCs) are measured by calculating the area under a receiver operating characteristic curve comparing a given group (for example, Illumina 20x at 10⁻⁶ expected tumor fraction) to its platform and coverage-matched cancer-free control (for example, Illumina 20x, expected tumor fraction of 0). All AUCs at expected tumor fractions of 10⁻⁴ and greater were 1.00. Z-scores of a given sample are calculated against their coverage and platform matched cancer-free control (expected tumor fraction of 0).
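The two statistics in this legend can be sketched as follows. The AUC is computed through its rank interpretation (the probability that a randomly chosen case replicate scores higher than a randomly chosen control replicate, with ties counted as one half), and the z-score uses the sample standard deviation of the matched controls. Both choices are assumptions made for illustration; the legend does not state which implementation the authors used.

```python
from statistics import mean, stdev


def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


def z_scores(sample_scores, control_scores):
    """Z-score of each sample against the matched cancer-free controls."""
    mu = mean(control_scores)
    sigma = stdev(control_scores)  # sample standard deviation (assumption)
    return [(score - mu) / sigma for score in sample_scores]
```

An AUC of 1.00, as reported for expected tumor fractions of 10⁻⁴ and greater, corresponds to every case replicate scoring above every matched control replicate.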
Extended Data Figure 5: Variant allele frequencies for variants across denoising approaches.
Variant allele frequencies (calculated using unfiltered sequencing reads) in positions where a variant was found using UMI-agnostic denoised reads, single-strand corrected reads and duplex corrected reads. Allele frequencies of 0.2 and below are colored in red.
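The difference between single-strand and duplex correction can be sketched generically. The code assumes that reads have already been grouped into families originating from the top and bottom strands of one DNA molecule, that bottom-strand bases are expressed in the same orientation as the top strand, and that a strand consensus requires unanimous agreement by default. These are illustrative assumptions with hypothetical names, not a description of the authors' pipeline.

```python
from collections import Counter


def strand_consensus(bases, min_agreement=1.0):
    """Consensus base for one strand family, or None if the reads disagree."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None


def duplex_consensus(top_strand_bases, bottom_strand_bases):
    """Call a base only when both strand consensuses exist and agree."""
    top = strand_consensus(top_strand_bases)
    bottom = strand_consensus(bottom_strand_bases)
    if top is None or bottom is None or top != bottom:
        return None
    return top
```

An error introduced on only one strand, for example by DNA damage or an early PCR cycle, passes `strand_consensus` for that strand but is rejected by `duplex_consensus`, which is the intuition behind the lower error rate of duplex correction.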
Extended Data Figure 6: Comparison of detected UV-derived mutations using duplex, single-strand and UMI-agnostic denoising methods.
A Cosine similarities by cancer stage at baseline timepoints (pre-treatment or pre-surgery) for UV and CH-associated signatures. B Comparison of duplex, single-strand and UMI-agnostic denoising methods to detect melanoma-associated variants using a single-read variant calling pipeline for pre-treatment plasma samples from melanoma patients (top) and cancer-free controls (bottom). P-values were measured using a two-sided Wilcoxon test. For all boxplots, the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR.
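Cosine similarity, used in panel (A) to compare plasma trinucleotide profiles with reference signatures, is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain implementation for two equal-length frequency vectors (for example, 96 trinucleotide contexts); the function name is illustrative.

```python
from math import sqrt


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two equal-length frequency vectors."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of contexts")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = sqrt(sum(a * a for a in profile_a))
    norm_b = sqrt(sum(b * b for b in profile_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / (norm_a * norm_b)
```

Because the measure depends only on the shape of the profile, a sample with many mutations and one with few score 1 against each other as long as the relative context frequencies match.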
Extended Data Figure 7: Tumor-agnostic copy-number based tumor fraction estimation in stage III and IV melanoma and cancer-free control samples.
Samples include cancer-free controls (n = 10); stage III melanoma (pre-surgery; n = 10) and stage IV melanoma (pre-treatment; n = 4). Dotted line at 0.03 represents the limit of detection of ichorCNA. For boxplots, the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median.
Extended Data Figure 8: ctDNA dynamics throughout treatment in melanoma patients.
A Changes in circulating tumor DNA (increase or decrease) relative to the earliest sampled timepoint. Solid lines represent patients with recurrence or progressive disease, and dashed lines represent patients with either partial response or who are recurrence-free following treatment. Closed and open circles represent samples with and without detected ctDNA, respectively. B Difference in ctDNA relative to the pre-treatment timepoint stratified by clinical outcome. One sample did not have a pre-treatment timepoint available (MEL-15; progressive disease) and so a day 9 post-treatment time point was used as baseline. For boxplots in (B), the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR. P-values were calculated using a two-sided Wilcoxon test.
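The boxplot convention repeated across these legends (box from the 25th to the 75th percentile, line at the median, whiskers reaching at most 1.5 times the IQR) can be written out as a small sketch. The percentile interpolation rule is an assumption, since plotting libraries differ on it, and the function names are illustrative.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 1]) of pre-sorted data."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def boxplot_summary(values):
    """Box, median, whisker ends and outliers for a non-empty list of numbers."""
    data = sorted(values)
    q1, median, q3 = (percentile(data, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this illustrative input the value 100 lies beyond the upper fence, so the upper whisker ends at 9 and 100 would be drawn as an individual outlier point.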
Extended Data Figure 9: Major signature contributions from urothelial cancer patients’ tumors measured through whole-genome sequencing.
Top: total mutation counts per sequenced tumor. Bottom: signature contributions. Trinucleotide frequencies were fit to the entire COSMIC database (version v.3.3). When a patient had two or more tumors (B01, B04, B15, B16, B17, B18, B19), we identified clonal mutations by their presence in at least two tumors.
Figure 1. Ultralow ctDNA detection requires deep sequencing coverage and low error rates.
A Simulation of ctDNA detection given different error rates (columns), whole genome coverages (rows) and tumor fractions (x-axis), with n=1,000 replicates per set of conditions. B Cell-free DNA library preparation pre-analytical workflow. C Sequencing depth of matched Illumina and Ultima datasets. D Normalized read coverage for Illumina (top) and Ultima sequenced (bottom) matched cell-free DNA sample. E Left: Copy number-based tumor fraction estimation measured with Illumina or Ultima sequencing in matched samples using ichorCNA. Matched cancer-free controls were used to create a panel of normals prior to tumor fraction estimation. Right: Single-nucleotide variant-based tumor fraction estimation measured with Illumina or Ultima sequencing. Somatic SNVs were identified through matched tumor-normal sequencing. Two samples without tumor sequencing and with low ctDNA fraction (<5% measured through CNV analysis) were omitted from this analysis. Spearman’s ρ coefficient and corresponding two-sided p-value were calculated using the stats package (v.3.6) function in R (v.3.6). F Circulating tumor DNA cost and coverage analysis between Illumina and Ultima sequencing in a matched sample. Areas under the curve (AUCs) are measured by calculating the area under a receiver operating characteristic curve comparing a given group (for example, Illumina 20x at 10⁻⁶ expected tumor fraction) to its platform and coverage-matched cancer-free control (for example, Illumina 20x, expected tumor fraction of 0), with n=20 replicates per set of conditions. All AUCs at expected tumor fractions of 10⁻⁴ and greater were 1.00. Z-scores of a given sample are calculated against their coverage and platform matched cancer-free control (expected tumor fraction of 0). For all boxplots, the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR.
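Why both coverage and error rate matter, as the Figure 1 title states, can be illustrated with a simple signal-to-noise calculation. The sketch assumes that error reads at tumor-informed sites follow a Poisson distribution, that every observation of a mutated site is independent, and that ploidy and clonality can be ignored; it is a rough approximation with hypothetical names, not the simulation shown in panel (A).

```python
from math import sqrt


def expected_z(n_sites, coverage, tumor_fraction, error_rate):
    """Approximate z-score of the tumor signal over sequencing noise.

    signal = expected tumor-derived variant reads
    noise  = expected error reads, whose Poisson standard deviation is sqrt(noise)
    """
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    observations = n_sites * coverage
    signal = observations * tumor_fraction
    noise = observations * error_rate
    return signal / sqrt(noise)
```

Under these assumptions the expected z-score grows with the square root of the number of site observations and with the inverse square root of the error rate: 10,000 sites at 100x with a tumor fraction of 10⁻⁵ give z of about 1 at an error rate of 10⁻⁴ and about 10 at an error rate of 10⁻⁶.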
Figure 2. Duplex correction allows ctDNA identification without tumor sequencing.
A Error rates in WGS on mouse PDX samples (n=3). B Variant allele frequencies (AFs; calculated using unfiltered sequencing reads) in positions where a variant was found using uncorrected reads (left; inset highlights higher AFs by enforcing a y-axis cutoff of 0.002) and in duplex corrected reads (middle). Removing germline reads reveals somatic mutations with a modal AF of 0.21 (right). C Comparison between the modal AF of a patient with progressive disease (samples MEL-12.A-E) in duplex corrected positions (AFs between 5% and 30% only) and copy-number based tumor fraction estimations. D,E Trinucleotide frequencies of a cancer-free plasma sample (CTRL-07, D) and a stage IVB cancer plasma sample (MEL-12.D, tumor fraction of 23%, E) in UMI-agnostic corrected WGS (top row), single-stranded correction (middle row) and duplex correction (bottom row). Cosine similarity with SBS7a/SBS7b (UV damage; Cosmic v3.3) and clonal hematopoiesis (CH) is compared across conditions. F Plasma mutational scores due to UV damage in patients with melanoma [stage IV (n=4) and stage III (n=8)] and controls (n=10) at baseline timepoint (prior to treatment or surgery). Plasma signatures were fit to a custom reference of signatures comprising SBS7a, SBS7b (Cosmic v3.3) and CH. G In silico mixing study of metastatic melanoma sample MEL-12.B with control CTRL-06 (40 replicates per tumor fraction, 6.5x coverage per replicate). Tumor scores were estimated by fitting the sample’s trinucleotide frequencies to those of signatures SBS7a, SBS7b (Cosmic v3.3) and CH. Areas under the receiver operating characteristic curve (AUCs) were measured by comparing replicates of a given tumor fraction to TF=0 replicates. For boxplots in (A) and (E), the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR. P-values were calculated using a two-sided Wilcoxon test.
Scans from Figure 2C are adapted from Widman et al., 2024.
Figure 3. Mutational signature analysis of cell-free DNA from urothelial cancer patients.
A Plasma mutational scores from APOBEC mutagenesis (SBS2, SBS13) in patients with urothelial cancer (stage II-IV, n=20) and cancer-free controls (n=10). Plasma signatures were fit to a custom reference of signatures comprising SBS2, SBS13, SBS31 and SBS35 (Cosmic v3.3) and clonal hematopoiesis. Tumor fractions were measured in a tumor-informed manner (see Methods). B Trinucleotide frequency of circulating cell-free DNA (“plasma”, left) and of the tumor (right) for patient BLA-12. For the patient tumor, the three highest SBS contributions are highlighted (SBS31, SBS13, SBS2). Contributions were fit to the entire Cosmic v3.3 catalog. C Tumor-informed measurement of APOBEC-derived mutations (SBS2 and SBS13, x-axis) versus tumor-agnostic APOBEC-derived mutations (y-axis) in N=13 samples for which tumor tissue sequencing was available. The tumor-agnostic APOBEC mutations were measured as in (A). Plasma signatures were fit to a custom reference of signatures comprising SBS2, SBS13, SBS31 and SBS35 (Cosmic v3.3) and clonal hematopoiesis. Tumor signatures were fit to the entire Cosmic v3.3 catalog. The shaded area represents the 95% confidence interval of the data distribution. D Plasma mutational scores from platinum therapy mutagenesis (SBS31+SBS35) in patients with urothelial cancer (stage II-IV; n=11 post, n=9 pre) and cancer-free controls (n=10). Plasma signatures were fit to a custom reference of signatures comprising SBS2, SBS13, SBS31 and SBS35 (Cosmic v3.3) and clonal hematopoiesis. E Barplot of tumor-informed tumor fractions (black dots) and platinum scores (purple; as calculated in D) for samples with available tumors for sequencing. For boxplots in (A) and (D), the lower and upper ends of boxes represent the 25th and 75th percentiles of the data, respectively, and the horizontal lines represent the median. The whiskers represent at most 1.5 times the IQR. P-values were calculated using a two-sided Wilcoxon test.
