Nat Commun. 2022 Sep 29;13(1):5566.
doi: 10.1038/s41467-022-32995-6.

Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer

Mary L Stackpole et al. Nat Commun. 2022.

Erratum in

  • Author Correction: Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer.
    Stackpole ML, Zeng W, Li S, Liu CC, Zhou Y, He S, Yeh A, Wang Z, Sun F, Li Q, Yuan Z, Yildirim A, Chen PJ, Winograd P, Tran B, Lee YT, Li PS, Noor Z, Yokomizo M, Ahuja P, Zhu Y, Tseng HR, Tomlinson JS, Garon E, French S, Magyar CE, Dry S, Lajonchere C, Geschwind D, Choi G, Saab S, Alber F, Wong WH, Dubinett SM, Aberle DR, Agopian V, Han SB, Ni X, Li W, Zhou XJ. Nat Commun. 2024 May 1;15(1):3693. doi: 10.1038/s41467-024-48018-5. PMID: 38693151.

Abstract

Early cancer detection by cell-free DNA faces multiple challenges: low fraction of tumor cell-free DNA, molecular heterogeneity of cancer, and sample sizes that are not sufficient to reflect diverse patient populations. Here, we develop a cancer detection approach to address these challenges. It consists of an assay, cfMethyl-Seq, for cost-effective sequencing of the cell-free DNA methylome (with > 12-fold enrichment over whole genome bisulfite sequencing in CpG islands), and a computational method to extract methylation information and diagnose patients. Applying our approach to 408 colon, liver, lung, and stomach cancer patients and controls, at 97.9% specificity we achieve 80.7% and 74.5% sensitivity in detecting all-stage and early-stage cancer, and 89.1% and 85.0% accuracy for locating tissue-of-origin of all-stage and early-stage cancer, respectively. Our approach cost-effectively retains methylome profiles of cancer abnormalities, allowing us to learn new features and expand to other cancer types as training cohorts grow.

Conflict of interest statement

X.J.Z., W.L., and W.H.W. are co-founders of EarlyDiagnostics, Inc. X.J.Z. and W.H.W. are board members of EarlyDiagnostics, Inc. X.J.Z. has an executive leadership position at EarlyDiagnostics, Inc. M.L.S., X.N., and C.-C.L. are employees of EarlyDiagnostics, Inc., and S.M.D. was a scientific advisor to EarlyDiagnostics, Inc. X.J.Z., W.L., W.H.W., and F.A. are stockholders of EarlyDiagnostics, Inc. M.L.S., W.Z., S.L., C.-C.L., Y.Zhou, Q.L., and X.N. have stock options with EarlyDiagnostics, Inc. W.L., W.Z., and Y.Zhou are consultants for EarlyDiagnostics, Inc. X.J.Z., M.L.S., Y.Zhou, X.N., and W.Z. are inventors on a patent application submitted by the Regents of the University of California and licensed to EarlyDiagnostics, Inc. (Patent No. US20210404007A1). P.S.L. performed summer internships at EarlyDiagnostics, Inc. in 2021 and 2022. The other authors have no competing interests to declare.

Figures

Fig. 1. cfMethyl-Seq assay.
a Diagram of the cfMethyl-Seq protocol. b Typical TBE-urea PAGE image of cfMethyl-Seq libraries made from cfDNA, compared with conventional RRBS with cfDNA or intact genomic DNA as input material. The non-specific ligation product from cfDNA fragments with the conventional RRBS protocol is indicated by an arrow. This technical validation experiment was repeated independently twice and showed similar results. For cfMethyl-Seq assays generating data for analysis, one library was constructed per sample, without replicates. c The percentage of reads with MspI sites on both ends, on only one end, and on neither end for the WGBS, cfMethyl-Seq, and RRBS assays on cfDNA. Source data are provided as a Source Data file. d The percentage of mapped fragments that fall in CpG islands, CpG island shores, CpG island shelves, and open sea regions is shown for cfMethyl-Seq libraries, RRBS libraries, and WGBS libraries on cfDNA. Source data are provided as a Source Data file. e Methylation concordance between a genomic DNA sample sequenced with RRBS and the same sample sheared and sequenced with cfMethyl-Seq increases with depth of coverage. Pearson correlation (y-axis) of the methylation rate (beta value) in the two datasets was calculated on the CpG sites that are covered by both datasets at the minimum depth of coverage specified on the x-axis. Source data are provided as a Source Data file. Abbreviations: RRBS, reduced representation bisulfite sequencing; WGBS, whole genome bisulfite sequencing.
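The concordance measure in panel e of Fig. 1 can be illustrated with a minimal sketch. It assumes each dataset is a mapping from a CpG site label to a pair of (methylated read count, total read count); that data layout, the site labels, and the function names are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def concordance_at_depth(dataset_a, dataset_b, min_depth):
    """Correlate beta values on CpG sites covered in BOTH datasets at >= min_depth.

    dataset_a, dataset_b: dict mapping a site label to (methylated, total) counts.
    Returns (correlation or None, number of shared sites used).
    """
    if min_depth < 1:
        raise ValueError("min_depth must be at least 1")
    shared = [
        site
        for site in dataset_a.keys() & dataset_b.keys()
        if dataset_a[site][1] >= min_depth and dataset_b[site][1] >= min_depth
    ]
    # Beta value = methylated reads / total reads at the site.
    betas_a = [dataset_a[s][0] / dataset_a[s][1] for s in shared]
    betas_b = [dataset_b[s][0] / dataset_b[s][1] for s in shared]
    return pearson(betas_a, betas_b), len(shared)


# Toy data (made up): the shallow site chr1:400 disagrees between datasets.
rrbs = {"chr1:100": (8, 10), "chr1:200": (1, 10), "chr1:300": (5, 10), "chr1:400": (2, 3)}
cfmethyl = {"chr1:100": (16, 20), "chr1:200": (2, 20), "chr1:300": (10, 20), "chr1:400": (0, 2)}

for depth in (1, 5):
    r, n_sites = concordance_at_depth(rrbs, cfmethyl, depth)
    print(depth, n_sites, r)
```

In the toy data, raising the depth threshold drops the noisy low-coverage site, so the correlation rises while the number of sites used falls, which is the trade-off the panel plots.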
Fig. 2. Study design and overview of the computational method.
a Overview of the sample usage for marker discovery, model training, and validation. All tissue samples are used for marker discovery, and all plasma samples are randomly split into three sets, used for marker discovery, for training the predictive model, and for validating it. The plasma sample split is repeated 10 times and the prediction performance is averaged over the 10 runs. b Details of sample usage for marker discovery. Different types of methylation markers were discovered by using different samples. Note that the 30 reference noncancer plasma samples (in blue boxes) correspond to “marker filtration” in a. Abbreviations: TOO, tissue of origin.
Fig. 3. Performance of the stacked ensemble model for cancer detection.
a ROC curve of the stacked ensemble method for detecting all four cancer types. Source data are provided as a Source Data file. b Sensitivity breakdown by cancer stage and cancer type. Sensitivity is shown at 1 false positive (97.9% specificity). The average number of test cancer patients in each cancer type and stage over 10 runs is indicated in the horizontal-axis labels. Sensitivity is not computed if the average number of cancer patients in a cancer stage/type over 10 runs is <4. The points and error bars represent the average sensitivity over 10 runs and 95% confidence intervals. Source data are provided as a Source Data file. c Performance (AUROC) of using all marker types and each individual marker type (n = 102 samples). The points and error bars represent the average AUROC over 10 runs and 95% confidence intervals. Source data are provided as a Source Data file.
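Reporting sensitivity at a fixed number of false positives, as in panel b of Fig. 3, can be sketched as follows. This is an illustration only: the function name, the score convention (higher means more cancer-like), and the choice of 47 controls are assumptions. The excerpt does not state how many controls are in each test set; 47 is used because 46/47 ≈ 97.9%.

```python
def sensitivity_at_false_positives(control_scores, cancer_scores, max_false_positives=1):
    """Pick the lowest threshold that yields at most max_false_positives among controls.

    A sample is called positive when its score is strictly greater than the threshold.
    Returns (threshold, sensitivity, specificity).
    """
    ranked = sorted(control_scores, reverse=True)
    if max_false_positives >= len(ranked):
        threshold = float("-inf")  # every control may be called positive
    else:
        # The (k+1)-th highest control score: at most k controls exceed it.
        threshold = ranked[max_false_positives]
    false_positives = sum(score > threshold for score in control_scores)
    true_positives = sum(score > threshold for score in cancer_scores)
    sensitivity = true_positives / len(cancer_scores)
    specificity = 1 - false_positives / len(control_scores)
    return threshold, sensitivity, specificity


# Made-up scores: 47 hypothetical controls and 5 hypothetical cancer samples.
controls = [i / 100 for i in range(47)]
cancers = [0.2, 0.5, 0.6, 0.455, 0.9]

threshold, sens, spec = sensitivity_at_false_positives(controls, cancers, 1)
print(f"threshold={threshold:.2f} sensitivity={sens:.1%} specificity={spec:.1%}")
```

Using a strict inequality means that tied control scores at the threshold are never counted as positives, so the false-positive budget is not exceeded.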
Fig. 4. Performance of the stacked ensemble model for cancer tissue-of-origin prediction.
a Confusion matrix for all-stage cancer samples. Source data are provided as a Source Data file. b Confusion matrix for early-stage (i.e., stage I/II) cancer samples. Source data are provided as a Source Data file. c The accuracy of using all marker types and each individual marker type (n = 35 samples in the test set of each run). The points and error bars represent the average accuracy over 10 runs and 95% confidence intervals. Source data are provided as a Source Data file.
Fig. 5. Impact of the number of markers and the training sample size on the cancer detection performance.
a Performance of using the union of the top M cancer-specific markers of the four cancer types. Source data are provided as a Source Data file. b Performance of using the union of the top M tissue-specific markers of each tissue pair. Source data are provided as a Source Data file. c Performance of the ensemble model for cancer detection increases with training sample size (using 30% to 100% of the training samples). Source data are provided as a Source Data file. In all panels, the points and error bars represent the average AUROC over 10 runs and 95% confidence intervals (n = 102 test samples per run).
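The repeated random split described for Fig. 2 and the "average over 10 runs with 95% confidence intervals" summary used in Figs. 3 to 5 can be sketched as below. Several pieces are assumptions rather than details from the paper: the split fractions are placeholders, the `evaluate` callback is hypothetical, and the interval uses a Student t approximation because the excerpt does not say how the intervals were computed.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Placeholder fractions for (marker discovery, training); the rest is validation.
# The excerpt does not state the real proportions.
FRACTIONS = (0.3, 0.35)


def split_three_ways(sample_ids, fractions=FRACTIONS, seed=0):
    """Shuffle the sample IDs and cut them into three disjoint sets."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_first = round(fractions[0] * len(ids))
    n_second = round(fractions[1] * len(ids))
    return ids[:n_first], ids[n_first:n_first + n_second], ids[n_first + n_second:]


def mean_with_ci(values, t_crit=2.262):
    """Mean and t-based 95% interval.

    The default t_crit of 2.262 is the two-sided 95% Student t value for
    9 degrees of freedom, i.e. for 10 runs; change it for other run counts.
    """
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


def repeated_evaluation(sample_ids, evaluate, n_runs=10):
    """Repeat the split n_runs times and summarize the metric across runs.

    evaluate(discovery, training, validation) is a hypothetical callback that
    returns one number per run, such as an AUROC.
    """
    scores = []
    for run in range(n_runs):
        discovery, training, validation = split_three_ways(sample_ids, seed=run)
        scores.append(evaluate(discovery, training, validation))
    return mean_with_ci(scores)


# Stand-in metric for demonstration only: the validation share of all samples.
summary = repeated_evaluation(range(100), lambda d, t, v: len(v) / 100)
print(summary)
```

Seeding each run with its run index keeps the splits reproducible while still differing from run to run.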
