Nat Commun. 2022 Dec 3;13(1):7475.
doi: 10.1038/s41467-022-35076-w.

A framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA

Anna-Lisa Doebley et al. Nat Commun. 2022.

Erratum in

Abstract

Cell-free DNA (cfDNA) has the potential to inform tumor subtype classification and help guide clinical precision oncology. Here we develop Griffin, a framework for profiling nucleosome protection and accessibility from cfDNA to study the phenotype of tumors using whole genome sequencing data with coverage as low as 0.1×. Griffin employs a GC correction procedure tailored to variable cfDNA fragment sizes, which generates a better representation of chromatin accessibility and improves the accuracy of cancer detection and tumor subtype classification. We demonstrate estrogen receptor subtyping from cfDNA in metastatic breast cancer. We predict estrogen receptor subtype in 139 patients with at least 5% detectable circulating tumor DNA with an area under the receiver operating characteristic curve (AUC) of 0.89 and validate performance in independent cohorts (AUC = 0.96). In summary, Griffin is a framework for accurate tumor subtyping and may be generalizable to other cancer types for precision oncology applications.

Conflict of interest statement

G.H., A.L.D., J.B.H., R.D.P., D.M., P.S.N., and N.D.S. are inventors on a patent application (PCT/US2022/024082) entitled CELL-FREE DNA SEQUENCE DATA ANALYSIS METHOD TO EXAMINE NUCLEOSOME PROTECTION AND CHROMATIN ACCESSIBILITY submitted by Fred Hutchinson Cancer Research Center relating to the methodologies developed and applied in this manuscript. P.P. is now an employee of C2i Genomics.

Figures

Fig. 1. Griffin framework for cfDNA nucleosome profiling to predict cancer subtypes and tumor phenotype.
a Illustration of a group of accessible sites (left panel) and inaccessible sites (right panel), such as TFBSs. The nucleosomes (in grey) are positioned in an organized manner around the accessible sites (red box; left panel), but not around the inaccessible ones (right panel). These nucleosomes protect the DNA from degradation when it is released into peripheral blood. The protected fragments from the plasma are sequenced and aligned, leading to a coverage profile which reflects the nucleosome protection in the cells of origin. b Griffin workflow for cfDNA nucleosome profiling analysis. cfDNA whole genome sequencing (WGS) data with ≥0.1× coverage is aligned to the hg38 genome build. (1) For each sample, fragment-based GC bias is computed for each fragment size. (2) Sites of interest are selected from any assay. Paired-end reads aligned to each site are collected, fragment midpoint coverage is counted and corrected for GC bias to produce a coverage profile. (3) Coverage profiles from all sites in a group (e.g., open chromatin for tumor subtype) are averaged to produce a composite coverage profile. Composite profiles are normalized using the surrounding region (−5 kb to +5 kb). (4) Three features are extracted from the composite coverage profile: central coverage (coverage from −30 bp to +30 bp from the site; orange ‘a’), mean coverage (from −1 kb to +1 kb; green ‘b’), and amplitude calculated using a Fast-Fourier Transform (FFT) (red ‘c’).
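Step (4) of the workflow above can be illustrated with a minimal Python sketch. It is not Griffin's implementation: the function name, the 15 bp bin spacing used in the example, and the choice of FFT component (`fft_index`) are assumptions made for illustration, since the caption does not state which frequency term defines the amplitude.

```python
import numpy as np

def extract_features(positions, coverage, fft_index=10):
    """Compute three features from a composite coverage profile.

    positions: bin positions in bp relative to the site (hypothetical layout).
    coverage:  composite coverage at those positions.
    fft_index: which FFT component to report as "amplitude" (an assumption).
    """
    positions = np.asarray(positions, dtype=float)
    coverage = np.asarray(coverage, dtype=float)

    # Normalize by the mean coverage of the surrounding region (-5 kb to +5 kb).
    surrounding = (positions >= -5000) & (positions <= 5000)
    normalized = coverage / coverage[surrounding].mean()

    # Central coverage: mean from -30 bp to +30 bp.
    central = normalized[(positions >= -30) & (positions <= 30)].mean()

    # Mean coverage: mean from -1 kb to +1 kb.
    window = normalized[(positions >= -1000) & (positions <= 1000)]
    mean_cov = window.mean()

    # Amplitude: magnitude of one component of the real FFT of the window.
    amplitude = np.abs(np.fft.rfft(window))[fft_index]

    return {"central_coverage": central,
            "mean_coverage": mean_cov,
            "amplitude": amplitude}

# Example: a synthetic profile with a coverage dip at the site.
pos = np.arange(-5000, 5001, 15)
profile = 1.0 - 0.5 * np.exp(-(pos / 100.0) ** 2)
features = extract_features(pos, profile)
```

In this synthetic profile the dip at the site makes the central coverage lower than the mean coverage, which is the pattern the caption associates with accessible sites.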
Fig. 2. Griffin GC bias correction improves detection of tissue specific accessibility from cfDNA.
a Mean ± IQR of GC content around 10,000 GRHL2 sites. b GC bias of various fragment sizes for cfDNA from a healthy donor (HD_46; green) and a metastatic breast cancer (MBC_315; orange) sample. GRHL2 center and flanking GC content are noted with dashed lines (same as [a]). The MBC sample (orange dots) has a larger difference between center (2.11) and flanking (1.99) for 165 bp fragments than the healthy sample (1.90 center, 1.96 flanking; green dots). This means that, for GRHL2, GC bias will cause increased central coverage relative to the flanking coverage and this effect will be more pronounced in the MBC sample. c Composite coverage profile of 10,000 GRHL2 sites before and after GC correction, shown for HD_46 and MBC_315. Before GC correction, the center has increased coverage due to GC bias. After GC correction, the MBC sample has lower central coverage, which is consistent with increased GRHL2 activity in tumor cells. d Composite coverage profiles of 10,000 LYL1 sites before and after GC correction, shown for two MBC samples with deep WGS (9–25×, orange), two healthy samples (17–20×, green), and 191 MBC samples with ULP-WGS (0.1–0.3×, median ± IQR, blue). Lower central coverage in the healthy samples is consistent with LYL1 activity in hematopoiesis. e cfDNA tumor fraction and central coverage correlation for LYL1. GC correction increases the strength of the Pearson correlation (n = 191 MBC ULP-WGS samples; two-sided with Benjamini-Hochberg FDR correction). Root mean squared error (RMSE) of the linear fit is shown. f Distribution of the RMSE (linear fit between central coverage and tumor fraction; n = 191 MBC ULP-WGS samples) across 377 TFs, before and after GC correction. Boxed range: median ± IQR, whiskers: non-outlier data (maximum extent is 1.5× IQR), grey dots: outliers. p-value from the Wilcoxon signed-rank test (two-sided). g Distribution of the mean absolute deviation (of the central coverage across 215 healthy donors [1–2× WGS]) for 377 TFs, before and after GC correction. Box elements are the same as f. p-value from the Wilcoxon signed-rank test (two-sided). Source data are provided as a Source Data file.
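One plausible way to apply a fragment-size-specific GC bias to midpoint coverage is to weight each fragment by the inverse of its estimated bias. The Python sketch below illustrates that idea only; the captions do not spell out Griffin's exact procedure, and the function name, the `gc_bias` lookup keyed by (fragment length, GC base count), and the input layout are all hypothetical.

```python
import numpy as np

def gc_corrected_midpoint_coverage(fragments, gc_bias, site, half_width=1000):
    """Count GC-corrected fragment midpoints around one site.

    fragments:  iterable of (start, end, gc_count) tuples (hypothetical layout).
    gc_bias:    dict mapping (fragment_length, gc_count) -> bias, where a
                bias above 1 means that fragment type is over-represented.
    site:       genomic coordinate of the site of interest.
    half_width: number of bp to keep on each side of the site.
    """
    coverage = np.zeros(2 * half_width + 1)
    for start, end, gc_count in fragments:
        length = end - start
        midpoint = (start + end) // 2
        offset = midpoint - site + half_width
        if not 0 <= offset < coverage.size:
            continue  # midpoint falls outside the window
        bias = gc_bias.get((length, gc_count))
        if bias is None or bias <= 0:
            continue  # no usable bias estimate for this fragment type
        # Down-weight over-represented fragments, up-weight under-represented ones.
        coverage[offset] += 1.0 / bias
    return coverage

# Example: two 165 bp fragments with a bias of 2.0 together count as one.
example = gc_corrected_midpoint_coverage(
    [(100, 265, 80), (100, 265, 80)], {(165, 80): 2.0}, site=182, half_width=10
)
```

Keying the bias on fragment length as well as GC content reflects the caption's point that the bias differs between fragment sizes and between samples.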
Fig. 3. Griffin enables accurate cancer detection.
Receiver operator characteristic (ROC) curves for logistic regression classification of cancer vs. healthy controls in three cohorts. Logistic regression was performed on the top PCA components which explained 80% of the variance in the features (central coverage, mean coverage, and amplitude) extracted from nucleosome profiles around 30,000 TFBSs for each of 270 TFs. ROC and area under the ROC curve (AUC) performance is shown for each disease stage. The number of cancer samples (Ca) is indicated for each stage. Each ROC curve also includes all healthy controls (H) from that cohort. 95% confidence intervals (CI) were obtained from 1000 bootstrap iterations. a Performance for DELFI cohort consisting of plasma samples for 208 early-stage cancers and 215 healthy controls. b Comparison of the performance in the DELFI cohort before and after GC correction using Griffin. Samples are the same as in a. Boxplots indicate median, interquartile range (IQR), whiskers for 1.5× IQR, and outliers. c Performance of the LUCAS cohort consisting of plasma from 129 lung cancer patients and 158 healthy patients. d Performance of the LUCAS validation cohort consisting of plasma for 46 lung cancers and 385 healthy controls. For each dataset, performance is shown for both the original low pass (1–2×) WGS and ultra-low pass (0.1×) WGS generated by in-silico downsampling. Source data are provided as a Source Data file.
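The dimensionality-reduction step described above (keeping the top PCA components that explain 80% of the variance) can be sketched in Python with NumPy alone. This is an illustrative sketch, not the authors' pipeline: the function name is hypothetical, features are only mean-centered (whether they were also scaled is not stated in the caption), and the logistic regression that would be fit on the returned scores is omitted.

```python
import numpy as np

def pca_components_for_variance(features, target=0.80):
    """Project samples onto the fewest principal components whose
    cumulative explained variance reaches `target`.

    features: 2-D array, rows are samples and columns are features.
    Returns (scores, explained_variance_ratio_of_kept_components).
    """
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature

    # SVD of the centered matrix; squared singular values are
    # proportional to the variance captured by each component.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)

    # Smallest k whose cumulative explained variance is >= target.
    k = int(np.searchsorted(np.cumsum(explained), target)) + 1
    k = min(k, s.size)

    scores = Xc @ Vt[:k].T
    return scores, explained[:k]

# Example: one dominant feature, so a single component passes 80%.
X = np.array([[-10.0, 0.0, 0.0],
              [10.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, -1.0, 0.0]])
scores, ratios = pca_components_for_variance(X)
```

Raising `target` keeps more components; in the example above a target of 0.999 would retain two.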
Fig. 4. Griffin enables accurate prediction of breast cancer estrogen receptor subtypes from ultra-low pass WGS.
a ER+ and ER- open chromatin sites from assay for transposase-accessible chromatin using sequencing (ATAC-seq) in ER+ (n = 44) and ER- (n = 15) breast tumors from The Cancer Genome Atlas (TCGA). Differential sites were identified using DESeq2 which employs a Wald test with Benjamini-Hochberg FDR correction. Sites with an adjusted p-value <5 × 10⁻⁴ and a log2 fold-change >0.5 or < −0.5 (dashed lines) were considered differential and are shown in blue (ER+) or orange (ER-). b Composite coverage profiles (median ± IQR) for ER+ (n = 18,240) and ER- (n = 19,347) sites in MBC patients (≥0.1 tumor fraction; ER+, n = 50; ER-, n = 51). Differential sites shared with hematopoietic cells have been excluded and are shown in Supplementary Fig. 12a. c Tumor and cfDNA characteristics for 101 MBC patients with ≥0.10 tumor fraction plotted with CoMut. Statuses are from immunohistochemistry on tumor tissue. Top row: Binary ER status used for training and testing the model. Second row: primary (upper left triangle) and metastatic (lower right triangle) ER status. Third row: tumor fraction from ichorCNA. Fourth row: median probability ER+ predicted across 1000 bootstrap iterations. d Receiver operator characteristic (ROC) curve for predicting ER status. 95% CIs from 1000 bootstrap iterations. e Performance of the trained model on samples from three validation cohorts. f Predictions in patients grouped by primary and metastatic ER status. P-values from Fisher’s exact test (two-sided). g ROC curve for predicting ER loss among patients with a primary ER positive tumor. h Timelines for two patients with multiple biopsies and cfDNA samples. Top: predicted probability of ER+ and tumor fraction for cfDNA samples with ≥0.05 tumor fraction and ≥0.1× coverage. Bottom: timeline in months from metastatic diagnosis. The square indicates primary ER status (timeline from primary to metastatic diagnosis is not to scale). Diamonds indicate each metastatic ER status. Patient MBC_1413 had 3 metastatic biopsies, ER- at zero months (pleural fluid), weak ER+ (5%) at 5.9 months (liver), and ER- at 12.3 months (pleural fluid). Patient MBC_1099 had 3 metastatic biopsies, ER- at 0 months (bone), ER- at 7 months (liver), and ER low (5%) at 22.5 months (liver). Source data are provided as a Source Data file.
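The AUC and bootstrap confidence intervals reported for panel d can be illustrated with a short Python sketch. It assumes a simple percentile bootstrap that resamples scored samples with replacement; the caption does not describe the authors' exact bootstrap scheme, and the function names here are hypothetical.

```python
import numpy as np

def auc(labels, scores):
    """AUC as the probability that a positive sample outscores a
    negative one, counting ties as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def bootstrap_auc_ci(labels, scores, n_boot=1000, seed=0):
    """Point-estimate AUC with a 95% percentile bootstrap interval."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if labels.all() or not labels.any():
        raise ValueError("both classes are required to compute an AUC")
    rng = np.random.default_rng(seed)
    n = labels.size
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample with replacement
        if labels[idx].all() or not labels[idx].any():
            continue  # skip resamples that lost one of the classes
        boot.append(auc(labels[idx], scores[idx]))
    low, high = np.percentile(boot, [2.5, 97.5])
    return auc(labels, scores), low, high

# Example with four scored samples (1 = ER+, 0 = ER-; values are made up).
point, low, high = bootstrap_auc_ci([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example, three of the four positive-negative pairs are ranked correctly, so the point estimate is 0.75; with so few samples the interval is very wide.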
