Sci Transl Med. 2023 Jan 11;15(678):eabm6863.
doi: 10.1126/scitranslmed.abm6863. Epub 2023 Jan 11.

Genome-wide analysis of aberrant position and sequence of plasma DNA fragment ends in patients with cancer

Karan K Budhraja et al. Sci Transl Med. 2023.

Abstract

Genome-wide fragmentation patterns of cell-free DNA (cfDNA) in plasma are strongly influenced by cellular origin because chromatin accessibility varies across cell types. Such differences between healthy and cancer cells provide an opportunity to develop new cancer diagnostics. Here, we investigated whether analysis of cfDNA fragment end positions and their surrounding DNA sequences reveals the presence of tumor-derived DNA in blood. We performed genome-wide analysis of cfDNA from 521 samples and analyzed sequencing data from an additional 2147 samples, including healthy individuals and patients with 11 different cancer types. We developed a metric based on genome-wide differences in fragment positioning, weighted by fragment length and GC content [information-weighted fraction of aberrant fragments (iwFAF)]. We observed that iwFAF strongly correlated with tumor fraction, was higher for DNA fragments carrying somatic mutations, and was higher within genomic regions affected by copy number amplifications. We also calculated sample-level means of nucleotide frequencies observed at genomic positions spanning fragment ends. Using a combination of iwFAF and nine nucleotide frequencies from three positions surrounding fragment ends, we developed a machine learning model to differentiate healthy individuals from patients with cancer. We observed an area under the receiver operating characteristic curve (AUC) of 0.91 for detection of cancer at any stage and an AUC of 0.87 for detection of stage I cancer. Our findings remained robust with as few as 1 million fragments analyzed per sample, demonstrating that analysis of fragment ends can become a cost-effective and accessible approach for cancer detection and monitoring.
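The abstract describes iwFAF only as a fraction of aberrant fragments weighted by fragment length and GC content. The Python sketch below shows the general shape of such a weighted fraction, sum(w_i * a_i) / sum(w_i), assuming each fragment has already been labeled aberrant or not and that a weight can be computed from its length and GC content. The `Fragment` type, the aberrant labels, and `toy_weight` are hypothetical stand-ins, not the published definitions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Fragment:
    length: int      # fragment length in bp
    gc: float        # GC fraction of the fragment (0 to 1)
    aberrant: bool   # hypothetical label: end position falls in a cancer-associated set


def iw_fraction_aberrant(
    fragments: Iterable[Fragment],
    weight: Callable[[int, float], float],
) -> float:
    """Weighted fraction of aberrant fragments: sum(w_i * a_i) / sum(w_i)."""
    weighted_aberrant = 0.0
    total_weight = 0.0
    for frag in fragments:
        w = weight(frag.length, frag.gc)  # information weight for this fragment
        total_weight += w
        if frag.aberrant:
            weighted_aberrant += w
    if total_weight == 0:
        raise ValueError("total weight is zero")
    return weighted_aberrant / total_weight


def toy_weight(length: int, gc: float) -> float:
    # Hypothetical weighting for illustration only: up-weight fragments shorter than 150 bp.
    return 2.0 if length < 150 else 1.0


frags = [
    Fragment(140, 0.45, True),
    Fragment(167, 0.40, False),
    Fragment(170, 0.50, False),
    Fragment(120, 0.42, True),
]
print(iw_fraction_aberrant(frags, lambda length, gc: 1.0))  # uniform weights
print(iw_fraction_aberrant(frags, toy_weight))
```

With uniform weights the metric reduces to the plain fraction of aberrant fragments; any real weighting would have to come from the study's own length and GC model.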

Conflict of interest statement

Competing Interests: KKB, BRM, HM, and MM are inventors on patent applications covering technologies described here, including patent application number PCT/US20/41469, titled “Methods of detecting disease and treatment response in cfDNA”. MM has consulted for AstraZeneca, Bristol Myers Squibb, and Castle Biosciences; currently consults for Translational Genomics Research Institute (TGen); and serves on the scientific advisory board of and holds stock options for PetDx. BRM, TCC, TKM, and MM are inventors on patent applications submitted by TGen related to cancer genomics and cell-free DNA analyses that have been licensed to Exact Sciences, under terms reviewed and approved by TGen. TKM is an employee and shareholder of Delfi Diagnostics and serves on the scientific advisory boards of Deepcell and Omniome. All other authors declare that they have no competing interests.

Figures

Fig. 1. Evaluation of information-weighted fraction of aberrant fragments (iwFAF) using plasma DNA whole-genome sequencing.
(A) Boxplots showing distributions of iwFAF values in plasma DNA of healthy individuals (green), patients with cancer (blue), and patients with non-malignant disease (gray) from four studies, including the current study (12, 15, 16). The number of samples in each category is indicated in parentheses. Each group of patients with cancer or non-malignant disease was compared with the corresponding group of healthy individuals. P values for pairwise comparisons are reported in table S2. Two outliers (iwFAF of 0.6812 and 0.6814), both samples from patients with metastatic breast cancer in the Adalsteinsson et al. dataset (15), were removed from the plot to improve visualization. Abbreviations: CCA, cholangiocarcinoma; GBM, glioblastoma; HCC, hepatocellular carcinoma. (B) Scatterplot comparing tumor fraction with iwFAF in 938 samples from patients with cancer (blue) and 24 samples from healthy individuals (green). Plasma samples with at least 3% tumor fraction measured using ichorCNA were included in this comparison. Tumor fraction and iwFAF were strongly correlated (Spearman’s ρ = 0.77, P = 4.66 × 10⁻¹⁹⁰). (C) Boxplots showing the distribution of iwFAF z-scores in regions with copy number loss, neutral copy number, or copy number gain across 27 samples with at least 20% tumor fraction from patients with metastatic melanoma. Z-scores were calculated using the mean and standard deviation of copy-number-neutral regions from each patient. (D) Bar charts showing iwFAF values calculated from fragments overlapping tumor-specific single-nucleotide variants in plasma samples from two patients with metastatic melanoma. iwFAF was calculated from all fragments (gray), fragments carrying the tumor-specific allele (blue), and fragments carrying the wild-type allele (green). iwFAF values for mutated fragments were significantly higher than for wild-type fragments (P = 1.6 × 10⁻¹⁵ and P = 3.6 × 10⁻⁴, two-proportion Z-test).
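Panel (D) compares mutant and wild-type fragments with a two-proportion Z-test, and panel (C) standardizes regional iwFAF against copy-number-neutral regions. Both steps are sketched below in Python under simplifying assumptions: the test uses unweighted counts (aberrant fragments out of all fragments in each group), which may differ from how the study applied it to a weighted metric, and the z-score uses the sample standard deviation. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # proportion under the null hypothesis p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal
    return z, p_value


def zscores_vs_neutral(values: Sequence[float], states: Sequence[str]) -> list[float]:
    """Standardize per-region values using the mean and SD of copy-number-neutral regions."""
    neutral = [v for v, s in zip(values, states) if s == "neutral"]
    if len(neutral) < 2:
        raise ValueError("need at least two neutral regions")
    mu = statistics.mean(neutral)
    sd = statistics.stdev(neutral)  # sample standard deviation (an assumption)
    return [(v - mu) / sd for v in values]


# Toy counts: aberrant fragments among mutant-allele vs wild-type-allele fragments.
z, p = two_proportion_z_test(30, 100, 10, 100)
print(round(z, 3), p)
print(zscores_vs_neutral([0.10, 0.12, 0.14, 0.16], ["neutral", "neutral", "neutral", "gain"]))
```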
Fig. 2. Comparison of tumor fraction and iwFAF in longitudinal samples from patients with cancer.
(A) iwFAF values (upper graphs) and tumor fractions inferred using ichorCNA (lower graphs) plotted for longitudinal plasma samples from two patients with metastatic melanoma. Green and blue shaded regions indicate courses of treatment. Vertical lines indicate response measured by imaging (RECIST): purple indicates stable disease and red indicates progressive disease. Standard deviations for each iwFAF measurement were estimated from the number of sequenced fragments and the corresponding standard deviation observed in resampling experiments on control samples. (B) Scatterplot comparing change in iwFAF with change in ichorCNA tumor fraction between 63 pairs of samples with measurable tumor fraction obtained from 13 patients with metastatic melanoma (Spearman’s ρ = 0.68, P = 1.02 × 10⁻⁹). (C) iwFAF values (upper graphs) and tumor fractions determined using TARDIS (lower graphs) plotted for longitudinal plasma samples from two patients with glioblastoma. Vertical red lines indicate clinical disease progression. (D) Scatterplot comparing change in iwFAF with change in TARDIS tumor fraction between 17 pairs of samples with measurable tumor fraction from three patients with glioblastoma (Spearman’s ρ = 0.67, P = 3.47 × 10⁻³). Five outliers were excluded from the plot to improve visualization; their iwFAF changes between timepoints were 5.878 × 10⁻³, −1.950 × 10⁻⁴, −4.223 × 10⁻³, 6.784 × 10⁻³, and 7.348 × 10⁻³, corresponding to tumor fraction changes of 1.1617 × 10⁻², −1.0003 × 10⁻², −1.288 × 10⁻³, 6.3 × 10⁻⁵, and 5.7 × 10⁻⁵, respectively (data S7).
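Panels (B) and (D) correlate changes in iwFAF with changes in tumor fraction using Spearman’s ρ. The Python sketch below computes ρ as the Pearson correlation of average ranks (tied values share the mean rank) and assumes, hypothetically, that changes are taken between consecutive timepoints of the same patient; the caption does not state how pairs were formed, and no P value is computed here.

```python
import math
from typing import Sequence


def average_ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(va * vb)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    return pearson(average_ranks(x), average_ranks(y))


def consecutive_changes(series: Sequence[float]) -> list[float]:
    """Differences between consecutive timepoints (hypothetical pairing scheme)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Toy longitudinal values for one patient (not data from the study).
iwfaf = [0.20, 0.26, 0.22, 0.30]
tumor_fraction = [0.05, 0.12, 0.08, 0.15]
print(spearman(consecutive_changes(iwfaf), consecutive_changes(tumor_fraction)))
```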
Fig. 3. Analysis of nucleotide frequencies from genomic loci spanning fragment ends.
(A) Schematic of fragment end nucleotide frequency calculations used in GALYFRE. Nucleotide frequencies were measured at the 10 positions inside (positions 1 to 10) and the 10 positions outside (positions −1 to −10) of each fragment end base (position 0), separately for the left and right ends of each fragment. We calculated the frequency of each nucleotide at each position across all aligned fragments in each plasma DNA sample. (B) Heatmap showing the magnitude of the correlation between iwFAF and each nucleotide frequency at each position. Frequencies were calculated using 948 samples from 400 patients with metastatic breast cancer. Darker colors indicate stronger correlation (magnitudes range from 0.003 to 0.66). The sum of correlations at each position is shown in gray above the heatmap. (C) Mean adjusted magnitude of regression coefficients obtained from a generalized linear model predicting iwFAF from 64 nucleotide frequencies.
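The position scheme in panel (A) can be sketched in Python as follows. The sketch assumes an uppercase reference string, 0-based inclusive fragment coordinates, and that the right end is read on the forward strand with positive offsets moving leftward into the fragment. These conventions and the function name `end_frequencies` are illustrative assumptions rather than GALYFRE’s actual code; in particular, whether the right end is reverse-complemented is not stated here.

```python
from collections import Counter, defaultdict
from typing import Iterable


def end_frequencies(
    ref: str,
    fragments: Iterable[tuple[int, int]],
    flank: int = 10,
) -> dict[tuple[str, int], dict[str, float]]:
    """Nucleotide frequencies at offsets -flank..+flank around each fragment end.

    Fragments are (start, end) with 0-based, inclusive coordinates on `ref`.
    Offset 0 is the end base; positive offsets lie inside the fragment and
    negative offsets lie outside it, for the left and right ends separately.
    """
    counts = defaultdict(Counter)
    for start, end in fragments:
        for offset in range(-flank, flank + 1):
            # Left end: moving right (positive offset) goes into the fragment.
            # Right end: moving left goes into the fragment, so the sign flips.
            for side, idx in (("left", start + offset), ("right", end - offset)):
                # Skip positions beyond the reference or with non-ACGT bases (e.g. N).
                if 0 <= idx < len(ref) and ref[idx] in "ACGT":
                    counts[(side, offset)][ref[idx]] += 1
    freqs = {}
    for key, c in counts.items():
        total = sum(c.values())
        freqs[key] = {base: c[base] / total for base in "ACGT"}
    return freqs


ref = "AACCGGTTAACCGGTT"
freqs = end_frequencies(ref, [(4, 11), (0, 7)], flank=2)
print(freqs[("left", 0)])
print(freqs[("right", -1)])
```

Averaging these per-position frequencies over all fragments in a sample gives one feature per (end, offset, nucleotide), which is the kind of sample-level summary the caption describes.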
Fig. 4. Diagnostic performance for cancer detection using genome-wide analysis of fragment ends.
Results from a random forest classifier (GALYFRE) trained to distinguish patients with cancer from healthy individuals using iwFAF and nucleotide frequencies at fragment ends in plasma whole-genome sequencing data. Training and cross-validation were performed using samples from 196 healthy individuals and 465 patients with cancer, representing 10 cancer types. (A) Overall performance on the combined patient samples from this study, Cristiano et al., and Jiang et al., based on iwFAF alone, the set of nine nucleotide frequencies, and the combination of the two (GALYFRE). (B) GALYFRE performance by disease stage. Performance by tumor type and by stage within each tumor type is shown in figs. S16 and S17, and sensitivity values at 95% specificity are reported in table S8.
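The caption reports AUC and sensitivity at 95% specificity. The Python sketch below shows how both can be computed from classifier scores (for example, out-of-fold random forest probabilities); it does not reproduce the GALYFRE classifier itself. AUC is computed as the Mann-Whitney probability that a cancer sample outscores a healthy one, and the threshold rule for fixed specificity is one reasonable choice, assumed here.

```python
import math
from typing import Sequence


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties count 1/2."""
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    pos: Sequence[float], neg: Sequence[float], specificity: float = 0.95
) -> float:
    """Sensitivity when samples scoring above a threshold are called cancer.

    The threshold is the lowest healthy-sample score that keeps at least
    `specificity` of healthy samples at or below it (an assumed rule).
    """
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative score")
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be between 0 and 1")
    neg_sorted = sorted(neg)
    # Number of healthy samples that must fall at or below the threshold; the
    # small epsilon guards against floating-point error in specificity * n.
    k = math.ceil(specificity * len(neg_sorted) - 1e-9)
    threshold = neg_sorted[k - 1] if k > 0 else float("-inf")
    return sum(p > threshold for p in pos) / len(pos)


# Toy scores (not study data): 20 healthy samples and 4 cancer samples.
healthy = [i / 100 for i in range(20)]
cancer = [0.185, 0.5, 0.9, 0.1]
print(auc(cancer, healthy), sensitivity_at_specificity(cancer, healthy))
```

The pairwise AUC loop is quadratic in sample count, which is fine at the scale of a few hundred samples; a rank-based formula would scale better.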

References

    1. Wong FCK, Lo YMD, Prenatal Diagnosis Innovation: Genome Sequencing of Maternal Plasma. Annual Review of Medicine 67, 419–432 (2016).
    2. Burnham P, Khush K, De Vlaminck I, Myriad Applications of Circulating Cell-Free DNA in Precision Organ Transplant Monitoring. Annals of the American Thoracic Society 14, S237–S241 (2017).
    3. Van Der Pol Y, Mouliere F, Toward the Early Detection of Cancer by Decoding the Epigenetic and Environmental Fingerprints of Cell-Free DNA. Cancer Cell 36, 350–368 (2019).
    4. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, Douville C, Javed AA, Wong F, Mattox A, Hruban RH, Wolfgang CL, Goggins MG, Dal Molin M, Wang TL, Roden R, Klein AP, Ptak J, Dobbyn L, Schaefer J, Silliman N, Popoli M, Vogelstein JT, Browne JD, Schoen RE, Brand RE, Tie J, Gibbs P, Wong HL, Mansfield AS, Jen J, Hanash SM, Falconi M, Allen PJ, Zhou S, Bettegowda C, Diaz LA Jr., Tomasetti C, Kinzler KW, Vogelstein B, Lennon AM, Papadopoulos N, Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).
    5. Hu Y, Ulrich BC, Supplee J, Kuang Y, Lizotte PH, Feeney NB, Guibert NM, Awad MM, Wong KK, Janne PA, Paweletz CP, Oxnard GR, False-Positive Plasma Genotyping Due to Clonal Hematopoiesis. Clin Cancer Res 24, 4437–4443 (2018).
