J Liq Biopsy. 2025 Jul 11:9:100311.
doi: 10.1016/j.jlb.2025.100311. eCollection 2025 Sep.

Fragmentomic-based algorithm to computationally predict tumor-somatic, germline, and clonal hematopoiesis variant origin in liquid biopsy

Derek W Brown et al. J Liq Biopsy.

Abstract

Purpose: Genomic profiling of tumors by liquid biopsy (LBx) is a pragmatic alternative to profiling tissue. Despite recent methodologic advances, clonal hematopoiesis (CH) variants arising from hematopoietic stem cells may confound LBx results. Distinguishing the origin of variants detected by LBx will greatly enhance treatment decision-making for patients with cancer.

Experimental design: We sequenced DNA isolated from paired plasma and white blood cells (WBC) at equal depth to train (n = 1977) and validate (n = 658) Variant Origin Prediction (VOP), a machine learning algorithm that leverages fragmentomics to generate probabilities that a short variant (SV) detected by LBx is tumor-somatic, germline, or CH in origin. The algorithm's classifications were validated for accuracy using paired WBC DNA and for reproducibility using LBx replicates.

Results: At least one reportable variant of CH origin was detected in 68% of LBx samples. Our fragmentomic-based algorithm differentiated reportable tumor and CH variants with high sensitivity (positive percent agreement, PPA >93%), high positive predictive value (PPV >91%), and high reproducibility (>94%). Critically, VOP performs well for SVs with variant allele frequencies (VAFs) ≤1% (PPV >90%), as well as in genes known to harbor both CH and tumor-somatic SVs, such as TP53 (PPV >88%). In a longitudinal cohort of 422 metastatic castration-resistant prostate cancer (mCRPC) cases, VOP accurately predicted baseline variant origins and allowed separate tracking of tumor-somatic and CH variants, including newly detected variants, at subsequent timepoints.

Conclusions: VOP is a highly accurate and reproducible method to predict the origin of SVs detected in LBx without reliance on WBC sequencing. VOP can reduce inappropriate use of targeted therapies and their toxicities for patients with variants of CH origin and enables accurate tumor profiling and monitoring.

Keywords: Clonal hematopoiesis; Comprehensive genomic profiling; Fragmentomics; Germline; Monitoring; ctDNA; liquid biopsy.
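
The agreement metrics quoted in the Results can be illustrated with a short calculation. The following is a minimal sketch, not the authors' evaluation code: it assumes that each variant has a truth label (in the study, derived from paired WBC sequencing) and a predicted label, and it computes PPA (positive percent agreement, i.e. sensitivity) and PPV for one class. The function name `ppa_ppv` and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter


def ppa_ppv(truth, predicted, positive):
    """Return (PPA, PPV) for one class, treating `positive` as the positive label."""
    pairs = Counter(zip(truth, predicted))
    tp = pairs[(positive, positive)]
    # False negatives: truly positive variants called as something else.
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    # False positives: other variants called as the positive class.
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    ppa = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return ppa, ppv


# Toy labels invented for illustration only (not study data).
truth = ["tumor", "tumor", "tumor", "CH", "CH", "tumor", "CH", "tumor"]
predicted = ["tumor", "tumor", "CH", "CH", "CH", "tumor", "tumor", "tumor"]

tumor_ppa, tumor_ppv = ppa_ppv(truth, predicted, "tumor")
ch_ppa, ch_ppv = ppa_ppv(truth, predicted, "CH")
print(f"tumor: PPA={tumor_ppa:.2f} PPV={tumor_ppv:.2f}")
print(f"CH:    PPA={ch_ppa:.2f} PPV={ch_ppv:.2f}")
```

Reporting both metrics per class matters here because a CH variant miscalled as tumor lowers the tumor PPV, while a tumor variant miscalled as CH lowers the tumor PPA.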


Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: DWB, DS, ADF, SH, MM, KP, EP, AK, ML, SZ, ZK, DF, RWM, JH, LAA, AA, BY, BJD, JDH, HT, CX are employees of Foundation Medicine, a wholly owned subsidiary of Roche, and have equity interest in Roche. ZJA is an employee of Genentech, a wholly owned subsidiary of Roche, and has equity interest in Roche. TP serves as a consultant and receives honoraria, grants, or funding from AstraZeneca, BMS, Exelixis, Incyte, Ipsen, MSD, Novartis, Pfizer, Seattle Genetics, Merck Serono, Astellas, Johnson & Johnson, Eisai, Roche, Mashup. CS received research funding from Johnson and Johnson, Pfizer, Astellas, Bayer and provided consultancy or advisory services to Johnson & Johnson, Astellas, Bayer, Genentech/Roche, Pfizer, Eli Lilly, AstraZeneca, Novartis, Advancell, BMS. ESA reports grants and personal fees from Janssen, Sanofi, Bayer, Bristol Myers Squibb, Curium, MacroGenics, Merck, Pfizer, and AstraZeneca, personal fees from AADI Biosciences, Alkido Pharma, Astellas, Amgen, Blue Earth, Boundless Bio, Corcept Therapeutics, Exact Sciences, Foundation Medicine, Hookipa Pharma, Invitae, Eli Lilly and Company, Menarini Silicon Biosystems, Tango Therapeutics, Tempus, and Z-Alpha, and grants from Novartis, Celgene, and Orion outside the submitted work; and a patent for an AR-V7 biomarker technology issued and licensed to Qiagen.
DG reports grants from Amgen, Astex Pharma, and Genentech, serves as a consultant for AstraZeneca, Exact Sciences, Genentech, Guardant Health, IO Biotech, OncoHost, Adagene, Henlius USA, Foundation Medicine, One-carbon Therapeutics, and Sanofi, and serves on the Advisory Board for AbbVie Foundation, Janssen, Merck & Co, Mirati Therapeutics, Regeneron, and Revolution Medicine. Given his role as Editorial Board Member, DG had no involvement in the peer-review of this article and has no access to information regarding its peer review. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Findings from parallel plasma and buffy coat sequencing from pan-cancer liquid biopsies of 2471 patients. A) Variant allele frequencies (VAF) of 10,415 reportable variants in the buffy coat (BC) and in plasma. Truth labels were assigned using the schema described in the Supplemental Methods. Briefly, variants that could not be assigned with certainty as tumor-somatic or CH included variants detected in the BC with equivocal VAF ratio between plasma and BC, or variants not detected in the BC but close to the limit of detection of the assay. The relative proportions of variants are indicated at the top. Variants with uncertain origin are excluded from the analyses in B-E. B) The number of CH alterations per liquid biopsy in patients according to age. The line graph follows the mean number, and the boxplot indicates the median and interquartile range (IQR), with whiskers extending to the furthest data point within 1.5× the IQR. C) The percentages of patients with at least one reportable variant of each kind detected in their liquid biopsy. D) Prevalence of reportable CH variants according to cancer type. CCA: cholangiocarcinoma; NSCLC: non-small cell lung cancer; CUP: carcinoma of unknown primary; CRC: colorectal cancer. E) Distribution of VAF in plasma of variants from the three origins. Boxplots inside violins indicate median and IQR. VAF distribution for reportable tumor-somatic variants: Median = 2.01, IQR = (0.52–8.70), Range = [0.09–93.21], reportable CH variants: Median = 0.66, IQR = (0.29–1.60), Range = [0.09–44.58], and reportable germline variants: Median = 49.24, IQR = (48–50.63), Range = [33.33–91.02].
Fig. 2
Variant origins by gene from pan-cancer liquid biopsies of 2471 patients. A) Proportions of each of the three variant origins among the variants in each gene, as well as those that could not be reliably assigned using white blood cell sequencing (uncertain). B) The proportions of targetable variants, reportable variants, and variants of unknown significance. C) Relative prevalence of samples with tumor-origin, uncertain origin, and CH origin in targetable variants.
Fig. 3
A fragmentomic-based variant origin prediction algorithm. A) Schematic of the Variant Origin Prediction (VOP) algorithm inputs and outputs. ML: Machine learning. B) Overall ability of the algorithm to distinguish CH from tumor-derived variants in the test cohort (N = 658). P(tumor) = P(tumor-somatic) + P(germline). The majority of germline and tumor-somatic variants have P(tumor) close to 100%. C) Receiver operating characteristic (ROC) curve analysis. AUC: area under the curve.
Fig. 4
Variant origin prediction accuracy stratified by gene. A) Algorithmic performance across genes from most accurately to least accurately predicted. B) Plots of algorithmic probabilities by variant VAF for representative example genes: PIK3CA and PTEN (genes not confounded by CH), TET2 (a CH driver gene), and TP53, ATM, and CHEK2 (genes with mixtures of tumor- and CH-derived variants). Variant label truth was assigned based on equal depth WBC sequencing. P(Tumor) and P(CH) sum to 1.0, therefore CH variants should have low P(Tumor) if predicted correctly. P(Tumor) > 0.5 threshold is marked with the horizontal line.
Fig. 5
Accuracy of algorithmic classifications remains high even at lower variant allele frequencies (VAF). A) CH variant VAFs in plasma versus white blood cells for clinically relevant genes TP53, ATM, and CHEK2. Orange boxplots show the plasma and white blood cell VAF of CH variants. Blue boxplots show the plasma VAF of tumor-somatic variants in the same gene. Median VAF of these CH variants falls well below 1%. Distributions of VAFs of tumor-somatic variants overlap with those of CH variants. B) Distribution of algorithmic probabilities for tumor-somatic and CH variants, stratified by VAF. C) Sensitivity and positive predictive value of the algorithm's tumor-somatic and CH predictions.
Fig. 6
Variant origin prediction for tracking tumor dynamics in the IMbassador250 cohort. A) Boxenplots showing the effect of algorithmically filtering CH variants out of the somatic variant pool in the baseline samples of 221 patients with whole blood data available. Representative illustrative cases where: B) a previously undetected tumor-somatic variant appeared on-treatment, C) a previously undetected tumor-somatic variant appeared at progression, D) a previously undetected CH variant appeared on-treatment, E) a CH variant was the major somatic allele at baseline (confounding baseline MSAF, and in this case obscuring tumor clearance), F) a CH variant was not the major somatic allele at baseline but became one on-treatment (confounding on-treatment MSAF and obscuring tumor clearance), G) no tumor-somatic variants were present at baseline or on-treatment, but CH variants were present, potentially mistakable for tumor-somatic variants. Dashed lines denote tumor-somatic variant complete clearance.
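
The probability conventions in the captions of Fig. 3 and Fig. 4 and the filtering effect shown in Fig. 6 can be sketched together. This is an illustrative sketch under stated assumptions, not the VOP implementation: it takes the three per-variant probabilities as given, forms P(tumor) = P(tumor-somatic) + P(germline), applies the P(tumor) > 0.5 threshold, and recomputes MSAF after removing predicted CH variants. MSAF is assumed here to mean the maximum somatic allele frequency, which the caption does not spell out. The `Variant` class, the function names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    vaf: float              # plasma VAF in percent (hypothetical values below)
    p_tumor_somatic: float  # the three probabilities are assumed to sum to 1
    p_germline: float
    p_ch: float


def p_tumor(v: Variant) -> float:
    """P(tumor) = P(tumor-somatic) + P(germline), as defined in the Fig. 3 caption."""
    return v.p_tumor_somatic + v.p_germline


def is_predicted_ch(v: Variant, threshold: float = 0.5) -> bool:
    """Call a variant CH when P(tumor) does not exceed the threshold."""
    return p_tumor(v) <= threshold


def msaf(variants) -> float:
    """Assumed definition: the largest VAF in the somatic pool, 0.0 if empty."""
    return max((v.vaf for v in variants), default=0.0)


# Hypothetical baseline somatic pool (germline calls assumed removed upstream).
pool = [
    Variant("TP53", 0.8, 0.85, 0.05, 0.10),
    Variant("TET2", 2.4, 0.04, 0.01, 0.95),
    Variant("PIK3CA", 1.1, 0.93, 0.02, 0.05),
]

unfiltered_msaf = msaf(pool)
tumor_only = [v for v in pool if not is_predicted_ch(v)]
filtered_msaf = msaf(tumor_only)
print(unfiltered_msaf, filtered_msaf)
```

In this toy pool the TET2 variant has the highest VAF, so the unfiltered MSAF reflects CH rather than tumor burden. This mirrors the situation described for panel E, where a CH variant is the major somatic allele at baseline.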
