This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Jan 10:rs.3.rs-2450090.

doi: 10.21203/rs.3.rs-2450090/v1.

Utilizing Electronic Health Records (EHR) and Tumor Panel Sequencing to Demystify Prognosis of Cancer of Unknown Primary (CUP) patients

Intae Moon^{1,2}, Jaclyn LoPiccolo³, Sylvan C Baca^{3,4}, Lynette M Sholl⁵, Kenneth L Kehl², Michael J Hassett², David Liu^{3,6}, Deborah Schrag⁷, Alexander Gusev^{2,6,8}

Affiliations

¹ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
² Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA.
³ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
⁴ Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts.
⁵ Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
⁶ The Broad Institute of MIT & Harvard, Cambridge, MA, USA.
⁷ Memorial Sloan Kettering Cancer Center, New York, NY, USA.
⁸ Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

PMID: 36711812
PMCID: PMC9882677
DOI: 10.21203/rs.3.rs-2450090/v1

Intae Moon et al. Res Sq. 2023.

Update in

Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary.
Moon I, LoPiccolo J, Baca SC, Sholl LM, Kehl KL, Hassett MJ, Liu D, Schrag D, Gusev A. Nat Med. 2023 Aug;29(8):2057-2067. doi: 10.1038/s41591-023-02482-6. Epub 2023 Aug 7. PMID: 37550415.

Abstract

Cancer of unknown primary (CUP) is a type of cancer that cannot be traced back to its original site and accounts for 3-5% of all cancers. It does not have established targeted therapies, leading to poor outcomes. We developed OncoNPC, a machine learning classifier trained on targeted next-generation sequencing data from 34,567 tumors from three institutions. OncoNPC achieved a weighted F1 score of 0.94 for high confidence predictions on known cancer types (65% of held-out samples). When applied to 971 CUP tumors from patients treated at the Dana-Farber Cancer Institute, OncoNPC identified actionable molecular alterations in 23% of the tumors. Furthermore, OncoNPC identified CUP subtypes with significantly higher polygenic germline risk for the predicted cancer type and significantly different survival outcomes, supporting its validity. Importantly, CUP patients who received first palliative intent treatments concordant with their OncoNPC-predicted cancer sites had significantly better outcomes (H.R. 0.348, 95% C.I. 0.210 - 0.570, p-value 2.32 × 10^-5). OncoNPC thus provides evidence of distinct CUP subtypes and offers the potential for clinical decision support for managing patients with CUP.

Conflict of interest statement

Additional Declarations: There is NO Competing Interest.

Figures

**Figure 1. Overview of model development and analysis workflow.**
**(a)** OncoNPC, an XGBoost-based classifier, was trained and evaluated using 36,729 tumor samples across 22 cancer types from Cancers of Known Primary (CKP) collected from three different cancer centers. **(b)** OncoNPC performance was evaluated on the held-out tumor samples (n = 7,289). **(c)** OncoNPC was applied to 971 CUP tumor samples at a single institution to predict primary cancer types. OncoNPC-predicted CUP subtypes were then investigated for association with: **(d)** elevated germline risk, **(e)** actionable molecular alterations, **(f)** overall survival, and **(g)** prognostic somatic features. **(h)** A subset of CUP patients with detailed treatment data was evaluated for treatment-specific outcomes.

**Figure 2. Cancer type prediction performance of OncoNPC.**
**(a),(b)** The normalized confusion matrix of OncoNPC classification performance on the held-out test set (n = 7,289) for **(a)** 22 detailed cancer types and **(b)** 10 broad cancer groups based on site and treatment (see Table 1). The sensitivity for each cancer type or cancer group is shown below each confusion matrix and the sample size is shown to the left of each confusion matrix. **(c), (d)** The performance of OncoNPC in weighted F1 score on the test set across cancer types **(c)** and groups **(d)** at 4 different prediction confidences (i.e., minimum p_max thresholds). Each dot size is scaled by the proportion of tumor samples retained. **(e)** The precision-recall curves showing OncoNPC’s performance on the test set when grouped by cancer center, biopsy site type, sequence panel version, and ethnicity. The yellow dotted curve represents the baseline performance across the entire test set.
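
The confidence filtering described for panels (c) and (d) can be illustrated with a minimal sketch: samples are retained only when their top predicted probability (p_max) reaches a threshold, and the weighted F1 score is computed on what remains. The function names, the toy labels, and the probabilities below are hypothetical and are not taken from the paper; the support-weighted F1 definition is the conventional one and is assumed here to match the authors' metric.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def f1_at_threshold(y_true, probs, labels, min_p_max):
    """Score only samples whose top class probability is >= min_p_max.

    Returns (weighted F1 on retained samples, proportion of samples retained).
    """
    kept_true, kept_pred = [], []
    for t, row in zip(y_true, probs):
        p_max = max(row)
        if p_max >= min_p_max:
            kept_true.append(t)
            kept_pred.append(labels[row.index(p_max)])
    if not kept_true:
        return float("nan"), 0.0
    return weighted_f1(kept_true, kept_pred), len(kept_true) / len(y_true)


# Hypothetical toy data: three classes, four held-out samples.
labels = ["NSCLC", "BRCA", "PAAD"]
y_true = ["NSCLC", "BRCA", "PAAD", "NSCLC"]
probs = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],  # low-confidence, misclassified as NSCLC
    [0.55, 0.30, 0.15],
]

for threshold in (0.0, 0.5):
    f1, retained = f1_at_threshold(y_true, probs, labels, threshold)
    print(f"min p_max={threshold}: weighted F1={f1:.2f}, retained={retained:.2f}")
```

In this toy example, raising the threshold drops the one low-confidence misclassification, so the score rises while the retained fraction falls, which is the trade-off the dot sizes in the panels encode.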

**Figure 3. Applying OncoNPC to CUP tumor samples and interpreting cancer type predictions.**
**(a)** Empirical distributions of prediction probabilities for correctly predicted, held-out CKP tumor samples (n = 3,429) and CUP tumor samples (n = 934) across CKP cancer types (blue) and their corresponding OncoNPC predicted cancer types for CUP tumors (green). Only OncoNPC classifications with at least 20 CUP tumor samples are shown. **(b)** Proportion of each CKP cancer type and the corresponding OncoNPC predicted CUP cancer type. All training CKP tumor samples (n = 36,445) and all held-out CUP tumor samples (n = 971) are included. For both **(a)** and **(b)**, the cancer types (x-axis) are ordered by the number of CKP tumor samples in each cancer type. **(c)** Germline Polygenic Risk Score (PRS) enrichment of the CKP tumor samples (n = 11,332) and CUP tumor samples with available PRS data (n = 505) averaged across 8 cancer types. The magnitude of the enrichment is quantified by $\hat{\Delta}_{PRS}$: the mean difference between the concordant (i.e., OncoNPC matching) cancer type PRS and the mean of the PRSs of discordant cancer types (see Methods). $\hat{\Delta}_{PRS}$ is shown for CKPs in blue (for reference) and CUPs in green. As a negative control, $\hat{\Delta}_{PRS\text{-}random}$ is also shown after permuting the OncoNPC labels. **(d)** Top 15 most important features based on mean absolute SHAP values (i.e., $\hat{\mu}(|\mathrm{SHAP}|)$ [19]) for the top 3 most frequent cancer types in the cohort: Non-Small Cell Lung Cancer (NSCLC), Invasive Breast Carcinoma (BRCA), and Pancreatic Adenocarcinoma (PAAD). The carrier rates for each feature in the corresponding CKP and CUP cancer cohorts, and in the entire CKP and CUP cohorts, are shown as downward bars and star-shaped markers, respectively. For mutation signature features that have continuous values, individuals with feature values one standard deviation above the mean were treated as positives and the rest as negatives. For age, individuals above the population mean were treated as positives and the rest as negatives.
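
One plausible reading of the enrichment statistic in panel (c) is sketched below: for each tumor, take the PRS of the predicted (concordant) cancer type minus the mean PRS of all other (discordant) types, then average over tumors; the negative control repeats this after shuffling the predicted labels. The Methods are not reproduced here, so the exact estimator, the number of permutations, and all names and toy values are assumptions.

```python
import random
from statistics import mean


def delta_prs(prs_rows, predicted):
    """Mean over samples of (concordant PRS - mean of discordant PRSs).

    prs_rows: one dict per sample mapping cancer type -> PRS value.
    predicted: the predicted cancer type for each sample.
    """
    diffs = []
    for row, label in zip(prs_rows, predicted):
        discordant = [v for k, v in row.items() if k != label]
        diffs.append(row[label] - mean(discordant))
    return mean(diffs)


def delta_prs_random(prs_rows, predicted, n_perm=1000, seed=0):
    """Negative control: average the statistic over label permutations."""
    rng = random.Random(seed)
    shuffled = list(predicted)
    values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        values.append(delta_prs(prs_rows, shuffled))
    return mean(values)


# Hypothetical standardized PRS values for three samples and three cancer types.
prs_rows = [
    {"NSCLC": 1.0, "BRCA": 0.0, "PAAD": -1.0},
    {"NSCLC": -0.5, "BRCA": 1.5, "PAAD": -1.0},
    {"NSCLC": 0.0, "BRCA": -1.0, "PAAD": 1.0},
]
predicted = ["NSCLC", "BRCA", "PAAD"]

observed = delta_prs(prs_rows, predicted)
control = delta_prs_random(prs_rows, predicted)
print(f"delta_PRS={observed:.2f}, permuted control={control:.2f}")
```

A positive observed value alongside a control near zero is the pattern the panel reports as enrichment.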

**Figure 4. Consistent survival between OncoNPC classifications and known cancers.**
**(a)** Survival stratification for patients with CUP based on their OncoNPC predicted cancer types. The Kaplan-Meier estimator [55] was used to estimate survival probability for each predicted cancer type over the follow-up time of 60 months from sequence date, with the statistical significance assessed by Chi-square test. **(b)** Correspondence between median survival time (in months) of CUP predicted cancer types (x-axis) and those of metastatic CKP cancer types (y-axis): Spearman’s rho 0.964 (p-value: 4.54 × 10⁻⁴, Python scipy v1.7.1 [58]). The size of each dot reflects the p-value of the log-rank test for significant difference in median survival between CUP - metastatic CKP pairs. Only cancer types with at least 30 CUP tumor samples having OncoNPC probabilities greater than 0.5 are shown.
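
For readers unfamiliar with the estimator named in panel (a), the following is a minimal, dependency-free sketch of a Kaplan-Meier curve and the median survival time read from it. It is illustrative only; the follow-up times and event indicators are hypothetical, and the authors' own implementation is not shown in this text.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per patient (e.g., months from sequencing).
    events: 1 if death was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Hypothetical follow-up data for six patients.
times = [3, 5, 5, 8, 12, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
print(curve)
print("median survival:", median_survival(curve))
```

Median survival times computed this way for each predicted CUP type and each metastatic CKP type are the kind of paired values that a rank correlation such as Spearman's rho, as in panel (b), would compare.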

**Figure 5. Potential for clinical decision support among OncoNPC classified CUPs.**
**(a)** The number of CUP tumor samples with actionable targets, based on OncoKB [29], across actionable somatic variants (mutations, amplifications, and fusions). Each bar corresponds to an actionable target, color-coded by the number of carriers in each OncoNPC-classified CUP type. Note that each tumor sample may contain more than one actionable somatic variant. **(b)** Proportions of CUP tumor samples with actionable somatic variants (*N_action*) to the total number of patients (*N_total*) across OncoNPC predicted cancer types. Proportions for 4 different therapeutic levels based on OncoKB [29] are shown in each bar: Level 1 - FDA-approved drugs, Level 2 - standard of care drugs, Level 3 - drugs supported by clinical evidence, and Level 4 - drugs supported by biological evidence. **(c), (d)** Treatment diagrams for a group of patients with CUP who received treatments that were concordant with the OncoNPC classification **(c)** and the remaining CUP patients who received discordant treatments **(d)**. OncoNPC classification is shown on the left and treatment groups are shown on the right, with each patient connected from left to right. **(e)** Forest plot of a multivariable Cox Proportional Hazards Regression on patients in the CUP cohort with first-line palliative treatment records at DFCI (n = 158; see Supplementary Fig. S6 for the exclusion criteria). Treatment concordance (colored in blue), encoded as 1 when the first treatment a patient receives at DFCI is concordant with their corresponding OncoNPC prediction and 0 otherwise, was significantly associated with mortality of patients in the cohort (H.R. 0.348, 95% C.I. 0.210 - 0.570, p-value 2.32 × 10⁻⁵). **(f)** Estimated survival curves for patients with CUP in the concordant treatment group (shown in blue) and discordant treatment group (shown in red), respectively. To estimate the survival function for each group, we utilized the Inverse Probability of Treatment Weighted (IPTW) Kaplan-Meier estimator while adjusting for left truncation until time of sequencing (see Methods). Statistical significance of the survival difference between the two groups was estimated by a weighted log-rank test [59].
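
The weighted estimator in panel (f) can be sketched as a Kaplan-Meier curve in which each patient contributes a weight rather than a count, and enters the risk set only after a delayed entry time (here, the time of sequencing). This is a minimal illustration under stated assumptions: the propensity scores are taken as given, the weights are unstabilized, a patient is counted at risk at time t when entry < t <= exit, and all names and toy values are hypothetical rather than drawn from the paper's Methods.

```python
def iptw_weights(concordant, propensity):
    """Inverse probability of treatment weights.

    concordant: 1 if the patient received concordant treatment, else 0.
    propensity: estimated P(concordant treatment | covariates) per patient.
    """
    return [1.0 / e if c else 1.0 / (1.0 - e) for c, e in zip(concordant, propensity)]


def weighted_km(entry, exit_time, event, weight):
    """Weighted Kaplan-Meier with left truncation (delayed entry).

    entry: time at which each patient enters the risk set.
    exit_time: time of death or censoring.
    event: 1 if death was observed at exit_time, 0 if censored.
    weight: per-patient weight (e.g., from iptw_weights).
    """
    event_times = sorted({t for t, d in zip(exit_time, event) if d})
    surv = 1.0
    curve = []
    for t in event_times:
        # Weighted size of the risk set: entered before t and not yet exited.
        at_risk = sum(w for a, b, w in zip(entry, exit_time, weight) if a < t <= b)
        deaths = sum(
            w
            for a, b, d, w in zip(entry, exit_time, event, weight)
            if d and b == t and a < t
        )
        if at_risk > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
    return curve


# Hypothetical data for one treatment group (times in months).
entry = [0, 2, 0, 4]
exit_time = [6, 9, 12, 15]
event = [1, 1, 0, 1]
weight = [1.0, 2.0, 1.0, 2.0]

curve = weighted_km(entry, exit_time, event, weight)
print(curve)
```

With unit weights and all entry times at zero, the function reduces to the ordinary Kaplan-Meier estimator, which is a convenient sanity check for the sketch.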

References

1. Pavlidis N., Khaled H., and Gaafar R., “A mini review on cancer of unknown primary site: A clinical puzzle for the oncologists,” Journal of Advanced Research, vol. 6, no. 3, pp. 375–382, 2015.
2. Varadhachary G. R. and Raber M. N., “Cancer of unknown primary site,” New England Journal of Medicine, vol. 371, no. 8, pp. 757–765, 2014.
3. Hyman D. M. et al., “Vemurafenib in multiple nonmelanoma cancers with BRAF V600 mutations,” New England Journal of Medicine, vol. 373, no. 8, pp. 726–736, 2015.
4. Hainsworth J. D. and Greco F. A., “Cancer of unknown primary site: New treatment paradigms in the era of precision medicine,” American Society of Clinical Oncology Educational Book, vol. 38, pp. 20–25, 2018.
5. Anderson G. G. and Weiss L. M., “Determining tissue of origin for metastatic cancers: Meta-analysis and literature review of immunohistochemistry performance,” Applied Immunohistochemistry & Molecular Morphology, vol. 18, no. 1, pp. 3–8, 2010.
