[Preprint]. 2023 May 8:rs.3.rs-2870367.
doi: 10.21203/rs.3.rs-2870367/v1.

Application of Machine Learning to the Prediction of Cancer-Associated Venous Thromboembolism

Simon Mantha et al. Res Sq. 2023.

Abstract

Venous thromboembolism (VTE) is a common and impactful complication of cancer. Several clinical prediction rules have been devised to estimate the risk of a thrombotic event in this patient population; however, they have limitations. We aimed to develop a predictive model of cancer-associated VTE using machine learning to better integrate all available data, improve prediction accuracy and allow applicability regardless of the timing of systemic therapy administration. A retrospective cohort was used to fit and validate the models, consisting of adult patients who had next-generation sequencing performed on their solid tumor between 2014 and 2019. A deep learning survival model limited to demographic, cancer-specific, laboratory and pharmacological predictors was selected based on results from training data for 23,800 individuals and was evaluated on an internal validation set of 5,951 individuals, yielding a time-dependent concordance index of 0.72 (95% CI = 0.70-0.74) for the first 6 months of observation. Adapted models also performed well overall compared to the Khorana Score (KS) in two external cohorts of individuals starting systemic therapy: in an external validation set of 1,250 patients, the C-index was 0.71 (95% CI = 0.65-0.77) for the deep learning model vs 0.66 (95% CI = 0.59-0.72) for the KS, and in a smaller external cohort of 358 patients, the C-index was 0.59 (95% CI = 0.50-0.69) for the deep learning model vs 0.56 (95% CI = 0.48-0.64) for the KS. The proportions of patients accurately reclassified by the deep learning model were 25% and 26%, respectively. In this large cohort of patients with a broad range of solid malignancies and at different phases of systemic therapy, the use of deep learning resulted in improved accuracy for VTE incidence predictions. Additional studies are needed to further assess the validity of this model.
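For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index on right-censored data. It is an illustration only: the abstract reports a time-dependent concordance index under competing risks, which this simplified version does not reproduce, and the function name and toy data are assumptions rather than anything taken from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when subject i had an observed event
    and subject j was still under observation at a later time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # pairs are anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in days, event flag (1 = VTE observed), predicted risk
times = [30, 60, 90, 120, 150]
events = [1, 0, 1, 0, 0]
risks = [0.9, 0.4, 0.3, 0.5, 0.1]

c_index = concordance_index(times, events, risks)
print(round(c_index, 3))  # 0.833 (5 of 6 comparable pairs are concordant)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a sense of scale for the 0.72 and 0.71 figures quoted in the abstract.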

Conflict of interest statement

Conflict of Interest Disclosures: Simon Mantha, Subrata Chatterjee, Rohan Singh and John Cadley have filed a U.S. patent application related to this work. Simon Mantha is principal owner of Daboia Consulting LLC. Andrés Muñoz Martín has the following disclosures to report: consultant or advisory role for Pfizer-BMS, Sanofi, Celgene, Leo Pharma, Incyte, AstraZeneca, MSD, Lilly, Servier, Bayer and Roche; research funding from Leo Pharma, Sanofi and Celgene; paid speaker for Rovi, Bayer, Menarini, Stada and Daiichi Sankyo; intellectual property rights for a risk assessment model of venous thromboembolism in cancer patients (work distinct from what is reported herein). Magdalena Ruiz is a medical advisor at IQVIA (CRO company). Gerald A. Soff has received research support from or consulted for Johnson and Johnson/Janssen Scientific Affairs, Sobi/Dova Pharmaceuticals, Anthos Therapeutics, Luzsana (HengruiUSA), and Sanofi. The other authors have no potential conflicts of interest to report.

Figures

Figure 1. Flow Diagram of Patient Selection for the Three Cohorts
A: Main MSK Cohort. B: External MSK Cohort. C: ONCOTHROMB Cohort. *First sub-cohort consisted of adults with blood control drawn for MSK-IMPACT between 2014 and 2016. †Second sub-cohort consisted of adults with blood control drawn for MSK-IMPACT between 2017 and 2019. ‡Patients randomly allocated, stratifying by event type.
Figure 2. Diagram of Data Flow
A: The main MSK cohort training set is utilized to derive and assess the performance of models corresponding to predefined feature sets using five-fold cross-validation. Three machine learning algorithms are evaluated: Fine-Gray competing risk regression (FG), random survival forests (RSF) and DeepHit (DH). B: The models are compared based on their respective C-index and perceived clinical usefulness. The feature set corresponding to the best model is selected and used to derive a new model from the entirety of the main MSK cohort training set. C: This final model is validated on the main MSK cohort validation set. D: Secondary models A and B are derived using the same feature set as derived in (B), excluding features for which the values are unknown in the external MSK cohort and the ONCOTHROMB cohort respectively. E: Secondary model B is validated on the entirety of the ONCOTHROMB cohort. F: Secondary model A is validated on the external MSK cohort validation set. As an exploratory analysis, this model is updated on the external MSK cohort transfer learning set and validated on the corresponding validation set.
Figure 3. Distribution of Times from Cancer Diagnosis to Main MSK Cohort Entry
Cancer diagnosis time corresponds to first pathological evidence of neoplasia and cohort entry is defined by report of MSK-IMPACT results.
Figure 4. Cancer-Associated VTE Cumulative Incidence Functions in the Main MSK Cohort
Cumulative incidence functions were derived from the Kaplan-Meier and the competing risk estimators, the latter using the Aalen-Johansen method.
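The difference between the two estimators named in this caption can be illustrated with a small sketch. It assumes a simple coding of outcomes (0 = censored, 1 = VTE, 2 = competing event such as death) and uses a hypothetical function name and toy data; it is not the authors' code. The Aalen-Johansen estimate accounts for competing events, whereas one minus the Kaplan-Meier estimate treats them as censoring and so tends to overstate cumulative incidence.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Return (Aalen-Johansen CIF, 1 - Kaplan-Meier) at the last observed time.

    causes: 0 = censored, any other integer = type of event that occurred.
    """
    at_risk = len(times)
    overall_surv = 1.0  # probability of being free of ANY event just before t
    km_surv = 1.0       # KM survival when competing events are treated as censored
    aj_cif = 0.0
    for t in sorted(set(times)):
        at_t = [c for ti, c in zip(times, causes) if ti == t]
        d_interest = sum(1 for c in at_t if c == cause_of_interest)
        d_any = sum(1 for c in at_t if c != 0)
        if at_risk > 0:
            # Aalen-Johansen increment: event-free survival times cause-specific hazard
            aj_cif += overall_surv * d_interest / at_risk
            overall_surv *= 1 - d_any / at_risk
            km_surv *= 1 - d_interest / at_risk
        at_risk -= len(at_t)  # everyone observed at t leaves the risk set
    return aj_cif, 1 - km_surv


# Toy data (hypothetical): 0 = censored, 1 = VTE, 2 = death without VTE
times = [1, 2, 3, 4, 5]
causes = [1, 2, 1, 0, 1]

aj, one_minus_km = cumulative_incidence(times, causes)
print(round(aj, 3), round(one_minus_km, 3))  # 0.8 1.0
```

In this toy example the competing event at time 2 removes a patient who can no longer develop VTE, so the competing-risk estimate (0.8) is lower than the Kaplan-Meier-based one (1.0).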
Figure 5. Receiver Operating Characteristic (ROC) Curve for the Selected Model
ROC plot computed using the selected DeepHit model featuring a limited set of covariates fitted on the main MSK cohort training set and evaluated in the corresponding validation set.
Figure 6. Cumulative Incidence of VTE Stratified by Predicted Risk Group for the Selected Model in the Main MSK Cohort
Cumulative incidence functions were derived from the competing risk estimators. Patients were grouped into 180-day VTE risk intervals based on model predictions, using quantile cutoff points.
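Grouping patients by quantile cutoff points of predicted risk, as this caption describes, might look like the sketch below. The number of groups (four) is an assumption, since the caption does not state it, and the function name and predicted risks are hypothetical.

```python
import statistics


def assign_risk_groups(predicted_risks, n_groups=4):
    """Split patients into n_groups by quantiles of predicted 180-day VTE risk.

    Returns the cutoff points and, for each patient, a group index
    from 0 (lowest predicted risk) to n_groups - 1 (highest).
    """
    cutoffs = statistics.quantiles(predicted_risks, n=n_groups)
    # A patient's group is the number of cutoffs their predicted risk exceeds
    groups = [sum(risk > cut for cut in cutoffs) for risk in predicted_risks]
    return cutoffs, groups


# Hypothetical predicted 180-day VTE risks for eight patients
predicted = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]

cutoffs, groups = assign_risk_groups(predicted)
print(groups)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Cumulative incidence would then be estimated separately within each group to produce stratified curves.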
Figure 7. Inspection of Model Features
A: Absolute contribution of the 20 features with the highest SHAP values. B: Distribution of SHAP values for the 20 features with the highest contribution.
Figure 8. Cumulative Incidence of VTE Stratified by Predicted Risk Group for Secondary Model A in the External MSK Cohort Validation Set
Cumulative incidence functions were derived from the competing risk estimators. Patients were grouped into 180-day VTE risk intervals based on model predictions, using quantile cutoff points.
Figure 9. Cumulative Incidence of VTE Stratified by Predicted Risk Group for Secondary Model B in the ONCOTHROMB Cohort
Cumulative incidence functions were derived from the competing risk estimators. Patients were grouped into 180-day VTE risk intervals based on model predictions, using quantile cutoff points.
