Development of a predictive model of venous thromboembolism recurrence in anticoagulated cancer patients using machine learning
- PMID: 37348318
- DOI: 10.1016/j.thromres.2023.06.015
Abstract
Introduction: Patients with cancer and venous thromboembolism (VTE) show a high risk of VTE recurrence during anticoagulant treatment. This study aimed to develop a predictive model to assess the risk of VTE recurrence within 6 months at the moment of primary VTE diagnosis in these patients.
Materials and methods: Unstructured data in electronic health records from 9 Spanish hospitals between 2014 and 2018 were extracted using the EHRead® technology, which is based on natural language processing (NLP) and machine learning (ML). Both clinically driven and ML-driven feature selection were performed to identify predictors of VTE recurrence. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train different prediction models, which were subsequently validated in a hold-out data set.
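The hold-out validation step mentioned above can be illustrated with a minimal sketch. The abstract does not report the split ratio or whether the split was stratified, so the 80/20 ratio, the stratification by outcome, the function name `stratified_holdout_split`, and the synthetic labels below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_holdout_split(labels, holdout_fraction=0.2, seed=0):
    """Split sample indices into train and hold-out sets, keeping the
    proportion of each outcome class roughly equal in both sets.

    Hypothetical helper: the study's actual splitting procedure is not
    described in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, holdout_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random assignment within each class
        n_holdout = round(len(idx) * holdout_fraction)
        holdout_idx.extend(idx[:n_holdout])
        train_idx.extend(idx[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


# Synthetic outcome labels (1 = recurrent VTE), not study data.
labels = [1] * 114 + [0] * 886
train_idx, holdout_idx = stratified_holdout_split(labels, 0.2, seed=42)
print(len(train_idx), len(holdout_idx))
```

Stratifying by outcome is a common choice when the event is relatively rare, because it keeps enough events in the hold-out set to estimate discrimination.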
Results: A total of 16,407 anticoagulated cancer patients with a diagnosis of VTE were identified (54.4 % male; median age 70 years). Deep vein thrombosis, pulmonary embolism and metastases were observed in 67.2 %, 26.6 %, and 47.7 % of the patients, respectively. During the study follow-up, 11.4 % of the patients developed a recurrent VTE, which was more frequent in patients with lung cancer. Feature selection and model training based on ML identified primary pulmonary embolism, deep vein thrombosis, metastasis, adenocarcinoma, hemoglobin and serum creatinine levels, platelet and leukocyte count, family history of VTE, and patients' age as predictors of VTE recurrence within 6 months of VTE diagnosis. The LR model had an AUC-ROC (95 % CI) of 0.66 (0.61, 0.70), the DT of 0.69 (0.65, 0.72) and the RF of 0.68 (0.63, 0.72).
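To make the reported metric concrete, the sketch below computes an AUC-ROC with a 95 % confidence interval on synthetic scores. The abstract does not state how the intervals were obtained. The percentile bootstrap used here is an assumption, as are the function names `auc_roc` and `bootstrap_auc_ci` and the simulated data.

```python
import random


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc_roc(y_true, scores), lower, upper


# Tiny hand-checkable case: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75

# Synthetic hold-out set (not study data): positives score slightly higher on average.
rng = random.Random(1)
y = [1] * 23 + [0] * 177
s = [rng.gauss(0.6, 1.0) if label == 1 else rng.gauss(0.0, 1.0) for label in y]
auc, ci_low, ci_high = bootstrap_auc_ci(y, s)
print(f"AUC-ROC {auc:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

An AUC-ROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so values in the 0.66 to 0.69 range indicate modest discrimination.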
Conclusions: This is the first ML-based predictive model designed to predict 6-month VTE recurrence in patients with cancer. These results hold great potential to help clinicians identify high-risk patients and improve their clinical management.
Keywords: Anticoagulants; Cancer patients; Electronic health records; Machine learning; Natural language processing; Predictive model; Venous thromboembolism recurrence.
Copyright © 2023. Published by Elsevier Ltd.
Conflict of interest statement
Declaration of competing interest: AM has received personal fees and non-financial support from Celgene, Sanofi, Pfizer-Bristol Myers Squibb (BMS), LEO Pharma, Daiichi Sankyo, Incyte, AstraZeneca, MSD Oncology, Lilly, Roche, Rovi, Bayer, Servier Menarini, Merck Serono, and Amgen. RL has received personal fees from Rovi, LEO Pharma, Sanofi, and Janssen. JS has received personal fees and non-financial support from Rovi, Stago Laboratories, Pfizer, LEO Pharma, and Devicare. BO has received personal fees from Sanofi, Lilly, Angelini, LEO Pharma, and Rovi. JA has received personal fees from Servier, Merck, Bayer, Amgen, Sirtex Medical, Sanofi, Roche, and Celgene. CA has received non-financial support from GlaxoSmithKline, Pfizer, MSD Oncology, Novartis, and Pierre Fabre. AG has received personal fees from Roche, Amgen, MSD Oncology, Eisai Europe, Novartis, and Pierre Fabre. AM, RL, JS, BO, AS, JA, CA, DG, AG, and MV have received an honorarium from Pfizer and BMS in connection with the development of this manuscript. VF and CR are employees of Savana Research, which was a paid consultant to Pfizer and BMS. MH is an employee of Pfizer Company, one of the study sponsors.