JCO Clin Cancer Inform. 2023 Jul:7:e2200062.

doi: 10.1200/CCI.22.00062.

Machine Learning-Assisted Recurrence Prediction for Patients With Early-Stage Non-Small-Cell Lung Cancer

Adrianna Janik¹, Maria Torrente², Luca Costabello¹, Virginia Calvo², Brian Walsh³,⁴, Carlos Camps⁵, Sameh K Mohamed³,⁴, Ana L Ortega⁶, Vít Nováček³,⁴,⁷,⁸, Bartomeu Massutí⁹, Pasquale Minervini¹⁰, M Rosario Garcia Campelo¹¹, Edel Del Barco¹², Joaquim Bosch-Barrera¹³, Ernestina Menasalvas¹⁴, Mohan Timilsina³,⁴, Mariano Provencio²

Affiliations

¹ Accenture Labs, Dublin, Ireland.
² Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
³ Data Science Institute, University of Galway, Galway, Ireland.
⁴ Insight Centre for Data Analytics, University of Galway, Galway, Ireland.
⁵ Hospital General de Valencia, Valencia, Spain.
⁶ Hospital Universitario de Jaén, Jaén, Spain.
⁷ Faculty of Informatics, Masaryk University, Brno, Czech Republic.
⁸ Masaryk Memorial Cancer Institute, Brno, Czech Republic.
⁹ Hospital General Universitario de Alicante, Alicante, Spain.
¹⁰ University College London, London, United Kingdom.
¹¹ Complejo Hospitalario Universitario A Coruña, A Coruña, Spain.
¹² Hospital Universitario de Salamanca, Salamanca, Spain.
¹³ Institut Català d'Oncologia, Hospital Universitari Dr. Josep Trueta, Girona, Spain.
¹⁴ Polytechnic University of Madrid, Madrid, Spain.

PMID: 37428988
PMCID: PMC10569772
DOI: 10.1200/CCI.22.00062

Adrianna Janik et al. JCO Clin Cancer Inform. 2023 Jul.

Abstract

Purpose: Stratifying patients with cancer according to their risk of relapse can help personalize their care. In this work, we address the following research question: How can machine learning be used to estimate the probability of relapse in patients with early-stage non-small-cell lung cancer (NSCLC)?

Materials and methods: To predict relapse in 1,387 patients with early-stage (I-II) NSCLC from the Spanish Lung Cancer Group data (average age 65.7 years, female 24.8%, male 75.2%), we train tabular and graph machine learning models. We generate automatic explanations for the predictions of these models. For models trained on tabular data, we adopt SHapley Additive exPlanations (SHAP) local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients.
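The local explanations described above attribute a single prediction to individual patient features through Shapley values. The following is a minimal sketch of that idea, not the authors' pipeline: it computes exact Shapley values by enumerating feature subsets instead of calling the SHAP library, and the `toy_risk` function, the feature names, and the baseline patient are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(p):
    # Hypothetical relapse score, NOT a model from the study.
    score = 0.1 + 0.02 * (p["age"] - 60) / 10 + 0.15 * p["stage_II"]
    score += 0.1 * p["smoker"] * p["stage_II"]  # interaction term
    return score


patient = {"age": 70, "stage_II": 1, "smoker": 1}   # hypothetical patient
baseline = {"age": 60, "stage_II": 0, "smoker": 0}  # hypothetical reference
phi = shapley_values(toy_risk, patient, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property a waterfall plot relies on. Exact enumeration grows exponentially with the number of features, so it is practical only for small examples like this one.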

Results: Among the machine learning models trained on tabular data, the random forest model reaches 76% accuracy at predicting relapse under 10-fold cross-validation (the model was trained 10 times with different independent sets of patients in the train, validation, and test sets, and the reported metrics are averaged over these 10 test sets). The graph machine learning model reaches 68% accuracy on a held-out test set of 200 patients, after calibration on a held-out set of 100 patients.
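The cross-validated figure above is an average of per-fold test accuracies. As a rough sketch of that evaluation scheme, assuming a plain train/test split per fold (the separate validation set mentioned above is omitted), the example below uses synthetic data and a hypothetical one-feature threshold classifier in place of the random forest; none of the names or numbers come from the study.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return (mean accuracy, per-fold accuracies) over k test folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores), scores


def fit_threshold(X, y):
    # Hypothetical stand-in classifier: midpoint between the class means.
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic data: one noisy feature, balanced binary labels.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y]

mean_acc, fold_accs = cross_validate(X, y, fit_threshold, predict_threshold, k=10)
print(f"mean accuracy over {len(fold_accs)} folds: {mean_acc:.3f}")
```

Each patient appears in exactly one test fold, so the averaged accuracy reflects performance on data the model did not see during that fold's training.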

Conclusion: Our results show that machine learning models trained on tabular and graph data can enable objective, personalized, and reproducible prediction of relapse and, therefore, disease outcome in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Adrianna Janik

Patents, Royalties, Other Intellectual Property: The team I work in focuses on BioInnovation (Inst)

Virginia Calvo

Consulting or Advisory Role: Roche/Genentech, Sanofi/Aventis, BMS, Amgen

Speakers' Bureau: MSD, Pfizer, Takeda, BMS, AstraZeneca

Travel, Accommodations, Expenses: Roche, Takeda

Vít Nováček

Consulting or Advisory Role: BioXCel Therapeutics

Bartomeu Massutí

Consulting or Advisory Role: Roche, Boehringer Ingelheim, AstraZeneca, Merck Serono, Janssen

Speakers' Bureau: Roche, AstraZeneca, Boehringer Ingelheim, Bristol Myers Squibb, Sanofi/Regeneron, Janssen Oncology, Pfizer

Travel, Accommodations, Expenses: Roche, MSD Oncology, AstraZeneca

M. Rosario Garcia Campelo

Consulting or Advisory Role: Roche/Genentech, MSD Oncology, AstraZeneca, Bristol Myers Squibb, Pfizer, Novartis, Takeda, Boehringer Ingelheim, Janssen Oncology

Speakers' Bureau: Roche, AstraZeneca, Bristol Myers Squibb, Pfizer, Novartis, Takeda, Boehringer Ingelheim, MSD Oncology, Sanofi/Aventis, Janssen Oncology, Amgen, Lilly

Travel, Accommodations, Expenses: Roche/Genentech, MSD Oncology, Pfizer

Joaquim Bosch-Barrera

Consulting or Advisory Role: Roche, Bristol Myers Squibb, MSD Oncology, AstraZeneca, Pfizer, Sanofi/Regeneron

Research Funding: Pfizer (Inst)

Travel, Accommodations, Expenses: Takeda

Mariano Provencio

Consulting or Advisory Role: Bristol Myers Squibb, Roche, MSD, AstraZeneca, Takeda, Lilly, Roche, Janssen Oncology, Pfizer, Merck

Speakers' Bureau: BMS, Roche, AstraZeneca, MSD, Takeda

Research Funding: Pierre Fabre (Inst), Roche (Inst), Boehringer Ingelheim (Inst), Bristol Myers Squibb (Inst)

Travel, Accommodations, Expenses: Roche, BMS, AstraZeneca, Boehringer Ingelheim, Bristol Myers Squibb Company, Lilly, Pierre Fabre, Takeda, MSD

No other potential conflicts of interest were reported.

Figures

**FIG 1.**
Diagram of the prediction pipeline, from the database through criteria and features, model training, feature ablation, and evaluation, to prediction and explanation. CHT-RT, chemotherapy-radiotherapy; SHAP, SHapley Additive exPlanations.

**FIG 2.**
Diagram of the clinical data modeled as a knowledge graph. ALK IHQ, anaplastic lymphoma kinase immunohistochemistry; ECOG, Eastern Cooperative Oncology Group; EGFR, epidermal growth factor receptor.

**FIG 3.**
Explanations provided by the pipeline for the two models for the same patient. (A) Tabular model with 75% accuracy, trained over 1,348 patients. SHAP explanation with a waterfall plot of features contributing to the prediction, red increasing the prediction score and blue decreasing. (B) Graph machine learning model with 68% accuracy trained over 1,348 patients. Example-based explanation. (1) Prediction summary for selected patients including predicted risk, number of similar examples, and number of training cases. (2) Retrieved exemplary cases, that is, influential patients. (3) Commonalities and differences between the patient being predicted and the selected influential patient retrieved by the example-based explanation method. (a) Venn diagram view. (b) Table view. AI, artificial intelligence; ECOG, Eastern Cooperative Oncology Group; COPD, chronic obstructive pulmonary disease; HTA, high blood pressure; SHAP, SHapley Additive exPlanations.
