J Immunother Cancer. 2025 Sep 29;13(9):e012423.
doi: 10.1136/jitc-2025-012423.

Transformer-based AI approach to unravel long-term, time-dependent prognostic complexity in patients with advanced NSCLC and PD-L1 ≥50%: insights from the pembrolizumab 5-year global registry

Alessio Cortellini  1   2   3 Valentina Santo  4   5 Leonardo Brunetti  4   3 Edoardo Garbo  5 David J Pinato  3   6 Giulia La Cava  4 Jarushka Naidoo  7   8   9 Artur Katz  10 Monica Loza  11 Joel W Neal  11 Carlo Genova  12 Scott Gettinger  13 So Yeon Kim  13 Ritujith Jayakrishnan  13 Talal El Zarif  13 Marco Russano  4 Federica Pecci  5 Alessandro Di Federico  5 Joao V Alessi  14 Michele Montrone  15 Dwight H Owen  16   17 Sara Ramella  2   18 Diego Signorelli  19 Mary Jo Fidler  20 Mingjia Li  16 Andrea Camerini  21 Balazs Halmos  22 Bruno Vincenzi  4   2 Giulio Metro  23 Francesco Passiglia  24 Sai Yendamuri  25 Annalisa Guida  26 Michele Ghidini  27 Antonio D'Alessio  3 Giuseppe L Banna  28 Claudia A M Fulgenzi  3 Salvatore Grisanti  29 Francesco Grossi  30 Armida D'Incecco  31 Eleni Josephides  32 Mieke Van Hemelrijck  33 Alessandro Russo  34 Alain Gelibter  35 Gianpaolo Spinelli  36 Monica Verrico  37 Bartłomiej Tomasik  38 Raffaele Giusti  39 Kirsty Balachandran  29 Emilio Bria  30   31 Martin Sebastian  32 Maximilian Rost  32 Martin Forster  40 Uma Mukherjee  40 Lorenza Landi  34 Francesca Mazzoni  35 Avinash Aujayeb  36 Manuel Dupont  37 Alessandra Curioni-Fontecedro  37 Rita Chiari  41 Vincenzo Sforza  42 Marcello Tiseo  43   44 Alex Friedlaender  45 Alfredo Addeo  46 Federica Zoratto  47 Michele De Tursi  48 Luca Cantini  49 Elisa Roca  50 Giannis Mountzios  51 Danilo Rocco  52 Luigi Della Gravara  52 Sukumar Kalvapudi  25 Alessandro Inno  53 Paolo Bironzo  24 Rafael Di Marco Barros  32 David O'Reilly  7 Orla Fitzpatrick  7 Eleni Karapanagiotou  32 Isabelle Monnet  54 Javier Baena  55 Marianna Macerelli  56 Aida Piedra  57 Francesco Agustoni  58 Diego Luigi Cortinovis  59   60 Giuseppe Tonini  4   2 Gabriele Minuti  34 Chiara Bennati  61 Laura Mezquita  62   63 Teresa Gorría  64 Alberto Servetto  65 Teresa Beninato  66 Giuseppe Lo Russo  66 Arsela Prelaj  66   67 Andrea De Giglio  68 Jacobo Rogado  69 Laura Moliner  70 Ernest Nadal  70 Federica Biello  
6 Frank Aboubakar Nana  71 Anne-Marie Dingemans  72 Joachim G J V Aerts  72 Roberto Ferrara  73   74 Taher Abu Hejleh  75   76 Kazuki Takada  77 Abdul Rafeh Naqash  78 Marina Chiara Garassino  79 Solange Peters  80 Heather A Wakelee  11 Amin H Nassar  13 Biagio Ricciuti  5 Paolo Soda  81   82 Camillo Maria Caruso #  81 Valerio Guarrasi #  81

Alessio Cortellini et al. J Immunother Cancer.

Abstract

Background: With nearly one-third of patients with advanced non-small cell lung cancer (NSCLC) and PD-L1 Tumor Proportion Score ≥50% surviving beyond 5 years following first-line pembrolizumab, long-term outcomes challenge traditional paradigms of cancer prognostication. The emergence of non-cancer-related factors and time-dependent trends underscores the need for advanced analytical frameworks to unravel their complex interplay.

Methods: We analyzed the Pembro-real 5Y registry, a global real-world dataset of 1050 patients treated across 61 institutions in 14 countries, with long-term follow-up and a large panel of baseline variables. Two complementary approaches were employed: ridge regression, chosen for its ability to address multicollinearity while retaining interpretability, and Not Another Imputation Method (NAIM), a transformer-based artificial intelligence model designed to handle missing data without imputation. Endpoints included risk of death at 6, 12, 24, and 60 months, and 5-year survival.
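To illustrate why a ridge penalty helps with multicollinearity, the following is a minimal sketch on toy data. It uses plain linear ridge regression with two nearly identical predictors, not the penalized logistic and Cox models used in the study; the function name `ridge_fit_2d`, the data, and the penalty value are all hypothetical.

```python
def ridge_fit_2d(x1, x2, y, lam):
    """Closed-form ridge fit for two centered predictors (no intercept).

    Solves (X^T X + lam * I) beta = X^T y for the 2x2 case.
    lam = 0 reduces to ordinary least squares.
    """
    a = sum(v * v for v in x1) + lam          # (X^T X)[0][0] + lam
    d = sum(v * v for v in x2) + lam          # (X^T X)[1][1] + lam
    b = sum(u * v for u, v in zip(x1, x2))    # off-diagonal term
    r1 = sum(u * v for u, v in zip(x1, y))    # (X^T y)[0]
    r2 = sum(u * v for u, v in zip(x2, y))    # (X^T y)[1]
    det = a * d - b * b
    beta1 = (d * r1 - b * r2) / det
    beta2 = (a * r2 - b * r1) / det
    return beta1, beta2


# Toy data (assumed for illustration): x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.01, -0.99, 0.0, 1.01, 1.99]
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

ols = ridge_fit_2d(x1, x2, y, lam=0.0)    # unstable: large, opposite-signed
ridge = ridge_fit_2d(x1, x2, y, lam=1.0)  # stable: two similar coefficients
print("OLS:  ", ols)
print("Ridge:", ridge)
```

Without the penalty, the two correlated predictors receive large coefficients of opposite sign; with it, the effect is shared between them in roughly equal parts, which is the behavior that keeps coefficients interpretable when predictors overlap.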

Results: The ridge regression model achieved a c-statistic of 0.66 (95% CI: 0.59 to 0.72) for the risk of death and an area under the curve (AUC) of 0.72 (95% CI: 0.65 to 0.78) for 5-year survival, identifying Eastern Cooperative Oncology Group Performance Status (ECOG-PS) ≥2, increasing age, and metastatic burden as primary risk factors. However, wide CIs for some predictors highlighted statistical instability. NAIM demonstrated robust handling of missing data, with a c-index of 62.98±2.11 for risk of death and an AUC of 60.52±3.71 for 5-year survival (both reported as % ±SD). The comprehensive SHapley Additive exPlanations (SHAP) analysis revealed dynamic, time-dependent patterns, with early mortality dominated by acute factors (eg, ECOG-PS, steroids) and long-term outcomes increasingly influenced by systemic health markers (eg, absence of hypertension, increasing body mass index). Unexpected insights included the protective role of dyslipidemia (but not statins) and the nuanced impact of smoking status, reflecting evolving disease dynamics and host-tumor interplay.
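For readers unfamiliar with the c-statistic (c-index) reported above, the following is a minimal sketch of Harrell's concordance index on toy data. The function name `concordance_index`, the simplified handling of tied times, and the example values are assumptions for illustration and are not drawn from the registry or from the study's code.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data (simplified).

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the patient who died earlier was given the higher risk score.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (assumed): follow-up in months, event flag (1 = death), risk score.
times = [5, 12, 24, 60, 60]
events = [1, 1, 1, 0, 0]
risks = [0.9, 0.4, 0.6, 0.2, 0.1]

c = concordance_index(times, events, risks)
print(f"c-index = {c:.4f} ({100 * c:.2f}%)")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so 0.66 and 62.98% sit on the same scale once the percentage is divided by 100.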

Conclusions: Our integrative framework illuminates the complexity of long-term outcomes in patients with NSCLC treated with pembrolizumab, uncovering dynamic, non-linear prognostication trends. This analysis provides insights into patient trajectories, emphasizing the need for holistic, long-term management strategies.

Keywords: Immune Checkpoint Inhibitor; Immunotherapy; Lung Cancer; Survivorship.


Conflict of interest statement

Competing interests: ACo received grants for consultancies/advisory boards from MSD, BMS, IQVIA, AstraZeneca, REGENERON, Amgen, Daiichi-Sankyo, Access Infinity, Ardelis Health, Alpha Sight, Guidepoint, Roche; speaker fees from AstraZeneca, Pierre-Fabre, MSD, Sanofi/REGENERON; payment for writing/editorial activity from BMS, MSD, Roche; travel support from Sanofi/REGENERON, MSD. JB declares honoraria/consulting or advisory role from AstraZeneca, BMS, Roche, Access Oncology, travel support from MSD, Roche, Janssen Oncology. GS has received payment or honoraria for advisory boards from Novartis, Roche, Bayer, unrelated to this project. DO'R has received conference attendance support from Takeda, Janssen, Servier, MSD. EB has received grants or contracts from AstraZeneca, Roche and honoraria for lectures from Merck Sharp & Dohme, AstraZeneca, Pfizer, Eli Lilly, Bristol-Myers Squibb, Novartis, Takeda and Roche; EB has been a member of the Data Safety Monitoring Board or Advisory Board of Merck Sharp & Dohme, Pfizer, Novartis, Bristol-Myers Squibb, AstraZeneca, Celltrion and Roche. AAu declares consulting or advisory role for Bristol Myers Squibb, AstraZeneca, Boehringer Ingelheim, Roche, MSD, Pfizer, Eli Lilly, Astellas, Takeda, and Amgen; speaker’s bureau for Eli Lilly and AstraZeneca. AR has received advisory board or speaker bureau honoraria from AstraZeneca, MSD, Novartis, Pfizer, BMS, Takeda, Amgen, Regeneron and Daiichi Sankyo; compensated activity for editorial projects from AstraZeneca, MSD, BMS, Novartis, Roche, and Regeneron. MT received speakers’ and consultants’ fees from AstraZeneca, Pfizer, Eli Lilly, BMS, Novartis, Roche, MSD, Boehringer Ingelheim, Takeda, Amgen, Merck, Sanofi, Janssen, Daiichi Sankyo. He also received institutional research grants from AstraZeneca, Boehringer Ingelheim and Roche and travel support from Amgen and Takeda. FM received honoraria for advisory board roles with MSD, BMS, Takeda, Roche, AstraZeneca, Novartis. PB served as consultant/advisory board for Regeneron, Pierre-Fabre, Janssen, Seagen. DHO declares research funding/grants (to institution) from BMS, Merck, Palobiofarma, Pfizer, Genentech, AstraZeneca, Nuvalent, AbbVie, Onc.AI. BH received grants for consultancies from Boehringer Ingelheim, AstraZeneca, Merck, BMS, Advaxis, Amgen, AbbVie, Daiichi, Pfizer, GSK, Beigene, Janssen, Black Diamond Therapeutics, Forward Pharma, Numab, Arrivent; speaker fees from AstraZeneca, Boehringer Ingelheim, Apollomics, Johnson & Johnson, Takeda, Merck, BMS, Genentech, Pfizer, Eli Lilly, Daiichi; grants for participating on boards from BMS, TPT, Apollomics, eFECTOR, and City of Hope. BT received lecture fees from Pfizer. LC is an employee of Fortrea Inc. BT declares honoraria from Roche. IM declares travel support from Takeda, MSD, Pfizer, Oxyvie and speaker fees from Regeneron. A-MD declares research grants from Amgen, the Dutch Cancer Society and HANART, consulting fees from Amgen, Bayer, Boehringer Ingelheim, Sanofi, Roche, Janssen and AstraZeneca, speaker fees from Janssen, Pfizer, AstraZeneca, Lilly and Takeda, advisory board role for Takeda and Roche. GLR declares fees for advisory boards, travel support, consultancies from MSD, BMS, Roche, Sanofi, Regeneron, Lilly, AstraZeneca, Janssen, Pfizer, Novartis, Bayer, Takeda, Amgen, GSK, Daiichi. TAH declares stock interests for GlaxoSmithKline and honoraria from Novartis. BR served as consultant/advisory board for Amgen, Regeneron, AstraZeneca, Capvision. Speaker fee: AstraZeneca. Received honoraria from Targeted Oncology, SITC. All other authors declare no conflicts of interest associated with the present study.

Figures

Figure 1. Visual summary of the methodological pipeline used to analyze long-term outcomes of patients with advanced NSCLC (PD-L1 ≥50%) treated with first-line pembrolizumab in the Pembro-Real 5Y Registry. Two parallel approaches were adopted: (1) conventional modeling using penalized ridge regression for both 5-year survival (logistic) and overall survival (Cox), producing interpretable ORs and HRs with associated CIs and model performance metrics (eg, C-statistic, AUC); and (2) NAIM (AI model), employing a transformer-based deep learning architecture to model time-dependent survival dynamics, using SHAP values for feature interpretability and multiple performance metrics (AUC, F1-score, Matthews correlation coefficient [MCC], etc). Outputs were compared side-by-side. Shared predictors (eg, ECOG-PS, corticosteroid use) reinforced robustness, while NAIM identified additional time-sensitive or non-linear predictors (eg, BMI, dyslipidemia). This integrative approach supports both baseline prognostication and longitudinal survivorship strategies. AUC, area under the curve; BMI, body mass index; CV, cardiovascular; ECOG-PS, Eastern Cooperative Oncology Group Performance Status; NAIM, Not Another Imputation Method; NSCLC, non-small cell lung cancer; SHAP, SHapley Additive exPlanations; TMB, Tumor Mutational Burden.
Figure 2. Histogram plot summarizing the cumulative SHAP values from the NAIM analysis for the risk of death. The length of each bar represents the SHAP value, indicating the relative importance of each variable within the model. Features were ordered by their absolute contribution. The c-index (% ±SD) was 79.76±2.44 for the training set and 62.98±2.11 for the overall model. Variable definitions and categorization details are reported in online supplemental methods. ALK, anaplastic lymphoma kinase; BMI, body mass index; CNS, central nervous system; ECOG-PS, Eastern Cooperative Oncology Group Performance Status; EGFR, epidermal growth factor receptor; GLM, glucose-lowering medications; NAIM, Not Another Imputation Method; NOS, not otherwise specified; PPI, proton pump inhibitors; pred, Prednisone; SHAP, SHapley Additive exPlanations; TMB, tumor mutational burden; TPS, Tumor Proportion Score.
Figure 3. Paired dot plot and histogram summarizing the SHAP values from the NAIM analysis for 5-year survival. Features were ordered by their absolute contribution, with high values (red) and low values (blue) positioned to indicate their influence on outcomes. For instance, red dots on the right side of the plot imply a positive association with the probability of being alive at 5 years, while blue dots on the right side imply a negative association. Missing values were represented as gray dots. The length of each bar represents the SHAP value, indicating the relative importance of each variable within the model. The metrics for the model (% ±SD) were as follows: the training set achieved an AUC of 78.53±3.25, an accuracy of 74.41±2.56, an F1-score of 71.38±2.82, an MCC of 47.34±3.41, and a G-Mean of 64.89±3.17. In the evaluation set, the model demonstrated an AUC of 60.52±3.71, an accuracy of 53.65±2.97, an F1-score of 45.43±23.10, an MCC of 10.72±8.35, and a G-Mean of 36.71±7.61. Variable definitions and categorization details are reported in online supplemental methods. ALK, anaplastic lymphoma kinase; AUC, area under the curve; BMI, body mass index; CNS, central nervous system; ECOG-PS, Eastern Cooperative Oncology Group Performance Status; EGFR, epidermal growth factor receptor; GLM, glucose-lowering medications; MCC, Matthews correlation coefficient; NAIM, Not Another Imputation Method; NOS, not otherwise specified; PPI, proton pump inhibitors; pred, Prednisone; SHAP, SHapley Additive exPlanations; TMB, tumor mutational burden; TPS, Tumor Proportion Score.
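The classification metrics listed in the Figure 3 caption can all be derived from a binary confusion matrix. The sketch below shows the standard formulas on made-up counts. It assumes G-Mean is the geometric mean of sensitivity and specificity, which is the usual definition but is not confirmed by the text here; the function name `binary_metrics` and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1, MCC and G-Mean from confusion-matrix counts.

    Assumes every denominator is non-zero; a production version
    would need to guard against empty classes.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc, "g_mean": g_mean}


# Made-up counts for illustration only (positive class = alive at 5 years).
metrics = binary_metrics(tp=30, fp=20, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because MCC and G-Mean both penalize poor performance on either class, they can fall well below accuracy when one class is predicted badly, a pattern consistent with the gap between accuracy and MCC in the evaluation set.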


MeSH terms