BMC Bioinformatics. 2024 May 3;25(1):175.
doi: 10.1186/s12859-024-05799-2.

A transformer model for cause-specific hazard prediction

Matthieu Oliver et al. BMC Bioinformatics. 2024.

Abstract

Background: Modelling discrete-time cause-specific hazards in the presence of competing events and non-proportional hazards is a challenging task in many domains. Survival analysis in longitudinal cohorts often requires such models, notably when the data are gathered at discrete points in time and the predicted events display complex dynamics. Current models often rely on a strong assumption of proportional hazards, which rarely holds in practice, or do not handle sequential data in a meaningful way. This study proposes a Transformer architecture for the prediction of cause-specific hazards in discrete-time competing risks. In contrast to the multilayer perceptrons already used for this task (DeepHit), the Transformer architecture is especially suited to handling complex relationships in sequential data, having displayed state-of-the-art performance in numerous tasks with few underlying assumptions about the task at hand.
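As an illustration of the quantity being modelled, the sketch below converts discrete-time cause-specific hazards into cumulative incidence functions. It assumes the usual discrete-time definition, in which the hazard of event k at step t is the probability of that event at t given that no event occurred earlier. The function name `cumulative_incidence` and the toy hazard values are hypothetical and are not taken from the paper.

```python
def cumulative_incidence(hazards):
    """Turn discrete-time cause-specific hazards into cumulative incidences.

    hazards[t][k] is assumed to be the probability that event k occurs at
    step t, given that no event occurred before step t.
    Returns (cif, survival), where cif[t][k] is the probability of event k
    by step t and survival[t] is the probability of no event through step t.
    """
    n_events = len(hazards[0])
    event_free = 1.0                 # probability of being event-free so far
    running = [0.0] * n_events       # cumulative incidence per event
    cif, survival = [], []
    for step in hazards:
        if len(step) != n_events:
            raise ValueError("every time step needs one hazard per event")
        total = sum(step)
        if min(step) < 0.0 or total > 1.0 + 1e-12:
            raise ValueError("hazards must be non-negative and sum to at most 1")
        # An event of type k at this step requires being event-free beforehand.
        running = [c + event_free * h for c, h in zip(running, step)]
        event_free *= 1.0 - total
        cif.append(list(running))
        survival.append(event_free)
    return cif, survival


# Toy example (hypothetical values): two competing events, two time steps.
toy_hazards = [[0.1, 0.2], [0.1, 0.2]]
cif, survival = cumulative_incidence(toy_hazards)
```

At every step the cumulative incidences and the event-free probability sum to one, which is a convenient sanity check on any predicted set of hazards.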

Results: Using synthetic datasets of 2000 to 50,000 patients, we showed that our Transformer model surpassed the CoxPH, PyDTS, and DeepHit models for the prediction of cause-specific hazards, especially when the proportional hazards assumption did not hold. The error along simulated time highlighted the ability of our model to anticipate the evolution of cause-specific hazards at later time steps, where few events are observed. Our model was also superior to current models for the prediction of dementia and other psychiatric conditions in the English Longitudinal Study of Ageing (ELSA) cohort, as measured by the integrated Brier score and the time-dependent concordance index. We also demonstrated the explainability of our model's predictions using the integrated gradients method.
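The abstract does not describe how the synthetic patients were generated. The sketch below is only one plausible way to draw discrete-time competing-risks data from known hazards, so that predicted hazards can later be compared with the ground truth. The names `sample_patient` and `toy_hazards`, the hazard values, and the censoring rule at the end of follow-up are all assumptions made for illustration.

```python
import random


def sample_patient(hazards, rng):
    """Draw one (time, event) pair from discrete-time cause-specific hazards.

    hazards[t][k] is assumed to be the probability of event k at step t
    given no earlier event. Returns (t, k) for the first event, or
    (last_step, None) if no event occurs during follow-up (censored).
    """
    for t, step in enumerate(hazards):
        u = rng.random()              # uniform draw in [0, 1)
        cumulative = 0.0
        for k, h in enumerate(step):
            cumulative += h
            if u < cumulative:        # event k occurs at step t
                return t, k
    return len(hazards) - 1, None


# Hypothetical hazards over 10 steps: event 0 is constant in time,
# event 1 increases with time (a non-proportional pattern).
toy_hazards = [[0.05, 0.01 * (t + 1)] for t in range(10)]

rng = random.Random(0)
cohort = [sample_patient(toy_hazards, rng) for _ in range(1000)]
```

Because the true hazards are known for every simulated patient, an error such as the mean absolute difference between predicted and true hazard can be computed at each time step.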

Conclusions: Our model provided state-of-the-art prediction of cause-specific hazards without adopting prior parametric assumptions on the hazard rates. It outperformed other models in non-proportional hazards settings for both the synthetic dataset and the longitudinal cohort study. We also observed that basic models such as CoxPH were better suited than deep learning models to extremely simple settings. Our model is therefore especially suited to survival analysis on longitudinal cohorts with complex dynamics of the covariate-to-outcome relationship, which are common in clinical practice. The integrated gradients provided importance scores for the input variables, indicating which variables guided the model in its prediction. This model is ready to be used for time-to-event prediction in longitudinal cohorts.
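To make the integrated gradients method concrete, the sketch below applies it to a small logistic model rather than to the Transformer itself. The model, its weights, the all-zero baseline, and the midpoint Riemann approximation with 200 steps are all assumptions chosen for illustration; the paper's actual setup is not given in this abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy stand-in for a risk model: logistic function of a weighted sum."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def model_gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to its inputs."""
    s = model(x, weights, bias)
    return [s * (1.0 - s) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, n_steps=200):
    """Approximate integrated gradients along the straight path from
    baseline to x, using a midpoint Riemann sum with n_steps points."""
    totals = [0.0] * len(x)
    for i in range(n_steps):
        alpha = (i + 0.5) / n_steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        grad = model_gradient(point, weights, bias)
        totals = [t + g for t, g in zip(totals, grad)]
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(v - b) * t / n_steps for v, b, t in zip(x, baseline, totals)]


# Hypothetical weights and input; the third feature has zero weight.
weights, bias = [1.5, -2.0, 0.0], 0.1
x, baseline = [1.0, 0.5, 3.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, weights, bias)
```

The attributions sum approximately to the difference between the model output at the input and at the baseline, so each score can be read as that variable's share of the change in prediction.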

Keywords: Cause-specific hazard; Competing risks; English longitudinal study of ageing; Synthetic data; Transformer.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Architecture of our transformer-based model. Each part of the architecture is described in detail in “Appendix 1”
Fig. 2
Description of this study’s data. a and b respectively illustrate the underlying cause-specific hazards and the cumulative incidence of each simulated event. c illustrates the cumulative incidence function of events in the ELSA cohort
Fig. 3
Time-dependence of the models’ performance. Performance was computed using the mean absolute error for the prediction of the cause-specific hazard for each simulated event. The Transformer model surpassed other models by a large margin on non-proportional hazard events, thanks especially to a major performance gap in the second half of the simulated time. It was also better than the DeepHit model at every single time step. This error was computed with each model trained on a dataset of 10,000 simulated patients
Fig. 4
Seven most important features obtained from the mean integrated gradients of the DeepHit (a) and Transformer (b) models using the ELSA dataset
Fig. 5
Cause-specific hazard predictions on two patients from the synthetic dataset. The ground truth and predicted hazard are presented for each of the Proportional, Increasing, and Non-Monotonic hazard events. For readability, the PyDTS and RCoxPH models are presented on the top row and the DeepHit and Transformer models on the bottom row

