Biom J. 2024 Jan;66(1):e2200178.
doi: 10.1002/bimj.202200178. Epub 2023 Dec 10.

A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection


Liangyuan Hu. Biom J. 2024 Jan.

Abstract

We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about the population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of the population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and variable selection. Leveraging likelihood-based machine learning, we describe how to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and how to use these samples in an exploratory treatment effect heterogeneity analysis to identify subpopulations whose treatment effects may differ from the population average effect. The literature on variable selection methods for clustered and censored survival data is sparse, particularly for methods using flexible modeling techniques. We propose a permutation-based approach to variable selection that uses each predictor's variable inclusion proportion supplied by the riAFT-BART model. To address the missing data issue frequently encountered in health databases, we propose a strategy that combines bootstrap imputation with riAFT-BART for variable selection among incomplete clustered survival data. We conduct an extensive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that they perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors of in-hospital mortality among severe COVID-19 patients and of the heterogeneous treatment effects of three COVID-specific medications.
The methods developed in this work are readily available in the R package riAFTBART.
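
The permutation-based selection idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the riAFTBART implementation: `toy_inclusion_proportions` is a hypothetical stand-in that scores predictors by normalized absolute correlation rather than by BART variable inclusion proportions, the outcome is uncensored and unclustered, and the per-predictor 95% threshold, the function names, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def toy_inclusion_proportions(columns, y):
    """Hypothetical stand-in for a model's variable inclusion proportions.

    A real analysis would refit the tree model and read off how often each
    predictor is used in splits; here the scores are absolute correlations
    normalized to sum to 1.
    """
    scores = [abs_correlation(col, y) for col in columns]
    total = sum(scores)
    return [s / total for s in scores]


def permutation_select(columns, y, vip_fn, n_perm=100, alpha=0.05, seed=0):
    """Keep predictors whose observed proportion exceeds a permutation null.

    The outcome is shuffled n_perm times to break any predictor-outcome
    association; each predictor is compared with the empirical (1 - alpha)
    quantile of its own null proportions.
    """
    rng = random.Random(seed)
    observed = vip_fn(columns, y)
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(vip_fn(columns, y_perm))
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    selected = []
    for j, obs in enumerate(observed):
        null_j = sorted(draw[j] for draw in null)
        if obs > null_j[k]:
            selected.append(j)
    return selected, observed


# Simulated data (assumed): 2 useful predictors followed by 20 noise predictors.
rng = random.Random(1)
n, p = 300, 22
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * columns[0][i] - 1.5 * columns[1][i] + rng.gauss(0, 1) for i in range(n)]

selected, observed = permutation_select(columns, y, toy_inclusion_proportions)
print("selected predictor indices:", selected)
```

Passing the scoring function as `vip_fn` keeps the permutation logic separate from the model, so a scorer based on an actual fitted model could in principle be swapped in without changing the thresholding step.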

Keywords: Bayesian machine learning; clustered survival observations; treatment effect heterogeneity; variable importance.

Conflict of interest statement

The author declares no conflicts of interest.

Figures

Figure 1
Relative biases (Panel A) and root-mean-squared-errors (RMSE) (Panel B) among 40 generalized propensity score subgroups under 6 data configurations: (heterogeneity settings a, b, c) × (proportional hazards (PH) and nonproportional hazards (nPH)) for each of four methods, IPW-riCox, DR-riAH, PEAMM and riAFT-BART. Three pairwise treatment effects were estimated by averaging the individual survival treatment effect (based on 3-week survival probability) across individuals in each subgroup. Each boxplot visualizes the distribution of relative biases or the distribution of RMSE for 40 subgroups, each averaged across 250 simulation runs.
Figure 2
The distribution, across 250 data replications, of the numbers of selected noise predictors and useful predictors for each of five methods: riAFT-BART, PEAMM, FrailtyHL, FrailtyPenal and riCox, with clustered survival data generated under both proportional hazards (PH) and non-proportional hazards (nPH). The total number of useful predictors is 8 and the total number of noise predictors is 20. There are K = 10 clusters, each with a size of 200; the total sample size is 2000. The overall proportion of missingness is 40%.
Figure 3
Power of each of five methods: riAFT-BART, PEAMM, FrailtyHL, FrailtyPenal and riCox, for selecting each of 8 useful predictors with clustered survival data generated under proportional hazards (PH) and non-proportional hazards (nPH), based on 250 data replications. There are K = 10 clusters, each with a size of 200; the total sample size is 2000. The overall proportion of missingness is 40%. Filled symbols represent the PH setting, and open symbols correspond to the nPH setting.
Figure 4
The distribution of cross-validated concordance statistics across 250 data replications for each of five methods using the COVID-19 dataset.
Figure 5
Final Random Forests model fit to the posterior mean of the individual survival treatment effect comparing remdesivir and dexamethasone + remdesivir. Values in each node correspond to the posterior mean, in terms of difference in log survival days, for the subgroup of individuals represented in that node. Uncertainty intervals were obtained by pooling the posterior samples arising from the multiple imputed data sets. WBC: White blood cell.
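
As a rough illustration of how a subgroup estimate and its uncertainty interval might be formed by pooling posterior samples across imputed data sets, consider the Python sketch below. The nested-list layout `draws_by_imputation[m][s][i]`, the function name, the nearest-rank quantile rule, and the simulated draws are all assumptions for illustration; they are not taken from the paper or from the riAFTBART package.

```python
import random


def subgroup_effect_summary(draws_by_imputation, in_subgroup, level=0.95):
    """Summarize a subgroup's treatment effect from pooled posterior draws.

    draws_by_imputation[m][s][i] is posterior draw s of individual i's
    treatment effect from the model fit to imputed data set m (assumed layout).
    Each draw is averaged over subgroup members, the subgroup averages are
    pooled across imputed data sets, and simple empirical quantiles of the
    pooled values give the interval.
    """
    pooled = []
    for draws in draws_by_imputation:
        for draw in draws:
            members = [v for v, keep in zip(draw, in_subgroup) if keep]
            pooled.append(sum(members) / len(members))
    pooled.sort()
    tail = (1 - level) / 2
    lower = pooled[int(tail * (len(pooled) - 1))]
    upper = pooled[int((1 - tail) * (len(pooled) - 1))]
    mean = sum(pooled) / len(pooled)
    return mean, lower, upper


# Simulated draws (assumed): 3 imputed data sets, 200 draws each, 50 individuals.
# The first 25 individuals have a true effect of 0.5 on the log-survival-time scale.
rng = random.Random(7)
true_effect = [0.5] * 25 + [0.0] * 25
draws_by_imputation = [
    [[t + rng.gauss(0, 0.2) for t in true_effect] for _ in range(200)]
    for _ in range(3)
]
in_subgroup = [i < 25 for i in range(50)]

mean, lower, upper = subgroup_effect_summary(draws_by_imputation, in_subgroup)
print(f"subgroup effect: {mean:.3f} ({lower:.3f}, {upper:.3f})")
```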


