A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection
- PMID: 38072661
- PMCID: PMC10953775
- DOI: 10.1002/bimj.202200178
Abstract
We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about the population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of the population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical problems with clustered survival data: estimating treatment effect heterogeneity and variable selection. Leveraging likelihood-based machine learning, we describe how to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use these posterior samples to perform an exploratory treatment effect heterogeneity analysis that identifies subpopulations whose treatment effects may differ from the population average effect. The literature on methods for variable selection among clustered and censored survival data is sparse, particularly for methods using flexible modeling techniques. We propose a permutation-based approach for variable selection that uses each predictor's variable inclusion proportion supplied by the riAFT-BART model. To address the missing data issue frequently encountered in health databases, we propose a strategy that combines bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an extensive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that they perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study that identifies predictors of in-hospital mortality among severe COVID-19 patients and estimates the heterogeneous treatment effects of three COVID-specific medications.
The methods developed in this work are readily available in the package .
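As a rough illustration of the permutation-based selection idea described in the abstract, the sketch below builds a null distribution of per-variable importance proportions by permuting the outcome, then keeps predictors whose observed proportion exceeds a quantile of that null. It rests on several assumptions: it does not fit riAFT-BART, it uses a normalized absolute correlation as a stand-in for the model's variable inclusion proportion, it ignores censoring and clustering, and the function names, the threshold rule, and the simulated data are hypothetical rather than taken from the paper.

```python
import numpy as np


def inclusion_proportions(X, y):
    """Stand-in importance score (NOT a BART inclusion proportion).

    Returns absolute Pearson correlations between each column of X and y,
    normalized to sum to 1 so they behave like proportions.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    score = np.abs(Xc.T @ yc) / denom
    return score / score.sum()


def permutation_select(X, y, n_perm=200, alpha=0.05, seed=0):
    """Select predictors whose observed proportion exceeds the (1 - alpha)
    quantile of its own permutation null (an assumed threshold rule)."""
    rng = np.random.default_rng(seed)
    observed = inclusion_proportions(X, y)
    null = np.empty((n_perm, X.shape[1]))
    for b in range(n_perm):
        # Permuting the outcome breaks any predictor-outcome association.
        null[b] = inclusion_proportions(X, rng.permutation(y))
    threshold = np.quantile(null, 1 - alpha, axis=0)
    return observed > threshold, observed, threshold


# Simulated example: only the first two of ten predictors affect the outcome.
rng = np.random.default_rng(42)
n, p = 500, 10
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] + rng.normal(size=n)  # think of y as a log survival time

selected, observed, threshold = permutation_select(X, y)
print("selected predictors:", np.flatnonzero(selected))
```

In an actual analysis the stand-in score would be replaced by the inclusion proportions from repeated riAFT-BART fits, and the outcome permutation would need to respect the censoring and cluster structure of the data.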
Keywords: Bayesian machine learning; clustered survival observations; treatment effect heterogeneity; variable importance.
© 2023 The Authors. Biometrical Journal published by Wiley-VCH GmbH.
Conflict of interest statement
The author declares no conflicts of interest.