Front Genet. 2021 Jan 7:11:587378.

doi: 10.3389/fgene.2020.587378. eCollection 2020.

Estimation of Heterogeneous Restricted Mean Survival Time Using Random Forest

Mingyang Liu¹, Hongzhe Li¹

PMID: 33584791
PMCID: PMC7873855
DOI: 10.3389/fgene.2020.587378

Mingyang Liu et al. Front Genet. 2021.

Affiliation

¹ Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.

Abstract

Estimation and prediction of heterogeneous restricted mean survival time (hRMST) is of great clinical importance, which can provide an easily interpretable and clinically meaningful summary of the survival function in the presence of censoring and individual covariates. The existing methods for the modeling of hRMST rely on proportional hazards or other parametric assumptions on the survival distribution. In this paper, we propose a random forest based estimation of hRMST for right-censored survival data with covariates and prove a central limit theorem for the resulting estimator. In addition, we present a computationally efficient construction for the confidence interval of hRMST. Our simulations show that the resulting confidence intervals have the correct coverage probability of the hRMST, and the random forest based estimate of hRMST has smaller prediction errors than the parametric models when the models are mis-specified. We apply the method to the ovarian cancer data set from The Cancer Genome Atlas (TCGA) project to predict hRMST and show an improved prediction performance over the existing methods. A software implementation, srf using R and C++, is available at https://github.com/lmy1019/SRF.

Keywords: estimating equation; high dimensional data; inference; non-parametric survival estimation; regression forest.
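For readers unfamiliar with the quantity being estimated: the restricted mean survival time up to a horizon L is E[min(T, L)], which equals the area under the survival curve S(t) from 0 to L. The sketch below computes it from a plain Kaplan-Meier curve that ignores covariates, which corresponds roughly to the covariate-free baseline labeled Naive.km in the figure captions. It is not the random forest estimator proposed in the paper and does not use the srf package; the function names `kaplan_meier` and `rmst` and the toy data are hypothetical and chosen only for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation was censored
    Returns the distinct event times and the survival probability
    immediately after each of them.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    event_times, surv_probs = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            event_times.append(t)
            surv_probs.append(surv)
        n_at_risk -= n_leaving
    return event_times, surv_probs


def rmst(times, events, horizon):
    """Area under the Kaplan-Meier step function on [0, horizon]."""
    event_times, surv_probs = kaplan_meier(times, events)
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(event_times, surv_probs):
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (horizon - prev_t)  # last rectangle up to the horizon
    return area


if __name__ == "__main__":
    # Toy data (made up): four subjects, the second one is censored.
    follow_up = [1.0, 2.0, 3.0, 4.0]
    observed = [1, 0, 1, 1]
    print(rmst(follow_up, observed, horizon=4.0))
```

With no censoring, this reduces to the sample mean of min(T, L), which gives a quick sanity check. The heterogeneous version studied in the paper conditions the same quantity on individual covariates.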

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Training data are simulated from Equation (2), with n = 600 training points, dimension p = 20, and errors ϵ ~ N(0, 10²). Random forests are trained using the R package grf. The truth is shown as the red curve, the random forest predictions as the green curve, and the upper and lower bounds of the point-wise confidence intervals as the black lines. The brown and blue curves are based on the approach of Wang and Schaubel (2018) with identity and exponential link functions.

**Figure 2**
Simulation results of the coverage probability for Model 1 with three different link functions, sample sizes n = 1,000, 2,000, 5,000, and p = 2, 4, 6, 8. For each case, the prediction coverage probability is calculated over the samples in the testing data set.

**Figure 3**
Simulation results of the coverage probability for Model 2 with three different link functions, sample sizes n = 1,000, 2,000, 10,000, and p = 2, 4, 6, 8. For each case, the prediction coverage probability is calculated over the samples in the testing data set.

**Figure 4**
Estimated vs. true RMST for Model 1 **(left)** and Model 2 **(right)** with exponential link function and number of covariates p = 5, 10, 20 **(top–bottom)**. SRF, proposed random forest-based estimator, with the upper and lower bounds of its point-wise confidence intervals connected by the gray lines; Naive.km, estimate based on the Kaplan–Meier estimator without adjusting for the covariates; Naive.Cox, Cox regression based estimator; Lu.id, method of Tian et al. (2014) with identity link; Lu.exp, method of Tian et al. (2014) with exponential link; Wang.id, method of Wang and Schaubel (2018) with identity link; Wang.exp, method of Wang and Schaubel (2018) with exponential link.

**Figure 5**
Performance of the proposed random forest estimator compared with other methods for L = 3, 4, 5. The left panel is the MAE across 10-fold cross-validation. The right panel is the RMSE across 10-fold cross-validation. SRF, proposed random forest estimator; Naive.km, estimate based on the Kaplan–Meier estimator without adjusting for the covariates; Naive.Cox, Cox regression based estimator; Lu.id, method of Tian et al. (2014) with identity link; Lu.exp, method of Tian et al. (2014) with exponential link; Wang.id, method of Wang and Schaubel (2018) with identity link; Wang.exp, method of Wang and Schaubel (2018) with exponential link.

Cited by

Liang J, He T, Li H, Guo X, Zhang Z. Improve individual treatment by comparing treatment benefits: cancer artificial intelligence survival analysis system for cervical carcinoma. J Transl Med. 2022 Jun 28;20(1):293. doi: 10.1186/s12967-022-03491-8. PMID: 35765031. Free PMC article.
He T, Li H, Zhang Z. Differences of survival benefits brought by various treatments in ovarian cancer patients with different tumor stages. J Ovarian Res. 2023 May 11;16(1):92. doi: 10.1186/s13048-023-01173-7. PMID: 37170143. Free PMC article.
