Stat Probab Lett. 2010 Jul 1;80(13-14):1056-1064.
doi: 10.1016/j.spl.2010.02.020.

Consistency of Random Survival Forests

Hemant Ishwaran et al.

Abstract

We prove uniform consistency of Random Survival Forests (RSF), a newly introduced forest ensemble learner for analysis of right-censored survival data. Consistency is proven under general splitting rules, bootstrapping, and random selection of variables; that is, under true implementation of the methodology. Under this setting we show that the forest ensemble survival function converges uniformly to the true population survival function. To prove this result we make one key assumption regarding the feature space: we assume that all variables are factors. Doing so ensures that the feature space has finite cardinality and enables us to exploit counting process theory and the uniform consistency of the Kaplan-Meier survival function.
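The proof leans on the uniform consistency of the Kaplan-Meier estimator, which each terminal node of an RSF tree uses to estimate survival. For reference, a minimal pure-Python sketch of that estimator (illustrative only; not code from the paper):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate S(t) = prod_{t_i <= t} (1 - d_i / n_i).

    times  : observed times (event or censoring)
    events : 1 if the event (death) occurred, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    ts = [times[k] for k in order]
    es = [events[k] for k in order]

    n_at_risk = len(ts)
    surv = 1.0
    curve = []
    i = 0
    while i < len(ts):
        t = ts[i]
        deaths = 0
        removed = 0
        # group all subjects tied at time t
        while i < len(ts) and ts[i] == t:
            deaths += es[i]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings both leave the risk set
    return curve
```

For example, with times `[1, 2, 3, 4]` and event indicators `[1, 1, 0, 1]` (the third subject censored), the curve drops at times 1, 2, and 4 but not at the censoring time 3.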


Figures

Figure 1
RSF analysis of esophageal data. Five-year predicted survival for node-positive patients is plotted against number of cancer-positive nodes, stratified by depth of invasion (T1, T2, T3, and T4). Predicted survival is based on the forest comprising the first 5, 10, 50, and 250 trees, respectively.
Figure 2
RSF analysis of PBC data using 1000 trees with random log-rank splitting, where variables, both nominal and continuous, were discretized to have a maximum number of labels (factor granularity). Top panel shows out-of-bag prediction error versus factor granularity, stratified by the number of random splits used for a node, nsplit. Bottom panel shows the 68% bootstrap confidence region for variable importance (VIMP) from 1000 bootstrap samples using an nsplit value of 1024 for each factor granularity value in the top panel. Color coding is such that the same color is used for a variable across the different granularity values (factor granularity for a variable increases going from top to bottom).
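The caption's "factor granularity" refers to discretizing each variable into a bounded number of labels, which keeps the feature space finite as the consistency proof requires. The paper's exact binning scheme is not given here; one plausible approach is quantile binning, sketched below (the function name and cut rule are illustrative assumptions):

```python
def discretize(values, max_labels):
    """Bin a continuous variable into at most max_labels factor levels
    using quantile cut points (an illustrative scheme; the paper's exact
    discretization may differ). Ties in the data can yield fewer levels.
    """
    srt = sorted(values)
    n = len(srt)
    # interior quantile cut points; max_labels - 1 cuts give max_labels bins
    cuts = [srt[int(n * k / max_labels)] for k in range(1, max_labels)]
    # label = number of cut points at or below the value
    return [sum(v >= c for c in cuts) for v in values]
```

For instance, `discretize([1, 2, 3, 4], 2)` performs a median split into labels `[0, 0, 1, 1]`, while `max_labels=4` leaves each distinct value its own label.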
