Stat Probab Lett. 2010 Jul 1;80(13-14):1056-1064.
doi: 10.1016/j.spl.2010.02.020.

Consistency of Random Survival Forests

Hemant Ishwaran et al.

Abstract

We prove uniform consistency of Random Survival Forests (RSF), a newly introduced forest ensemble learner for analysis of right-censored survival data. Consistency is proven under general splitting rules, bootstrapping, and random selection of variables; that is, under true implementation of the methodology. Under this setting we show that the forest ensemble survival function converges uniformly to the true population survival function. To prove this result we make one key assumption regarding the feature space: we assume that all variables are factors. Doing so ensures that the feature space has finite cardinality and enables us to exploit counting process theory and the uniform consistency of the Kaplan-Meier survival function.
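The proof leans on the uniform consistency of the Kaplan-Meier estimator, which each terminal node of an RSF tree uses to estimate survival. For reference, a minimal pure-Python sketch of that estimator (illustrative only; not code from the paper):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate S(t) = prod_{t_i <= t} (1 - d_i / n_i).

    times  : observed times (event or censoring)
    events : 1 if the event (death) occurred, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    ts = [times[k] for k in order]
    es = [events[k] for k in order]

    n_at_risk = len(ts)
    surv = 1.0
    curve = []
    i = 0
    while i < len(ts):
        t = ts[i]
        deaths = 0
        removed = 0
        # group all subjects tied at time t
        while i < len(ts) and ts[i] == t:
            deaths += es[i]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings both leave the risk set
    return curve
```

For example, with times `[1, 2, 3, 4]` and event indicators `[1, 1, 0, 1]` (the third subject censored), the curve drops at times 1, 2, and 4 but not at the censoring time 3.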


Figures

Figure 1
RSF analysis of esophageal data. Five-year predicted survival for node-positive patients is plotted against number of cancer-positive nodes, stratified by depth of invasion (T1, T2, T3, and T4). Predicted survival is based on the forest comprising the first 5, 10, 50, and 250 trees, respectively.
Figure 2
RSF analysis of PBC data using 1000 trees with random log-rank splitting, where variables, both nominal and continuous, were discretized to have a maximum number of labels (factor granularity). Top panel shows out-of-bag prediction error versus factor granularity, stratified by the number of random splits used for a node, nsplit. Bottom panel shows the 68% bootstrap confidence region for variable importance (VIMP) from 1000 bootstrap samples using an nsplit value of 1024 for each factor granularity value in the top panel. Color coding is such that the same color is used for a variable across the different granularity values (factor granularity for a variable increases going from top to bottom).
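The caption's "factor granularity" refers to discretizing each variable into a bounded number of labels, which keeps the feature space finite as the consistency proof requires. The paper's exact binning scheme is not given here; one plausible approach is quantile binning, sketched below (the function name and cut rule are illustrative assumptions):

```python
def discretize(values, max_labels):
    """Bin a continuous variable into at most max_labels factor levels
    using quantile cut points (an illustrative scheme; the paper's exact
    discretization may differ). Ties in the data can yield fewer levels.
    """
    srt = sorted(values)
    n = len(srt)
    # interior quantile cut points; max_labels - 1 cuts give max_labels bins
    cuts = [srt[int(n * k / max_labels)] for k in range(1, max_labels)]
    # label = number of cut points at or below the value
    return [sum(v >= c for c in cuts) for v in values]
```

For instance, `discretize([1, 2, 3, 4], 2)` performs a median split into labels `[0, 0, 1, 1]`, while `max_labels=4` leaves each distinct value its own label.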
