Review. Stat Methods Med Res. 2010 Feb;19(1):29-51.
doi: 10.1177/0962280209105024. Epub 2009 Aug 4.

Survival analysis with high-dimensional covariates

Daniela M Witten et al. Stat Methods Med Res. 2010 Feb.

Erratum in

  • Stat Methods Med Res. 2010 Apr;19(2):200

Abstract

In recent years, breakthroughs in biomedical technology have led to a wealth of data in which the number of features (for instance, genes on which expression measurements are available) exceeds the number of observations (e.g. patients). Sometimes survival outcomes are also available for those same observations. In this case, one might be interested in (a) identifying features that are associated with survival (in a univariate sense), and (b) developing a multivariate model for the relationship between the features and survival that can be used to predict survival in a new observation. Because of the high dimensionality of these data, most classical statistical methods for survival analysis cannot be applied directly. Here, we review a number of methods from the literature that address these two problems.
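
As an illustration of problem (a), one common way to screen features one at a time is the score test statistic from a univariate Cox model, evaluated at a coefficient of zero. The sketch below computes it for every column of a feature matrix. It is a minimal example under stated assumptions, not the review's own procedure: the function name `cox_scores` and its argument names are hypothetical, ties in the event times are handled with the Breslow convention, and whether this matches the exact "Cox score" used in the paper's figures is not confirmed by the abstract.

```python
import numpy as np


def cox_scores(X, y, delta):
    """Univariate Cox score statistic for each column of X (hypothetical helper).

    X     : (n, p) array of features, one row per observation
    y     : (n,) observed follow-up times
    delta : (n,) 1 if the event was observed, 0 if censored
    Returns a length-p array; features that are constant within every
    risk set get NaN because their score variance is zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n, p = X.shape

    U = np.zeros(p)  # score: sum over events of (x_i - risk-set mean)
    V = np.zeros(p)  # information: sum over events of risk-set variance
    for i in np.flatnonzero(delta == 1):
        risk_set = X[y >= y[i]]          # observations still at risk at time y[i]
        mean = risk_set.mean(axis=0)
        U += X[i] - mean
        V += ((risk_set - mean) ** 2).mean(axis=0)

    scores = np.full(p, np.nan)
    ok = V > 0
    scores[ok] = U[ok] / np.sqrt(V[ok])
    return scores
```

A positive score suggests that larger feature values go with earlier events, and a negative score suggests the opposite. With more features than observations, these statistics would normally be followed by a multiple-testing adjustment, which this sketch does not attempt.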

Figures

Figure 1
Estimated FDR for Cox scores, modified Cox scores, and LPC scores are shown for the renal cell carcinoma data set of Zhao et al.
Figure 2
Hierarchical clustering of the patients is shown on the left. Kaplan–Meier survival curves for the two largest subgroups defined by hierarchical clustering are shown on the right; the p-value for the log-rank test is 0.0102.
Figure 3
For the Zhao et al. data, predictors obtained via PC regression and SPC on the training set were used to define three subgroups on the test set. For these subgroups, Kaplan–Meier survival curves and p-values for the log-rank test are shown.
Figure 4
For the Zhao et al. data, the y-axes show the average value of $2\,\bigl(l(X_{\mathrm{test}}\hat{\beta}_{\mathrm{train}}, y_{\mathrm{test}}, \delta_{\mathrm{test}}) - l(0, y_{\mathrm{test}}, \delta_{\mathrm{test}})\bigr)$ across cross-validation folds; a large value indicates a good fit on independent data. The notation $l(\gamma, y, \delta)$ indicates the log partial likelihood of the Cox model with outcome $(y, \delta)$ and predictor $\gamma$. Scout(2, 1) is not shown in this figure because it involves two tuning parameters.
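
The held-out fit measure described in the Figure 4 caption can be sketched in a few lines. The code below evaluates the Cox log partial likelihood for a given predictor and then takes twice the difference between the fitted predictor and the null predictor of zero on one test fold. It is a minimal sketch, not the authors' code: the function names are hypothetical, tied event times are handled with the Breslow convention (an assumption), and averaging over cross-validation folds is left to the caller.

```python
import numpy as np


def cox_log_partial_likelihood(gamma, y, delta):
    """Log partial likelihood of the Cox model for predictor gamma.

    gamma : (n,) linear predictor, e.g. X @ beta
    y     : (n,) observed follow-up times
    delta : (n,) 1 if the event was observed, 0 if censored
    Ties are treated with the Breslow convention (assumption of this sketch).
    """
    gamma = np.asarray(gamma, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)

    total = 0.0
    for i in np.flatnonzero(delta == 1):
        g = gamma[y >= y[i]]             # predictors of the risk set at time y[i]
        m = g.max()                      # shift for a stable log-sum-exp
        total += gamma[i] - (m + np.log(np.exp(g - m).sum()))
    return total


def heldout_fit(X_test, beta_train, y_test, delta_test):
    """Twice the gain in test-set log partial likelihood over the null predictor."""
    gamma = np.asarray(X_test, dtype=float) @ np.asarray(beta_train, dtype=float)
    null = np.zeros(len(gamma))
    return 2.0 * (
        cox_log_partial_likelihood(gamma, y_test, delta_test)
        - cox_log_partial_likelihood(null, y_test, delta_test)
    )
```

A value near zero means the coefficients estimated on the training folds do no better than ignoring the features on the held-out fold; negative values indicate a fit that is worse than the null model.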

References

    1. Zhao H, Tibshirani R, Brooks J. Gene expression profiling predicts survival in conventional renal cell carcinoma. PLOS Medicine. 2006;3:e13.
    2. Perou C, Jeffrey S, van de Rijn M, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proceedings of the National Academy of Sciences. 1999;96:9212–9217.
    3. Golub T, Slonim D, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–536.
    4. Sorlie T, Perou C, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumour subclasses with clinical implications. Proceedings of the National Academy of Sciences. 2001;98:10969–10974.
    5. Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, et al. Gene-expression profiles in hereditary breast cancer. The New England Journal of Medicine. 2001;344:539–548.