BMC Bioinformatics. 2010 Feb 8;11:78.
doi: 10.1186/1471-2105-11-78.

Testing the additional predictive value of high-dimensional molecular data

Anne-Laure Boulesteix et al. BMC Bioinformatics. 2010.

Abstract

Background: High-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction and diagnosis in biomedical research for about ten years. However, the question of their additional predictive value when classical predictors are already available has long received little attention in the bioinformatics literature.

Results: We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice on choosing the method's only parameter, the number of boosting iterations (mstop). In simulations, our novel approach is found to have very good power in different settings, e.g. few strong predictors or many weak predictors. For illustrative purposes, it is applied to two publicly available cancer data sets.
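
The procedure summarized above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the globalboosttest implementation: it assumes a binary outcome, an ordinary logistic regression on the clinical covariates whose fitted linear predictor is kept as a fixed offset, componentwise gradient boosting with binomial loss (step size nu = 0.1) on the molecular data as the "boosting regression" step, the negative log-likelihood after mstop iterations as the test statistic, and a p-value equal to the fraction of row permutations of the molecular data whose loss is at most the observed one. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_loglik(y, eta):
    # Negative binomial log-likelihood of 0/1 outcomes y given linear predictor eta.
    total = 0.0
    for yi, ei in zip(y, eta):
        p = min(max(sigmoid(ei), 1e-12), 1.0 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def clinical_offset(Z, y, iters=25):
    # Step 1 (assumed): logistic regression of y on clinical covariates Z plus an
    # intercept, fitted by Newton-Raphson. Returns the fitted linear predictor.
    rows = [[1.0] + list(z) for z in Z]
    d = len(rows[0])
    beta = [0.0] * d
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        grad = [sum((yi - pi) * r[j] for yi, pi, r in zip(y, p, rows)) for j in range(d)]
        H = [[sum(pi * (1 - pi) * r[j] * r[k] for pi, r in zip(p, rows))
              for k in range(d)] for j in range(d)]
        beta = [b + s for b, s in zip(beta, solve(H, grad))]
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def boosted_loss(X, y, offset, mstop, nu=0.1):
    # Step 2 (assumed): componentwise gradient boosting with binomial loss on the
    # molecular data X, starting from the fixed clinical offset.
    n, p = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    cols = [[X[i][j] - means[j] for i in range(n)] for j in range(p)]
    ss = [sum(v * v for v in col) or 1.0 for col in cols]
    f = list(offset)
    for _ in range(mstop):
        u = [yi - sigmoid(fi) for yi, fi in zip(y, f)]  # negative gradient
        # Least-squares slope of u on each centered column; keep the best column.
        coefs = [sum(c * ui for c, ui in zip(col, u)) / s for col, s in zip(cols, ss)]
        j = max(range(p), key=lambda k: coefs[k] ** 2 * ss[k])
        f = [fi + nu * coefs[j] * c for fi, c in zip(f, cols[j])]
    return neg_loglik(y, f)


def global_boost_test(X, Z, y, mstop=100, n_perm=100, seed=0):
    # Step 3 (assumed): compare the observed loss with losses obtained after
    # permuting the rows of X, keeping y and Z fixed.
    rng = random.Random(seed)
    offset = clinical_offset(Z, y)
    observed = boosted_loss(X, y, offset, mstop)
    n_le = 0
    for _ in range(n_perm):
        perm = list(range(len(X)))
        rng.shuffle(perm)
        if boosted_loss([X[i] for i in perm], y, offset, mstop) <= observed:
            n_le += 1
    return n_le / n_perm
```

Permuting only the rows of X leaves the clinical offset untouched while destroying any association between the molecular data and the outcome, so a small p-value indicates that X improves the fit beyond the clinical predictors.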

Conclusions: Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors when a few clinical covariates or a known prognostic index are already available. It is implemented in the R package "globalboosttest", which is publicly available from R-Forge and will be submitted to CRAN as soon as possible.


Figures

Figure 1
Choice of mstop. Negative log-likelihood for the original data (red) and the permuted data (black) against the number of iterations mstop. (a) μX = 5, p* = 1. (b) μX = 0.2, p* = 200.
Figure 2
Boxplots of p-values. Boxplots of the p-values for the eight settings described in the Section 'Simulation design' using our new method with mstop = 100, 500, 1000 and AIC-optimized mstop (grey boxes) and using Goeman's global test (white boxes) for comparison.
Figure 3
Negative binomial log-likelihood in the real data study. Negative binomial log-likelihood as a function of mstop for the original data sets (black) and for the permuted data sets (grey).
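
For reference, the quantity plotted against mstop can be written as below. The notation is assumed here, not taken from the paper: $y_i \in \{0,1\}$ is the outcome of observation $i$, $\hat\eta_i^{\mathrm{clin}}$ the fitted linear predictor of the clinical logistic model, and $\hat f^{(m)}(x_i)$ the boosting fit on the molecular data after $m$ iterations. The permutation p-value is shown in its simplest form, assuming $B$ permutations, with starred quantities computed on data whose molecular rows have been permuted.

$$
-\ell(m) = -\sum_{i=1}^{n}\Bigl[y_i \log \hat\pi_i^{(m)} + (1 - y_i)\log\bigl(1 - \hat\pi_i^{(m)}\bigr)\Bigr],
\qquad
\hat\pi_i^{(m)} = \frac{1}{1 + \exp\bigl(-\hat\eta_i^{\mathrm{clin}} - \hat f^{(m)}(x_i)\bigr)}
$$

$$
\hat p = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\Bigl[-\ell_b^{*}(m_{\mathrm{stop}}) \le -\ell(m_{\mathrm{stop}})\Bigr]
$$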
