BMC Bioinformatics. 2008 Oct 15;9:434.
doi: 10.1186/1471-2105-9-434.

Comparative optimism in models involving both classical clinical and gene expression information

Caroline Truntzer et al. BMC Bioinformatics.

Abstract

Background: In cancer research, most clinical variables have already been investigated and are now well established. The use of transcriptomic variables has raised two problems: restricting their number and validating their significance. Thus, their contribution to prognosis is currently thought to be overestimated. The aim of this study was to determine to what extent optimism concerning current transcriptomic models may lead to overestimation of the contribution of transcriptomic variables to survival prognosis.

Results: To achieve this goal, Cox proportional hazards models that include both clinical and transcriptomic variables were built. As the relevance of the clinical variables had already been established, they were not submitted to selection. Genes, in contrast, were selected using both univariate and multivariate methods. Optimism and the contribution of clinical and transcriptomic variables to prognosis were compared through simulations, using the Kent and O'Quigley ρ² measure of dependence. We showed that the optimism for clinical variables was low because these are no longer submitted to selection of relevant variables. For genes, the selection process introduced high optimism, which increased as the proportion of genes of interest decreased. However, this optimism can be reduced by increasing the number of samples.
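The effect of selection on optimism can be illustrated with a minimal simulation sketch. It is not the authors' procedure: ordinary least squares and the plain R² stand in for the Cox model and the Kent and O'Quigley ρ², every gene is pure noise, and the function names (`r_squared`, `selection_optimism`) and default parameters are hypothetical. Optimism is taken here as the training R² minus the test R².

```python
import numpy as np


def r_squared(y, y_hat):
    """Plain coefficient of determination (a stand-in for rho^2)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def selection_optimism(n_train, n_genes=1000, n_selected=10,
                       n_test=1000, seed=0):
    """Return (R2 on training set, R2 on test set, their difference)."""
    rng = np.random.default_rng(seed)

    # Assumption: no gene is truly related to the outcome (all noise).
    x_train = rng.standard_normal((n_train, n_genes))
    y_train = rng.standard_normal(n_train)
    x_test = rng.standard_normal((n_test, n_genes))
    y_test = rng.standard_normal(n_test)

    # Univariate selection: keep the genes whose training-set
    # correlation with the outcome is largest in absolute value.
    xc = x_train - x_train.mean(axis=0)
    yc = y_train - y_train.mean()
    corr = (xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
    selected = np.argsort(-np.abs(corr))[:n_selected]

    # Least-squares fit (with intercept) on the selected genes only.
    design_train = np.column_stack([np.ones(n_train), x_train[:, selected]])
    beta, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    design_test = np.column_stack([np.ones(n_test), x_test[:, selected]])

    r2_train = r_squared(y_train, design_train @ beta)
    r2_test = r_squared(y_test, design_test @ beta)
    return r2_train, r2_test, r2_train - r2_test


if __name__ == "__main__":
    for n in (50, 100, 400):
        gaps = [selection_optimism(n, seed=s)[2] for s in range(5)]
        print(f"n = {n:4d}  mean optimism = {np.mean(gaps):.2f}")
```

Because the genes are selected and fitted on the same training data, the training R² is inflated even though no gene carries signal. Under these assumptions the gap shrinks as the training sample grows, which mirrors the qualitative trend reported above.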

Conclusion: Two phenomena have to be taken into account when comparing the predictive power and optimism of clinical variables with those of genes: overestimation for genes due to the selection process, and underestimation for clinical variables due to the omission of relevant genes. Unlike that of genes, the predictive value of validated clinical variables is not overestimated, which should be kept in mind in future studies involving both clinical and transcriptomic variables.

Figures

Figure 1
Evolution of ΔTrTe with the sample size. Boxplots showing the evolution of ΔTrTe with the sample size for the clinical variables (first panel) and for the transcriptomic variables selected by the multivariate method (second panel) and by the univariate method (third panel). p = 1000 genes.
Figure 2
Evolution of ΔTrPop with the sample size. Boxplots showing the evolution of ΔTrPop with the sample size for the clinical variables (first panel) and for the transcriptomic variables selected by the multivariate method (second panel). p = 1000 genes.
Figure 3
Evolution of ΔTePop with the sample size. Boxplots showing the evolution of ΔTePop with the sample size for the clinical variables (first panel) and for the transcriptomic variables selected by the multivariate method (second panel). p = 1000 genes.
Figure 4
Role of true positives. Boxplots showing the evolution of ρ²_Tr (first panel) or ρ̄²_Te (second panel) with the sample size, when either all selected genes or only the true positives are taken into account. p = 1000 genes.
Figure 5
Evolution of ΔTrTe with the number of genes under study. Boxplots showing the evolution of ΔTrTe with the number of genes under study for the transcriptomic variables selected by the multivariate method (first panel) or by the univariate method (second panel). n = 100 patients.
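
The three Δ quantities in the captions are not defined on this page. One plausible reading, assuming that Tr, Te and Pop denote the training set, the test sets and the population, and that the bar denotes an average over test sets, is that each Δ is a difference between two ρ² values. The sign conventions and the symbol ρ²_Pop below are assumptions, not taken from the text:

```latex
\begin{align*}
\Delta_{TrTe}  &= \rho^2_{Tr} - \bar{\rho}^2_{Te}, \\
\Delta_{TrPop} &= \rho^2_{Tr} - \rho^2_{Pop}, \\
\Delta_{TePop} &= \bar{\rho}^2_{Te} - \rho^2_{Pop},
\end{align*}
% so that, under this reading,
% \Delta_{TrTe} = \Delta_{TrPop} - \Delta_{TePop}.
```

Under this reading, ΔTrTe splits into the part of the gap due to overfitting on the training set (ΔTrPop) and the part due to degraded performance on new data (ΔTePop).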
