PLoS Comput Biol. 2011 Oct;7(10):e1002240.
doi: 10.1371/journal.pcbi.1002240. Epub 2011 Oct 20.

Most random gene expression signatures are significantly associated with breast cancer outcome

David Venet et al. PLoS Comput Biol. 2011 Oct.

Abstract

Bridging the gap between animal or in vitro models and human disease is essential in medical research. Researchers often suggest that a biological mechanism is relevant to human cancer because a gene expression marker (a signature) of this mechanism, discovered in an experimental system, is statistically associated with disease outcome in humans. We examined this argument for breast cancer. Surprisingly, we found that gene expression signatures unrelated to cancer (of the effect of postprandial laughter, of social defeat in mice and of skin fibroblast localization) were all significantly associated with breast cancer outcome. We next compared 47 published breast cancer outcome signatures to signatures made of random genes. Twenty-eight of them (60%) were not significantly better outcome predictors than random signatures of identical size, and 11 (23%) were worse predictors than the median random signature. More than 90% of random signatures of more than 100 genes were significant outcome predictors. We next derived a metagene, called meta-PCNA, by selecting the 1% of genes most positively correlated with the proliferation marker PCNA in a compendium of normal tissue expression data. Adjusting breast cancer expression data for meta-PCNA almost entirely abrogated the outcome association of both published and random signatures. We also found that, in the absence of adjustment, the hazard ratio of a signature's outcome association was strongly related to how closely the signature correlated with meta-PCNA (R² = 0.9). This relation also applied to single-gene expression markers. Moreover, more than 50% of the breast cancer transcriptome was correlated with meta-PCNA. A corollary was that purging cell cycle genes from a signature failed to rule out the confounding effect of proliferation. Hence, it is questionable to suggest that a mechanism is relevant to human breast cancer from the finding that a gene expression marker for this mechanism predicts human breast cancer outcome, because most markers do. The methods we present help to overcome this problem.
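The comparison against random signatures can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: each signature is scored per patient as the mean expression of its genes, the cohort is split at the median score, and outcome association is measured with a two-sample log-rank test. The function names (`logrank_p`, `signature_p`, `random_signature_rank`) and the data layout (`expr` maps a gene name to a list of per-patient values) are hypothetical.

```python
import math
import random
import statistics


def logrank_p(times, events, group):
    """Two-sample log-rank test; returns the two-sided p-value.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    group  -- True for patients in the first group, False otherwise
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i])
        # Events occurring exactly at time t, overall and in the first group.
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def signature_p(expr, genes, times, events):
    """Score patients by mean expression of `genes` (an assumed scoring rule),
    split at the median score and return the log-rank p-value."""
    n_patients = len(times)
    scores = [statistics.fmean(expr[g][s] for g in genes) for s in range(n_patients)]
    cutoff = statistics.median(scores)
    return logrank_p(times, events, [sc > cutoff for sc in scores])


def random_signature_rank(expr, signature, times, events, n_random=1000, seed=0):
    """Compare a signature with `n_random` random signatures of identical size.

    Returns the signature's own p-value, the empirical p-value (fraction of
    random signatures doing at least as well), and the median random p-value.
    """
    rng = random.Random(seed)
    all_genes = sorted(expr)
    p_sig = signature_p(expr, signature, times, events)
    p_random = [
        signature_p(expr, rng.sample(all_genes, len(signature)), times, events)
        for _ in range(n_random)
    ]
    hits = sum(1 for p in p_random if p <= p_sig)
    return p_sig, (hits + 1) / (n_random + 1), statistics.median(p_random)
```

In this sketch, a published signature with a large empirical p-value is no better than random genes of the same size, and one whose p-value exceeds the median random p-value is a worse predictor than the median random signature; these are the two situations the abstract reports for 28 and 11 of the 47 signatures, respectively.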

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Association of negative control signatures with overall survival.
In panels A–C, the NKI cohort was split into two groups using a signature of postprandial laughter (panel A), of skin fibroblast localization (panel B) or of social defeat in mice (panel C). In each of these panels, the fraction of patients alive (overall survival, OS) is shown as a function of time for both groups. Hazard ratios (HR) between groups and their associated p-values are given in the bottom-left corners. Panel D depicts p-values for association with outcome for all MSigDB c2 signatures and for random signatures of the same sizes as the MSigDB c2 signatures.
Figure 2. Most published signatures are not significantly better outcome predictors than random signatures of identical size.
The x-axis denotes the p-value of association with overall survival. Red dots stand for published signatures; yellow shapes depict the distribution of p-values for 1000 random signatures of identical size, with the lower 5% quantile shaded in green and the median shown as a black line. Signatures are ordered by increasing size.
Figure 3. Meta-PCNA adjustment decreases the prognostic abilities of published signatures.
Hazard ratios for overall survival association of 48 signatures in the original dataset (blue) and the meta-PCNA-adjusted dataset (red). Box sizes are inversely related to the size of the confidence intervals. Related Kaplan-Meier plots are available in the Supporting Information (Text S1).
Figure 4. Most prognostic transcriptional signals are correlated with meta-PCNA.
A) Each point denotes a signature. The x-axis depicts the absolute value of the correlation of the first principal component of the signatures with meta-PCNA, the y-axis depicts the hazard ratio for outcome association. Details of the analysis for each data point are available in the Supporting Information (Text S1). B) Distribution of the correlations of individual genes with meta-PCNA, for genes significantly associated with overall survival (red) and for all the genes spotted on the microarrays (black).
Figure 5. Purging cell cycle genes from a signature does not rule out proliferation signals.
Distribution of the correlations with meta-PCNA of genes in the Embryonic Stem Cell Module (blue, ref. [15]), of the correlations of the same module with its cell cycle genes removed (red) and of all of the genes spotted on the microarray (black).
Figure 6. Reproducible outcome predictions across end-points and cohorts.
Each dot represents a published signature. A) Hazard ratios. B) Log rank p-values. Lower panels give correlation coefficients for corresponding scatter plots in the symmetric upper panels. OS, overall survival; RFS, recurrence-free survival. NKI, data from ref. ; LOI, data from ref. .
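The meta-PCNA construction and adjustment behind Figures 3 to 5 can be sketched in the same style. This is a hypothetical illustration, not the paper's code: it assumes the meta-PCNA score of a tumor sample is the median expression of the selected genes, that PCNA itself is excluded from the candidate list, and that "adjusting for meta-PCNA" means removing, gene by gene, the least-squares linear dependence on that score. The paper's exact adjustment may differ, and all function names are illustrative.

```python
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def meta_pcna_genes(normal_expr, anchor="PCNA", fraction=0.01):
    """Top `fraction` of genes most positively correlated with `anchor`
    across a normal-tissue compendium (anchor itself excluded here)."""
    reference = normal_expr[anchor]
    candidates = [g for g in normal_expr if g != anchor]
    ranked = sorted(candidates, key=lambda g: pearson(normal_expr[g], reference), reverse=True)
    k = max(1, round(fraction * len(candidates)))
    return ranked[:k]


def meta_pcna_score(tumor_expr, genes):
    """Per-sample median expression of the meta-PCNA genes (assumed summary)."""
    n_samples = len(tumor_expr[genes[0]])
    return [statistics.median(tumor_expr[g][s] for g in genes) for s in range(n_samples)]


def adjust_for_meta_pcna(tumor_expr, score):
    """Remove each gene's least-squares linear dependence on the score while
    keeping the gene's mean (a stand-in for the paper's adjustment)."""
    ms = statistics.fmean(score)
    var_s = sum((s - ms) ** 2 for s in score)
    adjusted = {}
    for gene, values in tumor_expr.items():
        mv = statistics.fmean(values)
        slope = sum((s - ms) * (v - mv) for s, v in zip(score, values)) / var_s
        adjusted[gene] = [v - slope * (s - ms) for s, v in zip(score, values)]
    return adjusted
```

After this adjustment every gene is, by construction, linearly uncorrelated with the meta-PCNA score. Removing annotated cell cycle genes (Figure 5) is a weaker control, because the remaining genes can still carry the proliferation signal.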

References

    1. Sotiriou C, Pusztai L. Gene-expression signatures in breast cancer. N Engl J Med. 2009;360:790–800. doi: 10.1056/NEJMra0801289.
    2. van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AAM, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347:1999–2009. doi: 10.1056/NEJMoa021967.
    3. Paik S, Shak S, Tang G, Kim C, Baker J, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351:2817–2826. doi: 10.1056/NEJMoa041588.
    4. Pawitan Y, Bjöhle J, Amler L, Borg A-L, Egyhazi S, et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7:R953–964. doi: 10.1186/bcr1325.
    5. Korkola JE, Blaveri E, DeVries S, Moore DH, Hwang ES, et al. Identification of a robust gene signature that predicts breast cancer outcome in independent data sets. BMC Cancer. 2007;7:61. doi: 10.1186/1471-2407-7-61.
