PLoS Comput Biol. 2013;9(1):e1002875.
doi: 10.1371/journal.pcbi.1002875. Epub 2013 Jan 24.

Significance analysis of prognostic signatures

Andrew H Beck et al. PLoS Comput Biol. 2013.

Abstract

A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique applied in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set, and if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that "random" gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS) which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random gene sets. SAPS ensures that a significant gene set is not only able to stratify patients into prognostically variable groups, but is also enriched for genes showing strong univariate associations with patient prognosis, and performs significantly better than random gene sets. We use SAPS to perform a large meta-analysis (the largest completed to date) of prognostic pathways in breast and ovarian cancer and their molecular subtypes. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we show that prognostic signatures in ER negative breast cancer are more similar to prognostic signatures in ovarian cancer than to prognostic signatures in ER positive breast cancer. 
SAPS is a powerful new method for deriving robust prognostic biological signatures from clinically annotated genomic datasets.
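
The comparison against random gene sets described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stratification statistic used here (a median split on mean expression, compared by mean survival time, ignoring censoring) is a deliberate simplification, and the function names `stratification_stat` and `p_random`, the data layout (one gene-to-expression dict per patient), and the default of 200 random gene sets are all assumptions made for illustration.

```python
import random
from statistics import mean, median


def stratification_stat(expr, survival, gene_set):
    """Toy prognostic statistic (an assumption, not the paper's test).

    Patients are split at the median of their mean expression over
    gene_set; the statistic is the absolute difference in mean survival
    time between the two groups. Censoring is ignored.
    """
    scores = [mean(patient[g] for g in gene_set) for patient in expr]
    cut = median(scores)
    high = [t for s, t in zip(scores, survival) if s > cut]
    low = [t for s, t in zip(scores, survival) if s <= cut]
    if not high or not low:
        return 0.0
    return abs(mean(high) - mean(low))


def p_random(expr, survival, gene_set, n_random=200, seed=0):
    """Empirical P value: how often does a random gene set of the same
    size stratify patients at least as well as the candidate set?"""
    rng = random.Random(seed)
    all_genes = sorted(expr[0])
    observed = stratification_stat(expr, survival, gene_set)
    hits = 0
    for _ in range(n_random):
        random_set = rng.sample(all_genes, len(gene_set))
        if stratification_stat(expr, survival, random_set) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)
```

A candidate gene set whose value from `p_random` is small stratifies patients better than most random gene sets of the same size, which is the property the abstract requires in addition to the standard prognostic tests.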

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of SAPS method.
The SAPS method computes three P values for a candidate gene set (A–C). These P values are summarized in the SAPSscore (D) and statistical significance of a SAPSscore is estimated by permutation testing (E).
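
The caption does not state how the three P values are combined, so the following Python sketch rests on an assumption: the score is taken as the negative log10 of the largest (least significant) of the three P values, signed by the direction of the association. This reading is consistent with the 1.3 cutoff used in the later captions, since −log10(0.05) ≈ 1.3, but it should be checked against the full paper. The names `saps_score`, `permutation_p`, and the three parameter names are hypothetical.

```python
import math


def saps_score(p_pure, p_random, p_enrichment, direction):
    """Assumed summary of the three P values (steps A-C).

    direction is +1 or -1 for the sign of the prognostic association.
    A gene set scores highly only if all three P values are small.
    """
    worst = max(p_pure, p_random, p_enrichment)
    return -math.log10(worst) * direction


def permutation_p(observed, null_scores):
    """Empirical two-sided P value of a score against scores obtained
    from permuted data (a generic permutation test, step E)."""
    extreme = sum(abs(s) >= abs(observed) for s in null_scores)
    return (extreme + 1) / (len(null_scores) + 1)
```

Taking the maximum means a single weak component caps the score: under this assumption, a gene set with P values of 0.001, 0.001, and 0.05 scores about 1.3 in absolute value, no higher than one with 0.05 on all three.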
Figure 2. Global breast cancer Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in breast cancer overall. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
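
The coordinates and highlighting rule for panel (B) can be sketched in a few lines of Python. This is an illustrative helper, not code from the paper: the function name `scatter_points` and the input layout (gene-set name mapped to a score and a q-value) are assumptions, and the cutoffs default to the 0.05 and 1.3 values given in the caption.

```python
import math


def scatter_points(results, q_max=0.05, score_min=1.3):
    """Build scatterplot points from per-gene-set results.

    results maps a gene-set name to (saps_score, q_value); q_value is
    assumed to be strictly positive so that log10 is defined.
    Returns (name, x, y, highlighted) tuples, where x is the score,
    y is -log10(q), and highlighted marks the points drawn in red.
    """
    points = []
    for name, (score, q) in results.items():
        y = -math.log10(q)
        highlighted = q <= q_max and abs(score) >= score_min
        points.append((name, score, y, highlighted))
    return points
```

Both conditions must hold for a point to be highlighted, so a gene set with a large score but a q-value above 0.05 stays uncolored.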
Figure 3. ER+/HER2− high proliferation Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in the ER+/HER2− high proliferation breast cancer molecular subtype. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 4. ER+/HER2− low proliferation Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in the ER+/HER2− low proliferation breast cancer molecular subtype. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 5. HER2+ Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in the HER2+ breast cancer molecular subtype. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 6. ER−/HER2− Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in the ER−/HER2− breast cancer molecular subtype. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 7. Global ovarian cancer Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in ovarian cancer overall. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 8. Angiogenic subtype Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in the Angiogenic ovarian cancer molecular subtype. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 9. Non-angiogenic subtype Venn diagram and scatterplot.
(A) The gene sets significant by at least one of the P values at the 0.05 level are displayed in a Venn diagram. (B) The −log10 of the SAPSq-value is plotted on the y-axis and the SAPSscore on the x-axis for each of the 5320 gene sets in the Molecular Signatures Database, scored for prognostic significance in the Non-angiogenic ovarian cancer molecular subtype. Each point in the scatterplot represents a gene set; gene sets with a SAPSq-value ≤ 0.05 and an absolute SAPSscore ≥ 1.3 are colored red.
Figure 10. Hierarchical clustering of breast and ovarian cancers and their subtypes based on SAPS scores.
Breast cancer and ovarian cancer molecular subtypes were clustered using the 1300 gene sets with an absolute SAPSscore ≥ 1.3 and a SAPSq-value ≤ 0.05 in at least one disease subtype. Hierarchical clustering was performed on the SAPSscore values. In the heatmap, green indicates that the gene set is associated with improved prognosis and red with poorer prognosis.
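
As a rough illustration of clustering subtypes by their score profiles, the sketch below runs agglomerative clustering in plain Python. The caption does not name the distance measure or linkage rule, so Euclidean distance with average linkage is an assumption here, as are the function name `average_linkage` and the input layout (one list of scores per subtype label, with gene sets in the same order).

```python
import math
from itertools import combinations


def average_linkage(profiles):
    """Agglomerative clustering of score profiles.

    profiles maps a label to a list of scores. Clusters are merged
    greedily by smallest average pairwise Euclidean distance.
    Returns the merge history as (cluster_a, cluster_b, distance).
    """
    def linkage(pair):
        a, b = pair
        total = sum(math.dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

The earliest merges in the returned history identify the subtypes whose prognostic score profiles are most alike, which is the kind of similarity the heatmap dendrogram displays.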
