Review
BMC Med Genomics. 2014 Dec 31:7:513.
doi: 10.1186/s12920-014-0074-9.

Meta-analysis of prostate cancer gene expression data identifies a novel discriminatory signature enriched for glycosylating enzymes

Stefan J Barfeld et al. BMC Med Genomics. 2014.

Abstract

Background: Tumorigenesis is characterised by changes in transcriptional control. Extensive transcript expression data have been acquired over the last decade and used to classify prostate cancers. Prostate cancer is, however, a heterogeneous multifocal cancer and this poses challenges in identifying robust transcript biomarkers.

Methods: In this study, we have undertaken a meta-analysis of publicly available transcriptomic data spanning datasets and technologies from the last decade and encompassing laser capture microdissected and macrodissected sample sets.

Results: We identified a 33-gene signature that can discriminate between benign tissue controls and localised prostate cancers irrespective of detection platform or dissection status. These genes were significantly overexpressed in localised prostate cancer versus benign tissue in at least three datasets within the Oncomine Compendium of Expression Array Data. They were also overexpressed in a recent exon-array dataset as well as in a prostate cancer RNA-seq dataset generated as part of The Cancer Genome Atlas (TCGA) initiative. Biologically, glycosylation was the single enriched process associated with this 33-gene signature, encompassing four glycosylating enzymes. We went on to evaluate the performance of this signature against three individual markers of prostate cancer, v-ets avian erythroblastosis virus E26 oncogene homolog (ERG) expression, prostate specific antigen (PSA) expression and androgen receptor (AR) expression, in an additional independent dataset. Our signature had greater discriminatory power than these markers for both localised cancer and metastatic disease relative to benign tissue and, in the case of metastasis, also relative to localised prostate cancer.

Conclusion: Robust transcript biomarkers are present within datasets assembled over many years and cohorts, and our study provides both examples and a strategy for refining and comparing datasets to obtain additional markers as more data are generated.
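
The comparison of a multi-gene signature against single markers described in the Results can be illustrated with a minimal sketch. It assumes, purely for illustration, that a sample's signature score is the mean expression of the signature genes; the abstract does not state how the signature was scored. Gene names and expression values are made up, and the AUC is computed as the fraction of case/control pairs in which the case scores higher.

```python
from statistics import mean


def signature_score(expression, genes):
    """Mean expression across the signature genes for one sample (assumed scoring rule)."""
    return mean(expression[g] for g in genes)


def auc(case_scores, control_scores):
    """Area under the ROC curve: probability that a case outscores a control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical gene names and toy log2 expression values, not data from the study.
genes = ["GENE_A", "GENE_B", "GENE_C"]
cancer = [
    {"GENE_A": 8.1, "GENE_B": 7.4, "GENE_C": 9.0},
    {"GENE_A": 7.9, "GENE_B": 6.8, "GENE_C": 8.2},
    {"GENE_A": 6.0, "GENE_B": 7.0, "GENE_C": 7.5},
]
benign = [
    {"GENE_A": 6.1, "GENE_B": 5.9, "GENE_C": 7.0},
    {"GENE_A": 6.5, "GENE_B": 6.2, "GENE_C": 7.1},
    {"GENE_A": 7.0, "GENE_B": 6.9, "GENE_C": 7.6},
]

# AUC for the multi-gene score versus a single marker (GENE_A alone).
signature_auc = auc(
    [signature_score(s, genes) for s in cancer],
    [signature_score(s, genes) for s in benign],
)
single_marker_auc = auc(
    [s["GENE_A"] for s in cancer],
    [s["GENE_A"] for s in benign],
)
print(f"signature AUC = {signature_auc:.3f}, single-marker AUC = {single_marker_auc:.3f}")
```

On real data the two AUC values would be compared on an independent dataset, as the abstract describes; the toy numbers here only show the mechanics.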

Figures

Figure 1
Gene signatures capable of discriminating between prostate cancer subgroups and classifying metastatic disease. Gene signatures generated using the Varambally dataset and found to be significant discriminators of metastatic disease and primary/localised cancers (Additional file 10: Table S8) when applied to the Tomlins and Ramaswamy datasets were used to cluster samples in these datasets in a heatmap. The gene signatures represented are those capable of characterising samples from at least one progression stage (Fisher’s exact test, p ≤ 0.05). Gene signatures are rows and samples are columns. The colour-coded bar at the base of the heatmap indicates the clinical grouping for each sample, as also defined in the key. Metastatic hormone refractory, metastatic hormone naïve and hormone refractory vs. naïve represent prostate cancer cases from the Tomlins dataset, as do PIN (prostatic intraepithelial neoplasia) and primary carcinoma. The other categories (metastatic and primary) are samples from the Ramaswamy dataset and are metastatic and primary cancers from multiple organ sites, not simply the prostate gland. The blue bar graph on the right-hand side of the heatmap depicts the number of genes in each signature that are differentially expressed and contribute to the sample clustering in this analysis. For signature 1 (dist 101.6.1 and Additional file 5: Table S3) this is 1748 genes in total, as highlighted, and the other bars show numbers of genes relative to this. The colour scale represents the mean log2 fold change for differential gene signatures (absolute value ≥ log2(2)). Red indicates module induction, green repression. Gene signatures significant in both directions are indicated in yellow. Using the mean module log2 fold change, we clustered the samples and modules by hierarchical clustering with Euclidean distance as the measure of dissimilarity. Data points that contained both induced and repressed values were excluded from the clustering.
Figure 2
Differential expression of a 71-gene signature classifier in a prostate cancer exon-array dataset (Taylor et al.) and the TCGA RNA-seq dataset for prostate cancer (TCGA-PRAD). The expression values of the 71-gene signature (dist.0.6.34), which is capable of subclustering localised prostate cancer from other samples in all three interrogated datasets, are shown in two independent datasets: A. a prostate cancer exon-array dataset (Taylor et al.) and B. the TCGA RNA-seq dataset for prostate cancer (TCGA-PRAD). Values were log2 normalized and the mean of the sample groups (PRIMARY TUMOUR/SOLID TISSUE NORMAL) is shown.
Figure 3
Heatmaps confirming the clustering ability of the 33-gene signature in a prostate cancer exon-array dataset (Taylor et al.) and the TCGA RNA-seq dataset for prostate cancer (TCGA-PRAD). The 33-gene signature was applied to two independent datasets: A. a prostate cancer exon-array dataset (Taylor et al.) and B. the TCGA RNA-seq dataset for prostate cancer (TCGA-PRAD). Expression values were log2 transformed, normalized for high level mean and variance and hierarchically clustered using Euclidean distance. Genes are rows and samples are columns. The colour-coded bars indicate expression values and the clinical grouping for each sample as defined in the keys.
Figure 4
Receiver operating characteristic (ROC) curves for discrimination between localised prostate cancer and benign cases, metastatic and benign cases and metastatic and prostate cancers using a 31-gene signature (row 1), AR (row 2), ERG (row 3) and KLK3 (row 4).
Figure 5
Workflow for the identification of robust gene signatures and gene sets for clustering prostate cancer cases. In step 1, we identified all statistically significant differentially expressed Affymetrix array probes in a small dataset consisting of 13 macrodissected clinical samples encompassing localised benign prostatic hyperplasia, localised prostate cancer and metastatic disease (GSE3325). We then generated gene signatures from these based on gene coexpression at varying stringency thresholds. These gene signatures were then applied to two additional datasets, a microdissected dataset (Tomlins et al.) and a multi-tissue site cancer and metastatic dataset (Ramaswamy et al.). A large number of the coexpression gene signatures clustered localised prostate cancers from metastatic disease and prostate metastases from other sample sets. The most compact gene signature able to do so consisted of 71 genes (A) and we assessed its expression pattern in two additional datasets, an exon-array dataset (Taylor et al.) and an RNA-sequenced dataset (TCGA-PRAD). Few of the genes in the significant coexpression gene signatures were overexpressed in localised prostate cancers. In the second phase of the study, we abstracted all of the overexpressed genes and refined these down to a set of 33 genes based on significant overexpression in additional publicly available prostate cancer microarray datasets housed within the Oncomine database (B). These genes also effectively clustered benign versus cancer cases in an exon-array dataset (Taylor et al.), an expression microarray dataset (Grasso et al.) and an RNA-sequenced dataset (TCGA-PRAD) (C and D). In conclusion, it is possible to generate gene classifiers of clinical prostate cancer from a small dataset of macrodissected samples with the capacity to classify larger sequenced and microdissected datasets based on clinical characteristics.
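
The clustering step that recurs in the captions above (log2 transformation, per-gene normalisation, hierarchical clustering of samples on Euclidean distance) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: average linkage is assumed because the captions do not name a linkage method, the normalisation is taken to be per-gene centring and scaling, and the input matrix is made-up toy data.

```python
import math
from statistics import mean, pstdev


def log2_standardise(matrix):
    """log2-transform raw values, then centre and scale each gene (row)."""
    out = []
    for row in matrix:
        logged = [math.log2(v) for v in row]
        mu, sd = mean(logged), pstdev(logged)
        out.append([(v - mu) / sd if sd else 0.0 for v in logged])
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_samples(matrix, n_clusters):
    """Agglomerative clustering of samples (columns); average linkage is an assumption."""
    columns = list(zip(*matrix))
    clusters = [[i] for i in range(len(columns))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(
                    euclidean(columns[a], columns[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy matrix: rows are genes, columns are samples (values are invented).
# Samples 0-2 mimic benign tissue, samples 3-5 mimic tumours with higher expression.
raw = [
    [100, 120, 110, 400, 420, 390],
    [50, 55, 60, 200, 210, 190],
    [30, 28, 35, 120, 130, 110],
]
groups = cluster_samples(log2_standardise(raw), n_clusters=2)
print(groups)
```

For real expression matrices a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally replace the quadratic loop above; the pure-Python version is kept only so the sketch has no dependencies.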

References

    1. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A. 2004;101:9309–9314. doi: 10.1073/pnas.0401994101.
    2. Segal E, Friedman N, Koller D, Regev A. A module map showing conditional activity of expression modules in cancer. Nat Genet. 2004;36:1090–1098. doi: 10.1038/ng1434.
    3. Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Qi S, Chen Z, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS. Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target. Proc Natl Acad Sci U S A. 2006;103:17402–17407. doi: 10.1073/pnas.0608396103.
    4. Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci U S A. 2004;101:615–620. doi: 10.1073/pnas.2536479100.
    5. Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, Kalyana-Sundaram S, Wei JT, Rubin MA, Pienta KJ, Shah RB, Chinnaiyan AM. Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007;39:41–51. doi: 10.1038/ng1935.