Cancer Inform. 2007 Feb 19;2:243-74.

Data perturbation independent diagnosis and validation of breast cancer subtypes using clustering and patterns

G Alexe et al. Cancer Inform.

Abstract

Molecular stratification of disease based on the expression levels of sets of genes can help guide therapeutic decisions, provided that such classifications can be shown to be stable against variations in sample source and against data perturbation. Classifications inferred from one set of samples in one lab should consistently stratify a different set of samples in another lab. We present a method for assessing such stability and apply it to the breast cancer (BCA) datasets of Sorlie et al. 2003 and Ma et al. 2003. We find that, among the now commonly accepted BCA categories identified by Sorlie et al., Luminal A and Basal are robust, but Luminal B and ERBB2+ are not. In particular, 36% of the samples identified as Luminal B and 55% of those identified as ERBB2+ cannot be assigned an accurate category because the classification is sensitive to data perturbation. We identify a "core cluster" of samples for each category, and from these we determine "patterns" of gene expression that distinguish the core clusters from each other. We find that the best markers for Luminal A and Basal are (ESR1, LIV1, GATA-3) and (CCNE1, LAD1, KRT5), respectively. Pathways enriched in the patterns regulate apoptosis, tissue remodeling and the immune response. We use a different dataset (Ma et al. 2003) to test the accuracy with which samples can be allocated to the four disease subtypes. We find, as expected, that the classification of samples identified as Luminal A and Basal is robust, but classification into the other two subtypes is not.

Keywords: Breast cancer; Clusters; Diagnosis; Multi-gene Biomarkers; Patterns.
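
The idea of testing a classification against data perturbation, as described in the abstract, can be illustrated with a small sketch. This is not the authors' procedure: it swaps in a simple nearest-centroid reassignment under Gaussian noise, and the function names, the noise level, the 0.9 "core" threshold and the toy data are all assumptions made for illustration only.

```python
import math
import random

def centroids(data, labels):
    """Mean expression vector of each cluster label."""
    sums, counts = {}, {}
    for row, lab in zip(data, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def nearest(row, cents):
    """Label of the centroid closest to row (Euclidean distance)."""
    return min(cents, key=lambda lab: math.dist(row, cents[lab]))

def agreement_scores(data, labels, noise_sd=0.5, n_rounds=200, seed=0):
    """Fraction of perturbed replicates in which each sample keeps its label.

    Assumed scheme: add Gaussian noise to every value, recompute the
    centroids from the noisy data, and reassign each noisy sample to its
    nearest centroid.
    """
    rng = random.Random(seed)
    hits = [0] * len(data)
    for _ in range(n_rounds):
        noisy = [[v + rng.gauss(0.0, noise_sd) for v in row] for row in data]
        cents = centroids(noisy, labels)
        for i, row in enumerate(noisy):
            if nearest(row, cents) == labels[i]:
                hits[i] += 1
    return [h / n_rounds for h in hits]

# Hypothetical toy data: two well-separated groups plus one borderline
# sample (the last row) that sits close to the boundary between them.
data = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1],
        [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
        [2.7, 2.7]]
labels = ["A", "A", "A", "B", "B", "B", "A"]

scores = agreement_scores(data, labels)
# Samples whose label survives perturbation form a "core" (threshold assumed).
core = [i for i, s in enumerate(scores) if s >= 0.9]
```

In this toy setting the borderline sample changes label in a substantial share of the perturbed replicates, so it falls outside the core, which mirrors the abstract's point that some samples cannot be assigned a category reliably.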


Figures

Figure 1a. Average agreement scores relative to cluster A.
Figure 1b. Average cluster agreement scores relative to cluster B.
Figure 1c. Average cluster agreement scores relative to cluster C.
Figure 1d. Average cluster agreement scores relative to cluster D.
Figure 1e. Average cluster agreement scores relative to cluster E.
Figure 1f. Agreement scores for the unclassified samples in Sorlie et al.
Figure 2. Heatmap of 148 uni-genes for the samples in core categories.
Figure 3. An example of a pattern (pattern PA) characteristic of the Luminal A core cluster (Cluster A) and an example of a pattern (pattern NA) characteristic of the non-Luminal A cases. Notice that P is satisfied by all the samples in the Luminal A group, while N is satisfied by 88% of the non-Luminal A cases. Both patterns P and N are expressed as bounding constraints on the expressions of genes Liv-1 and Gata-3.
Figure 4. Heatmap of combined Ma et al. and Sorlie et al. data using the 38 genes identified in the latter data. There are four distinct clusters, which are separated by vertical lines in the plot. The Normals, Luminal A and Basal core samples from Sorlie et al. cluster well enough with samples in the Ma et al. data to make a phenotype identification possible for the latter data. The B core cluster (Luminal B) looks similar to the Luminal A core cluster with some genes overexpressed. Core cluster C (ERBB2+) is most similar to Core D (Basal), presumably because the discriminator gene ERBB2 is not on the Ma et al. chip set. The sample labels in the Ma et al. data indicate stages of disease (ADH, DCIS or IDC) and the index number of the patient. Notice that samples from the same patient, even if in different stages of BCA, cluster together.
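
The Figure 3 caption describes a pattern as a set of bounding constraints on gene expression values. A minimal sketch of that idea follows, assuming a pattern can be represented as a per-gene (lower, upper) interval. The thresholds, the sample values and the function names are hypothetical and are not taken from the paper.

```python
import math

def satisfies(sample, pattern):
    """True if every gene in the pattern lies within its (low, high) bounds."""
    return all(low <= sample[gene] <= high
               for gene, (low, high) in pattern.items())

def coverage(samples, pattern):
    """Fraction of samples that satisfy the pattern."""
    if not samples:
        return 0.0
    return sum(satisfies(s, pattern) for s in samples) / len(samples)

# Hypothetical positive pattern: both genes above an assumed threshold.
pattern_p = {"LIV1": (1.0, math.inf), "GATA3": (0.5, math.inf)}

# Hypothetical expression values, for illustration only.
luminal_a = [{"LIV1": 1.8, "GATA3": 1.2},
             {"LIV1": 1.3, "GATA3": 0.9}]
non_luminal_a = [{"LIV1": -0.4, "GATA3": 0.1},
                 {"LIV1": 0.2, "GATA3": -1.0}]

cov_pos = coverage(luminal_a, pattern_p)      # share of Luminal A samples covered
cov_neg = coverage(non_luminal_a, pattern_p)  # share of other samples covered
```

A pattern is useful as a marker when its coverage is high in the target group and low elsewhere, which is how the caption characterizes P (all Luminal A samples) and N (88% of the non-Luminal A cases).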

