BMC Bioinformatics. 2008 Feb 27;9:125.
doi: 10.1186/1471-2105-9-125.

Merging microarray data from separate breast cancer studies provides a robust prognostic test

Lei Xu et al. BMC Bioinformatics. 2008.

Abstract

Background: There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.

Results: In this study, by using a highly stable data aggregation procedure based on expression comparisons, we have integrated three independent microarray gene expression data sets for breast cancer and identified a structured prognostic signature consisting of 112 genes organized into 80 pair-wise expression comparisons. A classical likelihood ratio test based on these comparisons, essentially weighted voting, achieves 88.6% sensitivity and 54.6% specificity in an independent external test set of 154 samples. The test is highly informative in assessing the risk of developing distant metastases within five years (hazard ratio 9.3 with 95% CI 2.9-29.9).
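The likelihood ratio test over pair-wise comparisons can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each comparison is a binary feature (gene i expressed below gene j), that the comparisons are treated as conditionally independent given the outcome class, and that class frequencies are smoothed with a pseudo-count `alpha`. The function names, gene labels, and toy expression values are all hypothetical.

```python
import math

def pair_features(expr, pairs):
    """Binary features: 1 if gene i is expressed below gene j in this sample."""
    return [1 if expr[i] < expr[j] else 0 for i, j in pairs]

def fit_probs(samples, pairs, alpha=1.0):
    """Smoothed frequency of each comparison being 1 within one outcome class."""
    counts = [0] * len(pairs)
    for expr in samples:
        for k, x in enumerate(pair_features(expr, pairs)):
            counts[k] += x
    n = len(samples)
    return [(c + alpha) / (n + 2 * alpha) for c in counts]

def log_likelihood_ratio(expr, pairs, p_poor, p_good):
    """Sum of per-comparison log ratios; each comparison casts a weighted vote."""
    llr = 0.0
    for x, p, q in zip(pair_features(expr, pairs), p_poor, p_good):
        llr += math.log(p / q) if x else math.log((1 - p) / (1 - q))
    return llr

# Hypothetical toy data: two gene pairs, three samples per outcome class.
pairs = [("A", "B"), ("C", "D")]
poor = [{"A": 1, "B": 5, "C": 2, "D": 9},
        {"A": 2, "B": 7, "C": 3, "D": 4},
        {"A": 0.5, "B": 3, "C": 8, "D": 6}]
good = [{"A": 6, "B": 2, "C": 9, "D": 1},
        {"A": 5, "B": 1, "C": 7, "D": 3},
        {"A": 4, "B": 3, "C": 2, "D": 5}]

p_poor = fit_probs(poor, pairs)
p_good = fit_probs(good, pairs)

new_sample = {"A": 1, "B": 4, "C": 1, "D": 2}
score = log_likelihood_ratio(new_sample, pairs, p_poor, p_good)
# A positive score favors the poor-outcome class; the decision threshold
# would be tuned to reach the desired sensitivity.
```

Each comparison contributes a weight equal to its log ratio, which is why the test behaves like weighted voting: comparisons whose frequencies differ more between the two classes carry larger votes.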

Conclusion: Rank-based features provide a stable way to integrate patient data from separate microarray studies due to invariance to data normalization, and such features can be combined into a useful predictor of distant metastases in breast cancer within a statistical modeling framework which begins to capture gene-gene interactions. Upon further confirmation on large-scale independent data, such prognostic signatures and tests could provide a powerful tool to guide adjuvant systemic treatment that could greatly reduce the cost of breast cancer treatment, both in terms of toxic side effects and health care expenditures.
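The invariance claim can be checked on a small example. The sketch below assumes that "normalization" means a strictly increasing transformation applied within a sample (for instance a log transform or a positive rescaling with an offset); the gene labels and values are hypothetical. Under that assumption the ordering of any two genes in the same sample, and hence every comparison feature, is unchanged.

```python
import math

def comparisons(expr, pairs):
    """Outcome of each pair-wise comparison within a single sample."""
    return [expr[i] < expr[j] for i, j in pairs]

# Hypothetical raw intensities for one sample.
sample = {"G1": 120.0, "G2": 45.0, "G3": 300.0}
pairs = [("G1", "G2"), ("G2", "G3"), ("G1", "G3")]

raw = comparisons(sample, pairs)
# Two strictly increasing within-sample transformations.
logged = comparisons({g: math.log2(v) for g, v in sample.items()}, pairs)
scaled = comparisons({g: 0.37 * v + 12.0 for g, v in sample.items()}, pairs)

print(raw == logged == scaled)
```

Features built from raw expression values would not survive these transformations unchanged, which is the practical obstacle when pooling studies that were preprocessed differently.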

Figures

Figure 1
Choosing size of the signature. The relationship between the number of features in a prognostic signature and the specificity at 90% sensitivity of the corresponding prognostic test, evaluated by 40-fold cross-validation. We select m_opt = 80, the smallest value that achieves roughly maximum specificity at the 90% sensitivity level. The specificity observed on the validation set is in fact higher.
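The quantity plotted in Figure 1, specificity at a fixed sensitivity, can be computed from test scores roughly as follows. This is a sketch under stated assumptions: higher scores indicate poor outcome, label 1 marks a poor-outcome patient, and the threshold is the lowest score that still captures the target fraction of poor-outcome patients. The function name and the toy scores are hypothetical, and tied scores are not treated specially.

```python
import math

def specificity_at_sensitivity(scores, labels, target=0.9):
    """Return (threshold, specificity) at the first threshold reaching `target` sensitivity."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    need = math.ceil(target * len(pos))   # poor-outcome cases that must be flagged
    threshold = pos[need - 1]             # flag a sample when score >= threshold
    true_neg = sum(1 for s in neg if s < threshold)
    return threshold, true_neg / len(neg)

# Hypothetical scores: 10 poor-outcome and 5 good-outcome samples.
scores = [9, 8, 7, 6, 5, 4, 3, 2.5, 2, 0.5, 3.5, 1.5, 1, 0.2, 0.1]
labels = [1] * 10 + [0] * 5
threshold, spec = specificity_at_sensitivity(scores, labels)
```

Repeating this calculation inside each cross-validation fold for every candidate signature size would give a curve of the kind the caption describes.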
Figure 2
The heat map of the 80 signature gene pairs. The Wang data set is used to illustrate the gene expression values of the signature genes. A heat map is generated using the matrix2png software [34]. There are 80 rows corresponding to the 80 gene pairs; the displayed intensities are the differences between the expression values of the two genes in each pair. The expression value for each difference is normalized across the samples to zero mean and one standard deviation (SD) for visualization purposes. Differences with expression levels greater than the mean are colored in red and those below the mean are colored in green. The scale indicates the number of SDs above or below the mean.
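The per-row standardization described in the caption can be sketched as below. The caption does not say whether the sample or the population standard deviation was used; the sample SD is assumed here. The function name and the expression values are hypothetical.

```python
from statistics import mean, stdev

def standardized_differences(gene_a, gene_b):
    """Difference of two genes per sample, scaled to zero mean and unit SD across samples."""
    diffs = [a - b for a, b in zip(gene_a, gene_b)]
    mu, sd = mean(diffs), stdev(diffs)   # sample SD is an assumption
    return [(d - mu) / sd for d in diffs]

# Hypothetical expression values for one gene pair across four samples.
z = standardized_differences([5.0, 7.0, 9.0, 4.0], [3.0, 2.0, 8.0, 6.0])
# Color rule from the caption: above the mean is red, below the mean is green.
colors = ["red" if v > 0 else "green" for v in z]
```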
Figure 3
The Kaplan-Meier analysis. Kaplan-Meier analysis of the probability of remaining free of distant metastases among 159 Pawitan patients between the good-outcome group and the poor-outcome group. The LRT is based on the integrated data in (A) and the single, Wang data set in (B). CI denotes confidence interval and the p-value is calculated by the log-rank test.
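For readers unfamiliar with the method, the Kaplan-Meier estimate of the probability of remaining metastasis-free can be sketched in a few lines. This is a generic product-limit estimator, not the authors' analysis code; the follow-up times and event flags are hypothetical, with 1 marking an observed distant metastasis and 0 marking censoring.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time where at least one event occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = n_t = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            n_t += 1
            i += 1
        if d:
            surv *= 1 - d / at_risk   # product-limit update
            curve.append((t, surv))
        at_risk -= n_t                # events and censored cases leave the risk set
    return curve

# Hypothetical follow-up (years) for five patients in one outcome group.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 0, 1])
```

Computing one such curve per predicted group and comparing the two with a log-rank test gives the kind of comparison shown in Figure 3.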
