BMC Bioinformatics. 2008 Feb 27;9:125.

doi: 10.1186/1471-2105-9-125.

Merging microarray data from separate breast cancer studies provides a robust prognostic test

Lei Xu¹, Aik Choon Tan, Raimond L Winslow, Donald Geman

Affiliation

¹ The Institute for Computational Medicine and Center for Cardiovascular Bioinformatics and Modeling, Johns Hopkins University, Baltimore, MD 21218, USA. leixu@jhu.edu

PMID: 18304324
PMCID: PMC2409450
DOI: 10.1186/1471-2105-9-125

Lei Xu et al. BMC Bioinformatics. 2008.

Abstract

Background: There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.

Results: In this study, by using a highly stable data aggregation procedure based on expression comparisons, we have integrated three independent microarray gene expression data sets for breast cancer and identified a structured prognostic signature consisting of 112 genes organized into 80 pair-wise expression comparisons. A classical likelihood ratio test based on these comparisons, essentially weighted voting, achieves 88.6% sensitivity and 54.6% specificity in an independent external test set of 154 samples. The test is highly informative in assessing the risk of developing distant metastases within five years (hazard ratio 9.3 with 95% CI 2.9-29.9).
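To make the "pair-wise expression comparison" and "likelihood ratio test as weighted voting" ideas concrete, here is a minimal sketch. It assumes each comparison is a binary feature (gene i expressed below gene j), that the comparisons are treated as conditionally independent given outcome, and that class-conditional probabilities are estimated with Laplace smoothing. The helper names (`comparison`, `fit_pair_probs`, `llr_score`, `sensitivity_specificity`) and the smoothing parameter `alpha` are hypothetical; the paper's actual pair selection, estimation, and threshold may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # hypothetical: (i, j) indices of two genes in an expression profile


def comparison(profile: Sequence[float], pair: Pair) -> int:
    """Rank-based feature: 1 if gene i is expressed below gene j in this sample, else 0."""
    i, j = pair
    return 1 if profile[i] < profile[j] else 0


def fit_pair_probs(samples: List[Sequence[float]], labels: List[int],
                   pairs: List[Pair], alpha: float = 1.0) -> Dict[Pair, Tuple[float, float]]:
    """Estimate (P(z=1 | good), P(z=1 | poor)) for each pair.

    labels: 1 = poor outcome (distant metastasis), 0 = good outcome.
    alpha is Laplace smoothing, an assumption made for this sketch.
    """
    probs = {}
    for pair in pairs:
        ones = [0, 0]   # per class: samples where the comparison holds
        total = [0, 0]  # per class: all samples
        for x, y in zip(samples, labels):
            ones[y] += comparison(x, pair)
            total[y] += 1
        probs[pair] = tuple((ones[c] + alpha) / (total[c] + 2 * alpha) for c in (0, 1))
    return probs


def llr_score(profile: Sequence[float], probs: Dict[Pair, Tuple[float, float]]) -> float:
    """Sum over pairs of log P(z | poor) / P(z | good).

    Each pair casts one vote whose weight is its log-likelihood ratio,
    which is why the test behaves like weighted voting.
    """
    score = 0.0
    for pair, (p_good, p_poor) in probs.items():
        if comparison(profile, pair):
            score += math.log(p_poor / p_good)
        else:
            score += math.log((1 - p_poor) / (1 - p_good))
    return score


def sensitivity_specificity(scores: List[float], labels: List[int],
                            threshold: float) -> Tuple[float, float]:
    """Call 'poor outcome' when score > threshold; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    return tp / labels.count(1), tn / labels.count(0)
```

Raising or lowering the threshold trades sensitivity against specificity; a high-sensitivity operating point like the one reported above corresponds to a relatively low threshold.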

Conclusion: Rank-based features provide a stable way to integrate patient data from separate microarray studies due to invariance to data normalization, and such features can be combined into a useful predictor of distant metastases in breast cancer within a statistical modeling framework which begins to capture gene-gene interactions. Upon further confirmation on large-scale independent data, such prognostic signatures and tests could provide a powerful tool to guide adjuvant systemic treatment that could greatly reduce the cost of breast cancer treatment, both in terms of toxic side effects and health care expenditures.
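The invariance claim can be checked directly: any strictly increasing transformation applied to a whole sample (a log transform, a scale-and-shift normalization) leaves every within-sample pairwise comparison unchanged. The sketch below uses hypothetical intensities; note that transformations applied gene by gene across samples are not covered by this argument.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pairwise_signs(profile: Sequence[float]) -> List[int]:
    """Rank-based features of one sample: 1 if gene i < gene j, for every pair i < j."""
    return [1 if profile[i] < profile[j] else 0
            for i, j in combinations(range(len(profile)), 2)]


raw = [120.0, 35.0, 800.0, 60.0]          # hypothetical intensities from one array
log_scaled = [math.log2(v) for v in raw]  # e.g. a log2 transform
rescaled = [2.5 * v + 10.0 for v in raw]  # e.g. a study-specific linear normalization

# All three versions of the sample yield identical comparison features.
assert pairwise_signs(raw) == pairwise_signs(log_scaled) == pairwise_signs(rescaled)
```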

Figures

**Figure 1**
**Choosing the size of the signature**. The relationship between the number of features in a prognostic signature and the specificity, at 90% sensitivity, of the corresponding prognostic test, evaluated by 40-fold cross-validation. We select m_opt = 80, the smallest value that achieves roughly maximum specificity at the 90% sensitivity level. The specificity observed on the validation set is in fact higher.
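The selection rule in this caption can be sketched as two steps: measure specificity at a fixed 90% sensitivity, then pick the smallest signature size whose specificity is close to the best. This is a minimal sketch under stated assumptions: scores are oriented so that higher means poorer outcome, the threshold is the highest one that still flags at least 90% of poor-outcome cases, and "roughly maximum" is read as "within a tolerance `tol`" of the best cross-validated value. The function names and `tol` are hypothetical.

```python
import math
from typing import Dict, List


def specificity_at_sensitivity(scores: List[float], labels: List[int],
                               target: float = 0.90) -> float:
    """Specificity of the rule 'poor outcome if score >= t', where t is the highest
    threshold that still flags at least `target` of the poor-outcome (label 1) cases."""
    poor = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    good = [s for s, y in zip(scores, labels) if y == 0]
    k = math.ceil(target * len(poor) - 1e-9)  # poor cases that must be flagged
    t = poor[k - 1]
    return sum(1 for s in good if s < t) / len(good)


def choose_signature_size(spec_by_m: Dict[int, float], tol: float = 0.01) -> int:
    """Smallest signature size m whose cross-validated specificity is within tol of the best."""
    best = max(spec_by_m.values())
    return min(m for m, spec in spec_by_m.items() if spec >= best - tol)
```

For example, with hypothetical cross-validated specificities `{10: 0.30, 40: 0.42, 80: 0.50, 120: 0.505}` and `tol=0.01`, the rule returns 80, because larger signatures add almost nothing.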

**Figure 2**
**The heat map of the 80 signature gene pairs**. The Wang data set is used to illustrate the expression values of the signature genes. The heat map was generated using the matrix2png software [34]. There are 80 rows corresponding to the 80 gene pairs; the displayed intensities are the differences between the expression values of the two genes in each pair. For visualization, each row of differences is normalized across the samples to zero mean and one standard deviation (SD). Differences above the row mean are colored red and those below it green. The scale indicates the number of SDs above or below the mean.
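The per-row normalization described in this caption is a z-score of each pair's expression difference across samples. A minimal sketch, assuming a hypothetical layout `expr[g][s]` (gene g, sample s), that each difference is taken as gene i minus gene j, and that the population SD is used (the caption does not say which SD estimator was applied):

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def heatmap_rows(expr: Sequence[Sequence[float]],
                 pairs: List[Tuple[int, int]]) -> List[List[float]]:
    """One row per gene pair: expr[i] - expr[j] across samples, standardized to
    zero mean and unit SD. Positive values would be drawn red, negative green."""
    rows = []
    for i, j in pairs:
        diff = [a - b for a, b in zip(expr[i], expr[j])]
        mu, sd = mean(diff), pstdev(diff)
        # Guard against a constant row, where the SD is zero.
        rows.append([(d - mu) / sd if sd > 0 else 0.0 for d in diff])
    return rows
```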

**Figure 3**
**The Kaplan-Meier analysis**. Kaplan-Meier estimates of the probability of remaining free of distant metastases for the good-outcome and poor-outcome groups among 159 Pawitan patients. The LRT is built from the integrated data in (A) and from the Wang data set alone in (B). CI denotes confidence interval; the p-value is calculated by the log-rank test.
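For readers unfamiliar with the two statistics named in this caption, here is a minimal, dependency-free sketch of the Kaplan-Meier estimator and the two-group log-rank test. It assumes follow-up times with an event flag (1 = distant metastasis observed, 0 = censored) and converts the log-rank chi-square statistic (1 degree of freedom) to a p-value via `math.erfc`. The function names are hypothetical; in practice a dedicated survival-analysis library would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time; events[k] = 1 if the event was observed."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, idx = len(data), 1.0, [], 0
    while idx < len(data):
        t, d, c = data[idx][0], 0, 0
        while idx < len(data) and data[idx][0] == t:  # group tied times
            if data[idx][1]:
                d += 1
            else:
                c += 1
            idx += 1
        if d:
            surv *= 1 - d / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= d + c             # events and censorings leave the risk set
    return curve


def logrank_p(times_a, events_a, times_b, events_b) -> float:
    """Two-group log-rank test; p-value from a chi-square with 1 degree of freedom."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:                       # hypergeometric variance
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    stat = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
```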

Cited by

Logic Learning Machine creates explicit and stable rules stratifying neuroblastoma patients.
Cangelosi D, Blengio F, Versteeg R, Eggert A, Garaventa A, Gambini C, Conte M, Eva A, Muselli M, Varesio L. BMC Bioinformatics. 2013;14 Suppl 7:S12. doi: 10.1186/1471-2105-14-S7-S12. Epub 2013 Apr 22. PMID: 23815266.
Identification of novel epithelial ovarian cancer biomarkers by cross-laboratory microarray analysis.
Jiang X, Zhu T, Yang J, Li S, Ye S, Liao S, Meng L, Lu Y, Ma D. J Huazhong Univ Sci Technolog Med Sci. 2010 Jun;30(3):354-9. doi: 10.1007/s11596-010-0356-1. Epub 2010 Jun 17. PMID: 20556581.
Relative expression analysis for molecular cancer diagnosis and prognosis.
Eddy JA, Sung J, Geman D, Price ND. Technol Cancer Res Treat. 2010 Apr;9(2):149-59. doi: 10.1177/153303461000900204. PMID: 20218737. Review.
Effect of data combination on predictive modeling: a study using gene expression data.
Osl M, Dreiseitl S, Kim J, Patel K, Baumgartner C, Ohno-Machado L. AMIA Annu Symp Proc. 2010 Nov 13;2010:567-71. PMID: 21347042.
Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees.
Chou HL, Yao CT, Su SL, Lee CY, Hu KY, Terng HJ, Shih YW, Chang YT, Lu YF, Chang CW, Wahlqvist ML, Wetter T, Chu CM. BMC Bioinformatics. 2013 Mar 19;14:100. doi: 10.1186/1471-2105-14-100. PMID: 23506640.

References

1. Jemal A, Siegel R, Ward E, Murray T, Xu J, Smigal C, Thun MJ. Cancer Statistics, 2006. CA Cancer J Clin. 2006;56:106–130.
2. Eifel P, Axelson JA, Crowley J, Curran WJ, Deshler A, Fulton S, Hendricks CB, Kemeny M. National Institutes of Health Consensus Development Conference Statement: Adjuvant Therapy for Breast Cancer, November 1-3, 2000. J Natl Cancer Inst. 2001;93:979–989. doi: 10.1093/jnci/93.13.979.
3. Goldhirsch A, Glick JH, Gelber RD, Coates AS, Thurlimann B, Senn HJ, and Panel M. Meeting Highlights: International Expert Consensus on the Primary Therapy of Early Breast Cancer 2005. Ann Oncol. 2005;16:1569–1583. doi: 10.1093/annonc/mdi326.
4. Early Breast Cancer Trialists' Collaborative Group. Polychemotherapy for early breast cancer: an overview of the randomised trials. The Lancet. 1998;352:930. doi: 10.1016/S0140-6736(98)03301-7.
5. van de Vijver MJ, He YD, van 't Veer LJ, Dai H, Hart AAM, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R. A Gene-Expression Signature as a Predictor of Survival in Breast Cancer. N Engl J Med. 2002;347:1999–2009. doi: 10.1056/NEJMoa021967.
