Breast cancer prognosis by combinatorial analysis of gene expression data
- PMID: 16859500
- PMCID: PMC1779471
- DOI: 10.1186/bcr1512
Abstract
Introduction: The potential of applying data analysis tools to microarray data for diagnosis and prognosis is illustrated on the recent breast cancer dataset of van 't Veer and coworkers. We re-examine that dataset using the novel technique of logical analysis of data (LAD), with a twofold objective: first, to discover patterns characteristic of cases with good or poor outcome and use them for accurate and justifiable predictions; and second, to derive novel information about the role of genes, the existence of special classes of cases, and other factors.
Method: Data were analyzed using LAD, a combinatorics- and optimization-based method recently shown to provide highly accurate diagnostic and prognostic systems in cardiology, cancer proteomics, hematology, pulmonology, and other disciplines.
Results: LAD identified a subset of 17 of the 25,000 genes capable of fully distinguishing between patients with poor and good prognoses. An extensive list of 'patterns' or 'combinatorial biomarkers' (that is, combinations of genes and limitations on their expression levels) was generated, and 40 patterns were used to create a prognostic system with a weighted accuracy of 100% on the training set and 92.9% on the test set. The prognostic system uses fewer genes than other methods and achieves accuracy similar to or better than that reported in other studies. Of the 17 genes identified by LAD, three were shown to play a significant role in determining poor prognosis and five in determining good prognosis. Two new classes of patients were discovered, the members of each class sharing similar covering patterns, gene expression ranges, and clinical features. As a by-product, the study shows that the training and test sets of van 't Veer and coworkers have differing characteristics.
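To make the notion of a pattern concrete, the following is a minimal sketch of how a pattern-based prognostic model of this kind could score patients. It rests on several simplifying assumptions that are not taken from the study: each pattern is a conjunction of interval bounds on expression values; poor-prognosis patterns are labeled +1 and good-prognosis patterns -1; all patterns carry equal weight in the discriminant; and weighted accuracy is taken as the mean of the per-class accuracies. The gene names and bounds are hypothetical placeholders.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Pattern:
    """A conjunction of conditions low <= expression[gene] <= high.

    Use -INF or INF for one-sided limits. Gene names are hypothetical.
    """
    bounds: dict  # gene name -> (low, high)

    def covers(self, sample: dict) -> bool:
        # The pattern covers a sample only if every condition holds.
        return all(low <= sample[gene] <= high
                   for gene, (low, high) in self.bounds.items())


def discriminant(sample, positive, negative):
    """Share of positive patterns covering the sample minus the share of
    negative patterns covering it (equal weights: an assumption)."""
    pos = sum(p.covers(sample) for p in positive) / len(positive)
    neg = sum(p.covers(sample) for p in negative) / len(negative)
    return pos - neg


def predict(sample, positive, negative):
    """+1 (poor prognosis), -1 (good prognosis) or 0 (unclassified)."""
    d = discriminant(sample, positive, negative)
    return (d > 0) - (d < 0)


def weighted_accuracy(y_true, y_pred):
    """Mean of the accuracies on the +1 and -1 cases; a 0 counts as an error."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    acc_pos = sum(p == 1 for p in pos) / len(pos)
    acc_neg = sum(p == -1 for p in neg) / len(neg)
    return (acc_pos + acc_neg) / 2


# Toy example with two hypothetical genes.
positive = [Pattern({"GENE_A": (1.5, INF)}),
            Pattern({"GENE_A": (0.5, INF), "GENE_B": (-INF, -0.2)})]
negative = [Pattern({"GENE_A": (-INF, 0.5)}),
            Pattern({"GENE_B": (0.0, INF)})]
samples = [{"GENE_A": 2.0, "GENE_B": -1.0},
           {"GENE_A": 0.1, "GENE_B": 0.3},
           {"GENE_A": 0.8, "GENE_B": 0.1}]
y_true = [1, -1, 1]
y_pred = [predict(s, positive, negative) for s in samples]
print(y_pred)                              # [1, -1, -1]
print(weighted_accuracy(y_true, y_pred))   # 0.75
```

Averaging the per-class accuracies keeps a model from looking good simply by favoring the larger class. A fuller model might weight each pattern, for example by how many training cases it covers; equal weights keep the sketch short.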
Conclusion: The study shows that LAD provides an accurate and fully explanatory prognostic system for breast cancer using genomic data (that is, a system that, in addition to predicting good or poor prognosis, provides an individualized explanation of the reasons for that prognosis for each patient). Moreover, the LAD model provides valuable insights into the roles of individual and combinatorial biomarkers, allows the discovery of new classes of patients, and generates a vast library of biomedical research hypotheses.
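The individualized explanation mentioned above can be illustrated by listing, for one patient, the patterns that cover that patient. The sketch below assumes the same simplified interval-bound representation of patterns. The pattern names (P1, P2, N1), gene names, and bounds are hypothetical, and the explanation format used in the study may differ.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Pattern:
    name: str
    bounds: dict  # gene name -> (low, high); -INF/INF mark one-sided limits

    def covers(self, sample: dict) -> bool:
        return all(low <= sample[g] <= high for g, (low, high) in self.bounds.items())

    def describe(self) -> str:
        # Render the pattern as a readable rule, e.g. "P1: GENE_A >= 1.5".
        terms = []
        for g, (low, high) in self.bounds.items():
            if low == -INF:
                terms.append(f"{g} <= {high}")
            elif high == INF:
                terms.append(f"{g} >= {low}")
            else:
                terms.append(f"{low} <= {g} <= {high}")
        return f"{self.name}: " + " AND ".join(terms)


def explain(sample, poor_patterns, good_patterns):
    """List the poor- and good-prognosis patterns that cover one patient."""
    return {
        "poor": [p.describe() for p in poor_patterns if p.covers(sample)],
        "good": [p.describe() for p in good_patterns if p.covers(sample)],
    }


poor = [Pattern("P1", {"GENE_A": (1.5, INF)}),
        Pattern("P2", {"GENE_A": (0.5, INF), "GENE_B": (-INF, -0.2)})]
good = [Pattern("N1", {"GENE_B": (0.0, INF)})]
report = explain({"GENE_A": 2.0, "GENE_B": -1.0}, poor, good)
print(report)
```

Here the patient is covered by two poor-prognosis patterns and no good-prognosis pattern, so the report gives both the direction of the prognosis and the specific expression conditions behind it.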
Similar articles
- Logical analysis of diffuse large B-cell lymphomas. Artif Intell Med. 2005 Jul;34(3):235-67. doi: 10.1016/j.artmed.2004.11.004. PMID: 16023562.
- Mixture classification model based on clinical markers for breast cancer prognosis. Artif Intell Med. 2010 Feb-Mar;48(2-3):129-37. doi: 10.1016/j.artmed.2009.07.008. Epub 2009 Dec 14. PMID: 20005686.
- Pseudogene-gene functional networks are prognostic of patient survival in breast cancer. BMC Med Genomics. 2020 Apr 3;13(Suppl 5):51. doi: 10.1186/s12920-020-0687-0. PMID: 32241256.
- Gene expression profiles of breast cancer obtained from core cut biopsies before neoadjuvant docetaxel, adriamycin, and cyclophoshamide chemotherapy correlate with routine prognostic markers and could be used to identify predictive signatures. Zentralbl Gynakol. 2006 Apr;128(2):76-81. doi: 10.1055/s-2006-921508. PMID: 16673249. Clinical Trial.
- From description to causality: mechanisms of gene expression signatures in cancer. Cell Cycle. 2006 Jun;5(11):1148-51. doi: 10.4161/cc.5.11.2798. Epub 2006 Jun 1. PMID: 16721055. Review.
Cited by
- A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data. Diagnostics (Basel). 2023 Feb 13;13(4):708. doi: 10.3390/diagnostics13040708. PMID: 36832196.
- Network-based inference framework for identifying cancer genes from gene expression data. Biomed Res Int. 2013;2013:401649. doi: 10.1155/2013/401649. Epub 2013 Sep 1. PMID: 24073403.
- Data perturbation independent diagnosis and validation of breast cancer subtypes using clustering and patterns. Cancer Inform. 2007 Feb 19;2:243-74. PMID: 19458770.
- Comparative survival analysis of breast cancer microarray studies identifies important prognostic genetic pathways. BMC Cancer. 2010 Oct 21;10:573. doi: 10.1186/1471-2407-10-573. PMID: 20964848.
- Logical Analysis of Data in Structure-Activity Investigation of Polymeric Gene Delivery. Macromol Theory Simul. 2011 May 23;20(4):275-285. doi: 10.1002/mats.201000087. PMID: 25663794.