Pathway analysis using random forests with bivariate node-split for survival outcomes
- PMID: 19933158
- PMCID: PMC2804301
- DOI: 10.1093/bioinformatics/btp640
Abstract
Motivation: There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the ability of these methods to incorporate pathway information when analyzing microarray data. In general, genes that are identified without incorporating biological knowledge are more difficult to interpret. Correlating pathway-based gene expression with survival outcomes may lead to more biologically meaningful prognostic biomarkers. Thus, a comprehensive study of how these methods perform in a pathway-based setting is warranted.
Results: In this article, we describe a pathway-based method that uses random forests to correlate gene expression data with survival outcomes, and we introduce a novel bivariate node-splitting random survival forest. The proposed method allows researchers to identify pathways that are important for predicting patient prognosis and time to disease progression, and to discover important genes within those pathways. We compared different implementations of random forests with different split criteria and found that bivariate node-splitting random survival forests with the log-rank test are among the best. We also performed simulation studies showing that random forests outperform several other machine learning algorithms and perform comparably to a newly developed component-wise Cox boosting model. Thus, pathway-based survival analysis using machine learning tools is a promising approach for dissecting pathways and for generating new biological hypotheses from microarray studies.
Availability: R package Pwayrfsurvival is available from URL: http://www.duke.edu/~hp44/pwayrfsurvival.htm.
Supplementary information: Supplementary data are available at Bioinformatics online.
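Illustrative sketch: the abstract names a log-rank split criterion and a bivariate node split but does not give their details. The Python below is a minimal sketch under stated assumptions, not the Pwayrfsurvival implementation. It computes the standard two-sample log-rank statistic for a candidate split, then searches for the best split on a linear combination of two genes. The function names, the grid of directions (`n_angles`), and the minimum node size (`min_node`) are hypothetical choices made for illustration only.

```python
import math


def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic (1 df) for a candidate split.

    times   : follow-up times
    events  : 1 if the event was observed, 0 if censored
    in_left : True if the sample falls in the left daughter node
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t (overall and in the left node).
        n = sum(1 for u in times if u >= t)
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        # Events occurring exactly at time t (overall and in the left node).
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_left = sum(
            1 for u, e, g in zip(times, events, in_left) if u == t and e and g
        )
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:
            p = n_left / n
            variance += d * p * (1 - p) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_bivariate_split(x1, x2, times, events, n_angles=8, min_node=2):
    """Search splits of the form a*x1 + b*x2 <= c (an assumed split family).

    Returns (statistic, (a, b), c) for the split with the largest
    log-rank statistic, or (0.0, None, None) if no split is admissible.
    """
    best = (0.0, None, None)
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        a, b = math.cos(theta), math.sin(theta)
        scores = [a * u + b * v for u, v in zip(x1, x2)]
        cuts = sorted(set(scores))
        # Candidate thresholds are midpoints between adjacent projected values.
        for lo, hi in zip(cuts, cuts[1:]):
            c = (lo + hi) / 2
            in_left = [s <= c for s in scores]
            n_left = sum(in_left)
            if n_left < min_node or len(in_left) - n_left < min_node:
                continue
            stat = logrank_statistic(times, events, in_left)
            if stat > best[0]:
                best = (stat, (a, b), c)
    return best
```

With `theta = 0` the search reduces to an ordinary univariate split on the first gene, so the bivariate search can only match or improve on that univariate statistic for the same pair of genes. In a forest, a routine like this would be called at each node on candidate gene pairs drawn from the pathway under study.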