Bioinformatics. 2010 Jan 15;26(2):250-8.
doi: 10.1093/bioinformatics/btp640. Epub 2009 Nov 18.

Pathway analysis using random forests with bivariate node-split for survival outcomes

Herbert Pang et al. Bioinformatics.

Abstract

Motivation: There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the ability of these methods to incorporate pathway information when analyzing microarray data. In general, genes that are identified without incorporating biological knowledge are more difficult to interpret. Correlating pathway-based gene expression with survival outcomes may lead to biologically more meaningful prognostic biomarkers. Thus, a comprehensive study on how these methods perform in a pathway-based setting is warranted.

Results: In this article, we describe a pathway-based method using random forests to correlate gene expression data with survival outcomes and introduce a novel bivariate node-splitting random survival forest. The proposed method allows researchers to identify important pathways for predicting patient prognosis and time to disease progression, and to discover important genes within those pathways. We compared different implementations of random forests with different split criteria and found that the bivariate node-splitting random survival forest with the log-rank test is among the best. We also performed simulation studies showing that random forests outperform several other machine learning algorithms and give results comparable to those of a newly developed component-wise Cox boosting model. Thus, pathway-based survival analysis using machine learning tools represents a promising approach for dissecting pathways and for generating new biological hypotheses from microarray studies.

Availability: R package Pwayrfsurvival is available from URL: http://www.duke.edu/~hp44/pwayrfsurvival.htm.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. A schematic diagram of pathway analysis for survival outcomes using random forests.
Fig. 2. A multidimensional scaling plot for TGF beta signaling pathway.
Fig. 3. An outliers plot of 30 patients for TGF beta signaling pathway.
