Partial least squares dimension reduction for microarray gene expression data with a censored response
- PMID: 15681279
- DOI: 10.1016/j.mbs.2004.10.007
Abstract
An important application of DNA microarray technologies involves monitoring the global state of the transcriptional program in tumor cells. One goal in cancer microarray studies is to compare clinical outcomes, such as relapse-free or overall survival, across subgroups of patients defined by global gene expression patterns. Nguyen and Rocke [Bioinformatics 18 (2002) 1625] recently proposed a method for comparing patient survival as a function of gene expression. Because of (a) the high dimensionality of microarray gene expression data and (b) the censoring of survival times, they proposed a two-stage procedure to relate survival times to gene expression profiles. The first stage reduces the dimensionality of the gene expression data by partial least squares (PLS), and the second stage predicts survival probability using proportional hazards (PH) regression. In this paper, we provide a systematic assessment of the performance of this two-stage procedure. PLS dimension reduction involves complex non-linear functions of both the predictor and the response data, which makes exact analytical study intractable. We therefore assess the methodology under a simulation model for gene expression data with a censored response variable. In particular, we compare PLS dimension reduction with dimension reduction via principal components analysis (PCA) and with a modified PLS (MPLS) approach. PLS performed substantially better than PCA when the total predictor variance explained was low to moderate (e.g. 40%-60%). It performed similarly to MPLS, and slightly better in some cases. We also examine the effect of censoring on the dimension reduction stage. The performance of all methods deteriorates at high censoring rates, although PLS-PH performed best overall.
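The two-stage procedure described above (dimension reduction, then PH regression on the reduced components) can be illustrated with a small NumPy sketch. Everything here is an assumption for illustration, not the authors' code: the simulation model (independent Gaussian "genes", ten of which shift an exponential hazard, with independent exponential censoring) is hypothetical and much simpler than the paper's; the PLS stage uses the observed, possibly censored, times as an ordinary continuous response and ignores the censoring indicator, which is one simple variant and not necessarily the exact formulation evaluated in the paper; and the helper names (`simulate`, `pls_scores`, `pca_scores`, `cox_fit`, `c_index`) are made up. The PH stage is a Newton-Raphson Cox fit (Breslow form, assuming no tied times), and Harrell's concordance index serves as a rough measure of fit. MPLS is not shown.

```python
import numpy as np


def simulate(n=100, p=200, n_signal=10, censor_scale=2.0, seed=0):
    """Hypothetical simulation: Gaussian 'expression' matrix, exponential PH survival."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    coef = np.zeros(p)
    coef[:n_signal] = 0.4                            # only the first genes affect the hazard
    risk = X @ coef
    T = rng.exponential(scale=np.exp(-risk))         # hazard = exp(risk)
    C = rng.exponential(scale=censor_scale, size=n)  # independent censoring times
    return X, np.minimum(T, C), (T <= C).astype(float)


def pls_scores(X, y, k):
    """PLS1 (NIPALS) scores; assumption: y is treated as an uncensored continuous response."""
    Xd = X - X.mean(axis=0)
    yd = y - y.mean()
    T = np.empty((X.shape[0], k))
    for a in range(k):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)                       # weights: gene-wise covariance with y
        t = Xd @ w                                   # a-th latent component (score)
        tt = t @ t
        Xd = Xd - np.outer(t, Xd.T @ t / tt)         # deflate X by its loading on t
        yd = yd - t * (t @ yd) / tt                  # deflate y
        T[:, a] = t
    return T


def pca_scores(X, k):
    """First k principal-component scores; unsupervised, ignores the survival data."""
    Xd = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xd, full_matrices=False)
    return U[:, :k] * s[:k]


def cox_fit(Z, time, event, max_iter=50, tol=1e-8):
    """Cox PH fit by Newton-Raphson with step halving (Breslow; assumes no tied times)."""
    idx = np.argsort(time)
    Z, d = Z[idx], event[idx]

    def stats(beta):
        eta = Z @ beta
        r = np.exp(eta - eta.max())                  # rescaled to avoid overflow
        S0 = np.cumsum(r[::-1])[::-1]                # risk set of sorted subject i = {j >= i}
        S1 = np.cumsum((r[:, None] * Z)[::-1], axis=0)[::-1]
        S2 = np.cumsum((r[:, None, None] * Z[:, :, None] * Z[:, None, :])[::-1], axis=0)[::-1]
        loglik = np.sum(d * (eta - eta.max() - np.log(S0)))
        zbar = S1 / S0[:, None]
        grad = (d[:, None] * (Z - zbar)).sum(axis=0)
        info = (d[:, None, None] * (S2 / S0[:, None, None]
                - zbar[:, :, None] * zbar[:, None, :])).sum(axis=0)
        return loglik, grad, info

    beta = np.zeros(Z.shape[1])
    ll, grad, info = stats(beta)
    for _ in range(max_iter):
        step = np.linalg.solve(info, grad)           # Newton step on the partial likelihood
        while True:                                  # halve until log-likelihood does not drop
            new_ll, new_grad, new_info = stats(beta + step)
            if new_ll >= ll - 1e-12 or np.max(np.abs(step)) < tol:
                break
            step /= 2
        beta = beta + step
        ll, grad, info = new_ll, new_grad, new_info
        if np.max(np.abs(step)) < tol:
            break
    return beta


def c_index(time, event, score):
    """Harrell's C: share of usable pairs where the higher risk score fails first."""
    num = den = 0.0
    for i in np.flatnonzero(event):
        later = time > time[i]                       # subjects still at risk after i fails
        den += later.sum()
        num += (score[i] > score[later]).sum() + 0.5 * (score[i] == score[later]).sum()
    return num / den


# Usage: reduce to 2 components by PLS or PCA, then fit the PH model on them.
X, time, event = simulate()
results = {}
for name, Z in [("PLS", pls_scores(X, time, 2)), ("PCA", pca_scores(X, 2))]:
    Z = Z / Z.std(axis=0)                            # standardize components for the Cox stage
    beta = cox_fit(Z, time, event)
    results[name] = c_index(time, event, Z @ beta)
    print(name, "training C-index:", round(results[name], 3))
```

The C-index here is computed on the same patients used to build the components, so it flatters PLS, whose components were constructed from the survival times; a fair comparison would score held-out patients. PCA, by contrast, never sees the response, so its leading components can miss survival-related genes when those genes explain little of the predictor variance.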
Similar articles
- Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005 Jul 1;21(13):3001-8. doi: 10.1093/bioinformatics/bti422. Epub 2005 Apr 6. PMID: 15814556
- Predicting survival from microarray data--a comparative study. Bioinformatics. 2007 Aug 15;23(16):2080-7. doi: 10.1093/bioinformatics/btm305. Epub 2007 Jun 6. PMID: 17553857
- Dimension reduction for classification with gene expression microarray data. Stat Appl Genet Mol Biol. 2006;5:Article6. doi: 10.2202/1544-6115.1147. Epub 2006 Feb 24. PMID: 16646870
- Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform. 2007 Jan;8(1):32-44. doi: 10.1093/bib/bbl016. Epub 2006 May 26. PMID: 16772269. Review.
- Dimension reduction for high-dimensional data. Methods Mol Biol. 2010;620:417-34. doi: 10.1007/978-1-60761-580-4_14. PMID: 20652514. Review.
Cited by
- Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed whole-genome sequencing study. Am J Hum Genet. 2023 Oct 5;110(10):1704-1717. doi: 10.1016/j.ajhg.2023.09.003. PMID: 37802043. Free PMC article.
- Whole Genome DNA and RNA Sequencing of Whole Blood Elucidates the Genetic Architecture of Gene Expression Underlying a Wide Range of Diseases. medRxiv [Preprint]. 2022 May 3:2022.04.13.22273841. doi: 10.1101/2022.04.13.22273841. Update in: Sci Rep. 2022 Nov 23;12(1):20167. doi: 10.1038/s41598-022-24611-w. PMID: 35547845. Free PMC article. Preprint.
- Whole Genome DNA and RNA Sequencing of Whole Blood Elucidates the Genetic Architecture of Gene Expression Underlying a Wide Range of Diseases. Res Sq [Preprint]. 2022 May 31:rs.3.rs-1598646. doi: 10.21203/rs.3.rs-1598646/v1. Update in: Sci Rep. 2022 Nov 23;12(1):20167. doi: 10.1038/s41598-022-24611-w. PMID: 35664994. Free PMC article. Preprint.
- Whole genome DNA and RNA sequencing of whole blood elucidates the genetic architecture of gene expression underlying a wide range of diseases. Sci Rep. 2022 Nov 23;12(1):20167. doi: 10.1038/s41598-022-24611-w. PMID: 36424512. Free PMC article.
- Dimension reduction of microarray gene expression data: the accelerated failure time model. J Bioinform Comput Biol. 2009 Dec;7(6):939-54. doi: 10.1142/s0219720009004412. PMID: 20014472. Free PMC article.