Selection of differentially expressed genes in microarray data analysis
- PMID: 16940966
- DOI: 10.1038/sj.tpj.6500412
Abstract
One common objective in microarray experiments is to identify a subset of genes that are differentially expressed across experimental conditions, for example, between drug treatment and no drug treatment. Often, the goal is to determine the underlying relationship between poor and good gene signatures in order to identify biological functions or predict specific therapeutic outcomes. Because of the complexity of studying hundreds or thousands of genes in a single experiment, selecting a subset of genes to enhance relationships among the underlying biological structures or to improve the prediction accuracy of clinical outcomes has been an important issue in microarray data analysis. Selection of differentially expressed genes is a two-step process. The first step is to select an appropriate test statistic and compute the P-value for each gene; the genes are then ranked by their P-values as evidence of differential expression. The second step is to assign a significance level, that is, to determine a cutoff threshold for the P-values in accordance with the study objective. In this paper, we consider four commonly used statistics, the t-, S- (SAM), U- (Mann-Whitney) and M-statistics, to compute the P-values for gene ranking. To assign the significance level, we consider procedures that control the false-positive error, namely family-wise error rate and false discovery rate procedures, to select a limited number of genes, and a receiver-operating characteristic (ROC) approach to select a larger number of genes. The ROC approach is particularly useful in genomic/genetic profiling studies. The well-known colon cancer data set, containing 22 normal and 40 tumor tissues, is used to illustrate the different gene ranking and significance level assignment methods for applications to genomic/genetic profiling studies. The P-values computed from the t-, U- and M-statistics are very similar.
We discuss the common practice of using the P-value, a false-positive error probability, as the primary criterion and then using the fold-change as a surrogate measure of biological significance for gene selection. The P-value and the fold-change can be displayed together in a volcano plot. We also address several issues in gene selection.
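The two-step selection described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic (22 "normal" and 40 "tumor" samples, chosen to mirror the group sizes mentioned above, with the first 10 of 200 hypothetical genes shifted), the expression values are assumed to be on a log2 scale so that a difference of group means acts as a log2 fold-change, and the U-statistic P-value uses a plain normal approximation without tie or continuity correction. The function names (`mann_whitney_p`, `bonferroni`, `benjamini_hochberg`) are illustrative, and Bonferroni and Benjamini-Hochberg are used here as common examples of family-wise error and false discovery rate control; the abstract does not say which specific procedures the paper applies.

```python
import math
import random
from statistics import NormalDist, mean


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U P-value via the normal approximation.

    Assumption: mid-ranks are used for ties, but the variance has no tie
    correction and no continuity correction is applied (rough sketch only).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return min(1.0, 2.0 * (1.0 - NormalDist().cdf(abs(z))))


def bonferroni(pvals, alpha=0.05):
    """Indices of genes kept under family-wise error control (Bonferroni)."""
    m = len(pvals)
    return [g for g, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, q=0.05):
    """Indices of genes kept under false discovery rate control (step-up BH)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda g: pvals[g])
    k_max = 0
    for rank, g in enumerate(order, start=1):
        if pvals[g] <= q * rank / m:
            k_max = rank  # largest rank whose P-value passes its threshold
    return sorted(order[:k_max])


# Synthetic log2-scale data (assumption): genes 0..9 are truly shifted.
random.seed(0)
N_GENES, N_NORMAL, N_TUMOR, N_SHIFTED = 200, 22, 40, 10
normal = [[random.gauss(0.0, 1.0) for _ in range(N_NORMAL)]
          for _ in range(N_GENES)]
tumor = [[random.gauss(3.0 if g < N_SHIFTED else 0.0, 1.0) for _ in range(N_TUMOR)]
         for g in range(N_GENES)]

# Step 1: compute a P-value per gene and rank the genes by it.
pvals = [mann_whitney_p(normal[g], tumor[g]) for g in range(N_GENES)]
ranking = sorted(range(N_GENES), key=lambda g: pvals[g])

# Step 2: choose a cutoff according to the error rate to be controlled.
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

# Volcano-plot coordinates: (log2 fold-change, -log10 P-value) per gene.
volcano = [(mean(tumor[g]) - mean(normal[g]),
            -math.log10(max(pvals[g], 1e-300)))
           for g in range(N_GENES)]

print(len(bonf), len(bh))
```

At the same nominal level, every gene kept by the Bonferroni cutoff is also kept by the Benjamini-Hochberg cutoff, which is one way to see why false discovery rate control typically yields the longer gene list. The `volcano` pairs can be passed to any plotting tool to show the P-value and the fold-change together.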