Extreme value distribution based gene selection criteria for discriminant microarray data analysis using logistic regression
- PMID: 15285889
- DOI: 10.1089/1066527041410445
Abstract
One important issue commonly encountered in the analysis of microarray data is deciding which genes, and how many, should be selected for further study. For discriminant microarray data analyses based on statistical models, such as logistic regression models, gene selection can be accomplished by comparing the maximum likelihood of the model given the real data, L(D|M), with the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted labels, L(D(0)|M). Typically, the computational burden of obtaining L(D(0)|M) is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration.
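To make the comparison in the abstract concrete, the following is a minimal sketch of the brute-force permutation baseline, that is, the expensive simulation the authors aim to avoid, not their extreme-value approximation. It assumes single-gene logistic regression fitted by Newton-Raphson, works on the log-likelihood scale, and uses synthetic data with one planted informative gene. The function names, sample sizes, and number of permutations are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def max_loglik_single_gene(x, y, n_iter=25):
    """Maximized log-likelihood of the logistic model y ~ 1 + x (Newton-Raphson)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)           # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(2)  # tiny ridge for stability
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    eta = np.clip(X @ beta, -30.0, 30.0)
    # Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta))
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def max_over_genes(X, y):
    """Best single-gene log-likelihood across all columns (genes) of X."""
    return max(max_loglik_single_gene(X[:, g], y) for g in range(X.shape[1]))

# Synthetic data (assumption): 60 samples, 50 genes, gene 0 carries the signal.
rng = np.random.default_rng(0)
n_samples, n_genes, n_perm = 60, 50, 20
y = np.repeat([0.0, 1.0], n_samples // 2)
X = rng.standard_normal((n_samples, n_genes))
X[:, 0] += 2.0 * y

# Log-scale analogue of L(D|M): best fit on the real labels.
observed = max_over_genes(X, y)

# Log-scale analogue of the expected L(D(0)|M): average, over label
# permutations, of the best fit on surrogate data.
null_max = [max_over_genes(X, rng.permutation(y)) for _ in range(n_perm)]
expected_null = float(np.mean(null_max))

print(f"observed max log-likelihood:      {observed:.2f}")
print(f"expected max under permuted labels: {expected_null:.2f}")
```

Each permutation requires refitting the model for every gene, so the cost grows as (number of permutations) × (number of genes) × (cost of one fit). With thousands of genes and multi-gene models, this is the burden that motivates replacing the simulation with an asymptotic distribution for the maximum.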