Improved centroids estimation for the nearest shrunken centroid classifier
- PMID: 17384429
- DOI: 10.1093/bioinformatics/btm046
Abstract
Motivation: The nearest shrunken centroid (NSC) method has been successfully applied in many DNA-microarray classification problems. The NSC uses 'shrunken' centroids as prototypes for each class and identifies subsets of genes that best characterize each class. Each sample is then assigned to the class with the nearest (shrunken) centroid. The NSC is easy to implement and easy to interpret; however, it has drawbacks.
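The shrink-then-classify idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the published procedure: it soft-thresholds each class centroid toward the overall centroid in units of a pooled within-class standard deviation, and it assumes equal class priors and omits the sample-size and variance-offset corrections used in practice. The function names, the `delta` threshold, and the toy data are all hypothetical.

```python
import math


def soft_threshold(value, delta):
    """Move value toward zero by delta; anything within delta of zero becomes 0."""
    shrunk = max(abs(value) - delta, 0.0)
    return math.copysign(shrunk, value)


def fit_shrunken_centroids(samples, labels, delta):
    """Return (shrunken centroids per class, overall centroid, per-gene scale)."""
    n, n_genes = len(samples), len(samples[0])
    classes = sorted(set(labels))
    overall = [sum(x[j] for x in samples) / n for j in range(n_genes)]
    centroids = {}
    for k in classes:
        rows = [x for x, y in zip(samples, labels) if y == k]
        centroids[k] = [sum(x[j] for x in rows) / len(rows) for j in range(n_genes)]
    # Pooled within-class standard deviation per gene (assumption: no offset term).
    scale = []
    for j in range(n_genes):
        ss = sum((x[j] - centroids[y][j]) ** 2 for x, y in zip(samples, labels))
        scale.append(math.sqrt(ss / (n - len(classes))) or 1.0)
    shrunken = {
        k: [
            overall[j]
            + scale[j] * soft_threshold((centroids[k][j] - overall[j]) / scale[j], delta)
            for j in range(n_genes)
        ]
        for k in classes
    }
    return shrunken, overall, scale


def selected_genes(shrunken, overall):
    """Genes whose shrunken centroid still differs from the overall centroid."""
    return [
        j for j in range(len(overall))
        if any(shrunken[k][j] != overall[j] for k in shrunken)
    ]


def predict(x, shrunken, scale):
    """Assign x to the class whose shrunken centroid is nearest (scaled distance)."""
    def dist(c):
        return sum(((x[j] - c[j]) / scale[j]) ** 2 for j in range(len(x)))
    return min(shrunken, key=lambda k: dist(shrunken[k]))


# Toy data: gene 0 separates the classes, genes 1 and 2 do not.
samples = [
    [5.0, 1.0, 0.2], [5.2, 1.1, 0.1], [4.8, 0.9, 0.3],
    [1.0, 1.0, 0.2], [1.2, 0.9, 0.3], [0.8, 1.1, 0.1],
]
labels = ["A", "A", "A", "B", "B", "B"]
shrunken, overall, scale = fit_shrunken_centroids(samples, labels, delta=1.0)
print(selected_genes(shrunken, overall))
print(predict([4.5, 1.0, 0.2], shrunken, scale))
```

In this toy setting, the two uninformative genes are shrunk exactly onto the overall centroid and so drop out of the distance comparison, which is how the method performs gene selection.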
Results: We show that the NSC method can be interpreted in the framework of LASSO regression. Based on that, we consider two new methods, adaptive L∞-norm penalized NSC (ALP-NSC) and adaptive hierarchically penalized NSC (AHP-NSC), with two different penalty functions for microarray classification, which improve over the NSC. Unlike the L1-norm penalty used in LASSO, the penalty terms that we consider make use of the fact that parameters belonging to one gene should be treated as a natural group. Numerical results indicate that the two new methods tend to remove irrelevant genes more effectively and provide better classification results than the L1-norm approach.
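To make the grouping idea concrete, the sketch below contrasts an elementwise L1 penalty with a per-gene L-infinity penalty on a small parameter matrix. It only illustrates the difference in how the two penalties treat a gene's parameters; the abstract does not give the exact ALP-NSC or AHP-NSC formulations, so the function names, the `lam` tuning parameter, and the optional per-gene `weights` (standing in for the "adaptive" part) are assumptions.

```python
def l1_penalty(params, lam):
    """LASSO-style penalty: every class-specific parameter is penalized separately."""
    return lam * sum(abs(v) for gene in params for v in gene)


def linf_group_penalty(params, lam, weights=None):
    """Group penalty: each gene contributes only its largest absolute parameter.

    params[j] holds the class-specific parameters of gene j. The per-gene
    weights are a hypothetical stand-in for adaptive weighting.
    """
    if weights is None:
        weights = [1.0] * len(params)
    return lam * sum(w * max(abs(v) for v in gene) for w, gene in zip(weights, params))


# Three genes, three classes (illustrative values only).
params = [
    [2.0, -1.0, 0.5],  # gene 0: differs across all classes
    [0.0, 0.0, 0.0],   # gene 1: irrelevant, contributes nothing to either penalty
    [0.3, 0.0, 0.0],   # gene 2: small effect in one class
]
print(l1_penalty(params, lam=1.0))
print(linf_group_penalty(params, lam=1.0))
```

Under the group penalty, a gene's cost depends only on its largest parameter, so the penalty acts on the gene as a unit rather than on each class-specific parameter independently.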
Availability: R code for the ALP-NSC and the AHP-NSC algorithms is available from the authors upon request.