Multi-class clustering and prediction in the analysis of microarray data

Chen-An Tsai¹, Te-Chang Lee, I-Ching Ho, Ueng-Cheng Yang, Chun-Houh Chen, James J Chen

PMID: 15681277
DOI: 10.1016/j.mbs.2004.07.002

Math Biosci. 2005 Jan;193(1):79-100. doi: 10.1016/j.mbs.2004.07.002. Epub 2004 Dec 28.

Affiliation

¹ Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration NCTR/FDA/HFT-20 Jefferson, AR 72079, USA.

Abstract

DNA microarray technology makes it possible to study the expression profiles of a large number of distinct genes simultaneously. The technology has been applied to sample clustering and sample prediction. Because so many genes are measured, many of the genes in the original data set are irrelevant to the analysis, and selecting discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV) and a total of 55 samples is used to illustrate a general framework for the approach. Among four selected gene sets, the set omega(I), formed by intersecting the genes selected by the F-test with the union of the genes selected by the one-versus-all t-tests, performs best in both clustering and prediction. Hierarchical clustering and two modified partition (k-means) methods all show that omega(I) groups the 55 samples into seven clusters reasonably well, with the As and AsV samples forming one cluster and the Cd and Cu samples forming another. For prediction, the overall accuracy of the nearest neighbors algorithm using omega(I) to assign each of the 55 samples to one of the nine treatments is 85%.
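
The construction of a gene set like omega(I) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses synthetic data, a 1-nearest-neighbor classifier with leave-one-out evaluation, and hypothetical cutoffs (`f_cut`, `t_cut`) applied to the raw F and Welch t statistics as a stand-in for the significance thresholds, none of which are given in the abstract. All function names are illustrative assumptions.

```python
import random
from statistics import mean, variance

def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across all classes."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values()) / (n - k)
    return between / within

def t_statistic(values, labels, target):
    """Welch t statistic comparing class `target` against all other samples."""
    a = [v for v, lab in zip(values, labels) if lab == target]
    b = [v for v, lab in zip(values, labels) if lab != target]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def select_genes(data, labels, f_cut, t_cut):
    """Genes passing the F cutoff AND at least one one-versus-all t cutoff.

    Assumption: cutoffs on the raw statistics replace p-value thresholds.
    """
    classes = sorted(set(labels))
    f_set, t_union = set(), set()
    for g, values in enumerate(data):
        if f_statistic(values, labels) > f_cut:
            f_set.add(g)
        if any(abs(t_statistic(values, labels, c)) > t_cut for c in classes):
            t_union.add(g)
    return sorted(f_set & t_union)  # intersection of F set and t-test union

def loo_1nn_accuracy(data, labels, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule on the selected genes."""
    n = len(labels)
    correct = 0
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: sum((data[g][i] - data[g][j]) ** 2 for g in genes),
        )
        correct += labels[nearest] == labels[i]
    return correct / n

# Synthetic example (hypothetical): 3 classes x 5 samples, 40 genes,
# of which only the first 6 carry a class-specific shift.
random.seed(0)
labels = [c for c in "ABC" for _ in range(5)]
shift = {"A": 0.0, "B": 6.0, "C": 12.0}
data = []  # data[g][i] = expression of gene g in sample i
for g in range(40):
    if g < 6:
        data.append([shift[lab] + random.gauss(0, 1) for lab in labels])
    else:
        data.append([random.gauss(0, 1) for _ in labels])

genes = select_genes(data, labels, f_cut=20.0, t_cut=3.0)
accuracy = loo_1nn_accuracy(data, labels, genes)
print("selected genes:", genes)
print("leave-one-out 1-NN accuracy:", accuracy)
```

Requiring a gene to pass both filters keeps genes that vary across treatments overall and also separate at least one treatment from the rest, which is the idea behind intersecting the F-test set with the union of one-versus-all t-test sets.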
