. 2003;4(12):R83.

doi: 10.1186/gb-2003-4-12-r83. Epub 2003 Nov 24.

Multiclass classification of microarray data with repeated measurements: application to cancer

Ka Yee Yeung¹, Roger E Bumgarner

Affiliations

PMID: 14659020
PMCID: PMC329422
DOI: 10.1186/gb-2003-4-12-r83

Multiclass classification of microarray data with repeated measurements: application to cancer

Ka Yee Yeung et al. Genome Biol. 2003.

. 2003;4(12):R83.

doi: 10.1186/gb-2003-4-12-r83. Epub 2003 Nov 24.

Authors

Ka Yee Yeung¹, Roger E Bumgarner

Affiliation

¹ Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA. kayee@u.washington.edu

PMID: 14659020
PMCID: PMC329422
DOI: 10.1186/gb-2003-4-12-r83

Erratum in

Genome Biol. 2005;6(13):405-405.4

Abstract

Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms that are applicable to microarray data with any number of classes. We show that removing highly correlated genes typically improves classification results using a small set of genes.

PubMed Disclaimer

Figures

**Figure 1**
Comparison of prediction accuracy of USC and SC on the NCI 60 data. The percentage of prediction accuracy is plotted against the number of relevant genes using the USC algorithm at ρ₀= 0.6 and the SC algorithm (USC at ρ₀= 1.0). The horizontal axis is shown on a log scale. Because no independent test set is available for this data, we randomly divided the samples in each class into roughly three parts multiple times, such that a third of the samples are reserved as a test set. Thus the training set consists of 43 samples and the test set of 18 samples. The graph represents typical results over these multiple random runs.

**Figure 2**
Prediction accuracy on the multiple tumor data using the EWUSC algorithm over the range of Δ from 0 to 20. The percentage of classification errors is plotted against Δ on **(a)** the full training set (96 samples) and **(c)** the test set (27 samples). In **(b)** the average percentage of errors is plotted against Δ on the cross-validation data over five random runs of fourfold cross-validation. In **(d)**, the number of relevant genes is plotted against Δ. Different colors are used to specify different correlation thresholds (ρ₀= 0.6, 0.7, 0.8, 0.9 or 1). Results of ρ₀< 0.6 are shown in Figure S1 on [30]. Optimal parameters are inferred from the cross-validation data in (b).

**Figure 3**
Comparison of feature stability of EWUSC, USC and SC on the multiple tumor data. The average Jaccard index is plotted against the number of relevant genes over five random runs of fourfold cross-validation using EWUSC and USC at ρ₀= 0.8 and SC. A high average Jaccard index indicates high feature stability. The EWUSC algorithm selects the most stable features. Note that the horizontal axis is shown on a log scale.

**Figure 4**
Comparison of prediction accuracy of EWUSC, USC, SVM and SC algorithms on the multiple tumor data. The horizontal axis shows the total number of distinct genes selected over all binary SVM classifiers on a log scale. Some results are not available on the full range of the total number of genes. For example, the maximum numbers of selected genes for EWUSC and USC are roughly 1,000. The reported prediction accuracy is 78% [10] using all 16,000 available genes on the full data. The EWUSC algorithm achieves 89% prediction accuracy with only 89 genes. With 680 genes, EWUSC produces 93% prediction accuracy.

**Figure 5**
Comparison of prediction accuracy of EWUSC, USC and SC on the breast cancer data. The percentage of prediction accuracy is plotted against the number of relevant genes using the EWUSC algorithm at ρ₀= 0.7, the USC algorithm at ρ₀= 0.6 and the SC algorithm (USC at ρ₀= 1.0). Note that the horizontal axis is shown on a log scale.

**Figure 6**
Comparison of feature stability of EWUSC, USC and SC on the breast cancer data. The average Jaccard index is plotted against the number of relevant genes over five random runs of 10-fold cross-validation using the EWUSC algorithm at ρ₀= 0.7, the USC algorithm at ρ₀= 0.6 and the SC algorithm (USC at ρ₀= 1). The EWUSC algorithm produces relatively more stable features when the number of relevant genes is small.

See this image and copyright information in PMC

Cited by

Establishing Novel Molecular Subtypes of Appendiceal Cancer.
Garland-Kledzik M, Scholer A, Ensenyat-Mendez M, Orozco JIJ, Khader A, Santamaria-Barria J, Fischer T, Pigazzi A, Marzese DM. Garland-Kledzik M, et al. Ann Surg Oncol. 2022 Mar;29(3):2118-2125. doi: 10.1245/s10434-021-10945-8. Epub 2021 Oct 30. Ann Surg Oncol. 2022. PMID: 34718915
Developing a Predictive Gene Classifier for Autism Spectrum Disorders Based upon Differential Gene Expression Profiles of Phenotypic Subgroups.
Hu VW, Lai Y. Hu VW, et al. N Am J Med Sci (Boston). 2013;6(3):10.7156/najms.2013.0603107. doi: 10.7156/najms.2013.0603107. N Am J Med Sci (Boston). 2013. PMID: 24363828 Free PMC article.
Gene selection and classification of microarray data using random forest.
Díaz-Uriarte R, Alvarez de Andrés S. Díaz-Uriarte R, et al. BMC Bioinformatics. 2006 Jan 6;7:3. doi: 10.1186/1471-2105-7-3. BMC Bioinformatics. 2006. PMID: 16398926 Free PMC article.
Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data.
Baty F, Bihl MP, Perrière G, Culhane AC, Brutsche MH. Baty F, et al. BMC Bioinformatics. 2005 Sep 28;6:239. doi: 10.1186/1471-2105-6-239. BMC Bioinformatics. 2005. PMID: 16191195 Free PMC article.
A Dual Level Analysis with Evolutionary Computing and Swarm Models for Classification of Leukemia.
Prabhakar SK, Ryu S, Jeong IC, Won DO. Prabhakar SK, et al. Biomed Res Int. 2022 May 26;2022:2052061. doi: 10.1155/2022/2052061. eCollection 2022. Biomed Res Int. 2022. PMID: 35663047 Free PMC article.

See all "Cited by" articles

References

1. Lennon GG, Lehrach H. Hybridization analyses of arrayed cDNA libraries. Trends Genet. 1991;7:314–317. - PubMed
1. Nguyen C, Rocha D, Granjeaud S, Baldit M, Bernard K, Naquet P, Jordan BR. Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones. Genomics. 1995;29:207–216. doi: 10.1006/geno.1995.1233. - DOI - PubMed
1. Pietu G, Alibert O, Guichard V, Lamy B, Bois F, Leroy E, Mariage-Sampson R, Houlgatte R, Soularue P, Auffray C. Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array. Genome Res. 1996;6:492–503. - PubMed
1. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–470. - PubMed
1. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996;14:1675–1680. - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Grants and funding

5U24DK058813-02/DK/NIDDK NIH HHS/United States

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Multiclass classification of microarray data with repeated measurements: application to cancer

Affiliation

Multiclass classification of microarray data with repeated measurements: application to cancer

Authors

Affiliation

Erratum in

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Erratum in

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Related information

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources