BMC Bioinformatics. 2010 Sep 8:11:452.
doi: 10.1186/1471-2105-11-452.

Simple and flexible classification of gene expression microarrays via Swirls and Ripples

Stuart G Baker. BMC Bioinformatics.

Abstract

Background: A simple classification rule with few genes and parameters is desirable when applying a classification rule to new data. One popular simple classification rule, diagonal discriminant analysis, yields linear or curved classification boundaries, called Ripples, that are optimal when gene expression levels are normally distributed with the appropriate variance, but may yield poor classification in other situations.

Results: A simple modification of diagonal discriminant analysis yields smooth, highly nonlinear classification boundaries, called Swirls, that sometimes outperform Ripples. In particular, if the data are normally distributed with different variances in each class, Swirls substantially outperforms Ripples when a pooled variance is used to reduce the number of parameters. The proposed classification rule for two classes selects either Swirls or Ripples after parsimoniously selecting the number of genes and distance measures. Applications to five cancer microarray data sets identified predictive genes related to the tissue organization theory of carcinogenesis.
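
To make the pooled versus class-specific variance distinction concrete, the following is a minimal sketch of textbook diagonal discriminant analysis for two classes, which corresponds to the Ripples baseline described above. It is not the author's software, and it does not implement Swirls, whose formula the abstract does not give. The function names `fit_diagonal` and `score`, the `pooled` flag, and the inclusion of a log-variance term are assumptions made for illustration.

```python
from math import log
from statistics import mean, variance


def fit_diagonal(class0, class1, pooled):
    """Estimate per-gene centroids and variances for two classes.

    class0, class1: lists of samples; each sample is a list of expression levels.
    pooled: True for one shared variance per gene, False for class-specific.
    (Hypothetical helper; names are illustrative, not from the paper.)
    """
    genes = range(len(class0[0]))
    classes = (class0, class1)
    mu = [[mean([s[g] for s in c]) for g in genes] for c in classes]
    var = [[variance([s[g] for s in c]) for g in genes] for c in classes]
    if pooled:
        n0, n1 = len(class0), len(class1)
        # Weighted average of the two sample variances, gene by gene.
        shared = [
            ((n0 - 1) * var[0][g] + (n1 - 1) * var[1][g]) / (n0 + n1 - 2)
            for g in genes
        ]
        var = [shared, shared]
    return mu, var


def score(x, mu, var):
    """Difference of variance-scaled distances to the two centroids.

    Positive values favor class 1, negative values favor class 0.
    The log(v) term is the usual normal-likelihood correction (assumed here).
    """
    def dist(k):
        return sum(
            (xi - m) ** 2 / v + log(v)
            for xi, m, v in zip(x, mu[k], var[k])
        )
    return dist(0) - dist(1)
```

With `pooled=True` the quadratic and log-variance terms cancel between the two classes, so the boundary `score(x) = 0` is linear in `x`. With `pooled=False` they do not cancel, which gives a curved boundary at the cost of twice as many variance parameters.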

Conclusion: The parsimonious selection of classifiers coupled with the selection of either Swirls or Ripples provides a good basis for formulating a simple, yet flexible, classification rule. Open source software is available for download.


Figures

Figure 1. Illustrative classification boundaries for two genes. The points are the centroids. Vertical and horizontal lines at each centroid are proportional to the variances. Distance measures are D = 1 (pooled variance) and D = 2 (class-specific variance).
Figure 2. Swirls and Ripples applied to data generated with D = 2.
Figure 3. ROC and RU curves for simulation.
Figure 4. ROC and RU curves for data sets.

