Comparative Study
Genome Biol. 2002;3(12):RESEARCH0069.
doi: 10.1186/gb-2002-3-12-research0069. Epub 2002 Nov 25.

Supervised clustering of genes


Marcel Dettling et al. Genome Biol. 2002.

Abstract

Background: We focus on microarray data where experiments monitor gene expression in different tissues and where each experiment is equipped with an additional response variable such as a cancer type. Although the number of measured genes is in the thousands, it is assumed that only a few marker components of gene subsets determine the type of a tissue. Here we present a new method for finding such groups of genes by directly incorporating the response variables into the grouping process, yielding a supervised clustering algorithm for genes.

Results: An empirical study on eight publicly available microarray datasets shows that our algorithm identifies gene clusters with excellent predictive potential, often superior to classification with state-of-the-art methods based on single genes. Permutation tests and bootstrapping provide evidence that the output is reasonably stable and more than a noise artifact.

Conclusions: In contrast to other methods such as hierarchical clustering, our algorithm identifies several gene clusters whose expression levels clearly distinguish the different tissue types. The identification of such gene clusters is potentially useful for medical diagnostics and may at the same time reveal insights into functional genomics.
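The grouping process described in the Background can be illustrated with a small sketch. The abstract does not specify the scoring rule or the search strategy, so everything here is an assumption made for illustration: the score (a count of misordered sample pairs between two classes), the greedy forward search, and the names `separation_score`, `cluster_mean`, and `greedy_supervised_cluster` are all hypothetical and should not be read as the authors' algorithm.

```python
def separation_score(values, labels):
    """Count (class-0, class-1) sample pairs that are in the wrong order.

    A score of 0 means every class-1 sample has a strictly higher value
    than every class-0 sample. This score is an illustrative assumption.
    """
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return sum(1 for a in zeros for b in ones if a >= b)


def cluster_mean(expr, genes):
    """Average expression of the given genes, computed per sample."""
    n_samples = len(expr[0])
    return [sum(expr[g][j] for g in genes) / len(genes) for j in range(n_samples)]


def greedy_supervised_cluster(expr, labels, max_size=5):
    """Grow a gene cluster while its average expression separates the classes better.

    expr is a list of genes, each a list of expression values per sample.
    labels holds the response (0 or 1) for each sample.
    """
    cluster, best = [], None
    while len(cluster) < max_size:
        candidates = [g for g in range(len(expr)) if g not in cluster]
        if not candidates:
            break
        # Score every candidate by the separation it gives once added to the cluster.
        score, gene = min(
            (separation_score(cluster_mean(expr, cluster + [g]), labels), g)
            for g in candidates
        )
        if best is not None and score >= best:
            break  # no candidate improves the score, so stop growing
        cluster.append(gene)
        best = score
        if best == 0:
            break  # perfect separation reached
    return cluster, best


# Toy data: genes 0 and 1 each misorder one pair, but their average separates perfectly.
expr = [
    [0, 0, 3, 2, 4, 4],
    [0, 3, 0, 4, 2, 4],
    [5, 1, 4, 2, 3, 0],
]
labels = [0, 0, 0, 1, 1, 1]
cluster, score = greedy_supervised_cluster(expr, labels)
```

In this toy example the response variable drives which genes are grouped: the two genes are combined because their average discriminates the classes, not because their profiles are similar. The sketch handles only two classes and only clusters whose expression is higher in class 1.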

Figures

Figure 1
Lymphoma data. The x-axis shows the average expression of the cluster formed to separate response class 1 (FL) from response classes 0 and 2 (DLBCL and CLL); the y-axis shows the average expression of the cluster formed to discriminate class 2 from classes 0 and 1.
Figure 2
Histograms showing the empirical distribution of scores (left) and margins (right) for the leukemia dataset (AML/ALL distinction), based on 1,000 bootstrap replicates with permuted response variables. The dashed vertical lines mark the values of score and margin with the original response variables.
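The comparison in Figure 2, an observed value set against its distribution under permuted response variables, can be sketched as follows. This is a minimal illustration under stated assumptions: the score (the smallest count of misordered sample pairs over single genes), the p-value convention, and the names `misordered_pairs`, `best_score`, and `permutation_null` are hypothetical and are not taken from the paper.

```python
import random


def misordered_pairs(values, labels):
    """Count (class-0, class-1) sample pairs where the class-0 value is not lower."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return sum(1 for a in zeros for b in ones if a >= b)


def best_score(expr, labels):
    """Best (lowest) score over all genes; an assumed stand-in for the paper's score."""
    return min(misordered_pairs(row, labels) for row in expr)


def permutation_null(expr, labels, n_perm=1000, seed=0):
    """Recompute the score under randomly permuted response labels."""
    rng = random.Random(seed)
    observed = best_score(expr, labels)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # breaks any real link between expression and response
        null.append(best_score(expr, shuffled))
    # Fraction of permutations scoring at least as well as the original labels,
    # with the usual +1 correction so the estimate is never exactly zero.
    p_value = (1 + sum(s <= observed for s in null)) / (1 + n_perm)
    return observed, null, p_value


# Toy data: gene 0 separates the two classes perfectly under the true labels.
expr = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [3, 1, 4, 1, 5, 9, 2, 6],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
observed, null, p_value = permutation_null(expr, labels, n_perm=200, seed=1)
```

A histogram of `null` with a vertical line at `observed` gives a plot of the same kind as the left panel of Figure 2. An observed score far in the tail of the permuted distribution suggests the result is more than a noise artifact.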
