Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2003;4(5):R34.
doi: 10.1186/gb-2003-4-5-r34. Epub 2003 Apr 25.

Clustering gene-expression data with repeated measurements

Affiliations

Clustering gene-expression data with repeated measurements

Ka Yee Yeung et al. Genome Biol. 2003.

Abstract

Clustering is a common methodology for the analysis of array data, and many research laboratories are generating array data with repeated measurements. We evaluated several clustering algorithms that incorporate repeated measurements, and show that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters. In particular, we show that the infinite mixture model-based approach with a built-in error model produces superior results.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Expression profiles of the classes in typical completely synthetic datasets with four repeated measurements. (a) Low noise level; (b) high noise level. For each class, the log ratios are plotted against the experiment numbers, and each class is shown in a different color. There are four sine (periodic) classes with different phase shifts and two linear (non-periodic) classes. Only four (out of six) classes are shown in (b) for clarity.
Figure 2
Figure 2
Distribution of the fraction of gene pairs against correlation coefficient. Correlation coefficients are computed from averaging over repeated measurements and using SD over repeated measurements as weights on the yeast galactose data. There are more gene pairs with correlation coefficients around 0 and fewer gene pairs with correlation coefficients near 1 when SD-weighted correlation is used.
Figure 3
Figure 3
Distribution of the fraction of gene pairs against Euclidean distance. Euclidean distances are computed from averaging over repeated measurements and using SD over repeated measurements as weights on the yeast galactose data. There are more gene pairs with Euclidean distances near 0 when SD-weighted distance is used.
Figure 4
Figure 4
Distribution of error plotted against intensity. The SDs over the log ratios from repeated measurements are plotted against the average intensities over repeated measurements in a typical experiment on the yeast galactose data.

References

    1. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868. - PMC - PubMed
    1. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, et al. Functional discovery via a compendium of expression profiles. Cell. 2000;102:109–126. - PubMed
    1. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–537. - PubMed
    1. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–536. - PubMed
    1. Hartigan JA. Clustering Algorithms. New York: John Wiley and Sons; 1975.

Publication types

MeSH terms

LinkOut - more resources