Proc Natl Acad Sci U S A. 2003 Nov 11;100(23):13167-72.
doi: 10.1073/pnas.1733249100. Epub 2003 Oct 27.

Robust singular value decomposition analysis of microarray data

Li Liu et al. Proc Natl Acad Sci U S A. 2003.

Abstract

In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.
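The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one known way to obtain a robust rank-one component: alternating L1 (least absolute deviation) regression, in which the row scores and the column loadings are re-estimated in turn by weighted medians. That choice of technique, the starting values, and the names `weighted_median`, `l1_slope`, and `robust_rank1` are all assumptions made for illustration, not the authors' implementation. Missing cells are stored as NaN and simply skipped, so no imputation step is needed in this sketch.

```python
import numpy as np


def weighted_median(values, weights):
    """Return a weighted median: the first sorted value at which the
    cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    return values[np.searchsorted(cum, 0.5 * cum[-1])]


def l1_slope(x, z):
    """Minimise sum_k |x_k - b * z_k| over the scalar b.

    Cells where x is NaN (missing) or z is zero carry no information
    about b and are skipped."""
    ok = ~np.isnan(x) & (z != 0)
    if not ok.any():
        return 0.0
    # The L1 minimiser is a weighted median of the ratios x_k / z_k
    # with weights |z_k|.
    return weighted_median(x[ok] / z[ok], np.abs(z[ok]))


def robust_rank1(X, n_iter=100, tol=1e-10):
    """Hypothetical helper: fit X ~ sigma * outer(u, v) by alternating
    L1 regressions. Returns sigma, unit vectors u and v, and residuals
    (NaN wherever X is missing)."""
    X = np.asarray(X, dtype=float)
    # Assumed starting point: column medians, ignoring missing cells.
    v = np.nan_to_num(np.nanmedian(X, axis=0))
    if not v.any():
        v = np.ones(X.shape[1])
    v = v / np.linalg.norm(v)
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        u = np.array([l1_slope(row, v) for row in X])        # rows given v
        v_new = np.array([l1_slope(col, u) for col in X.T])  # columns given u
        scale = np.linalg.norm(v_new)
        if scale == 0:
            break
        # Keep v at unit length; move the scale into u so u * v' is unchanged.
        v_new = v_new / scale
        u = u * scale
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    sigma = np.linalg.norm(u)
    if sigma > 0:
        u = u / sigma
    residuals = X - sigma * np.outer(u, v)
    return sigma, u, v, residuals
```

Under these assumptions, a call such as `sigma, u, v, residuals = robust_rank1(X)` gives one component; subtracting `sigma * np.outer(u, v)` and fitting again would yield a second, and large entries of `residuals` point to individual sample-by-gene cells that the low-rank pattern does not explain.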

Figures

Fig. 1.
(A) The unordered gene expression data matrix of set A. The rows correspond to the cell lines, and the columns correspond to the genes. In this image, we cannot see any clear patterns. (B) Outliers identified in the image of the residuals. The outliers are yellow (higher than expected) or blue (lower than expected). To be able to view the figure clearly, we selected only 60 columns (genes) to illustrate here.
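The caption does not state how residuals were classified as outliers. As a hedged sketch, one simple rule is to standardise each residual by the median and the median absolute deviation (MAD) and to flag cells beyond a cutoff, keeping the sign so that "higher than expected" and "lower than expected" can be coloured differently. The function name `flag_outliers` and the cutoff `k=3.5` are assumptions chosen for illustration.

```python
import numpy as np


def flag_outliers(residuals, k=3.5):
    """Hypothetical helper: return +1 (higher than expected), -1 (lower
    than expected) or 0 for each residual. NaN cells are never flagged."""
    r = np.asarray(residuals, dtype=float)
    med = np.nanmedian(r)
    # 1.4826 rescales the MAD to match the standard deviation
    # when the data are normally distributed.
    mad = 1.4826 * np.nanmedian(np.abs(r - med))
    if mad == 0:
        raise ValueError("MAD is zero; residuals cannot be standardised")
    z = (r - med) / mad
    flags = np.zeros(r.shape, dtype=int)
    flags[z > k] = 1
    flags[z < -k] = -1
    return flags
```

The resulting integer matrix can be passed to any image-plotting routine with a two-colour map to produce a display in the spirit of panel B.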
Fig. 2.
(A) The first rSVD component from set A. Looking at the names of the ordered samples (rows) shows clear separation of liver and kidney samples from the other seven tissues. (B) The second rSVD component from set A. This shows a separation of prostate and colon from other tissues. Here, liver(n) represents normal liver samples, liver(m) represents malignant liver samples, colon(n) represents normal colon samples, and colon(m) represents malignant colon samples.
Fig. 3.
(A) The plot of the eigenvalues for set A. This plot suggests that we keep the first two components. (B) The quantile-quantile plot of the residuals for set A. As shown, the residuals follow a heavy-tailed distribution.
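To make the quantile-quantile comparison in panel B concrete, the sketch below computes the points of a normal Q-Q plot for a set of residuals. It assumes plotting positions `(i - 0.5) / n` and a robust standardisation by median and MAD; both are illustrative choices, and `qq_points` is a hypothetical name. For a heavy-tailed sample, the extreme empirical quantiles lie well outside the corresponding normal quantiles.

```python
import numpy as np
from statistics import NormalDist


def qq_points(residuals):
    """Hypothetical helper: return (theoretical, empirical) quantiles
    for a normal Q-Q plot, ignoring NaN cells."""
    r = np.asarray(residuals, dtype=float).ravel()
    r = np.sort(r[~np.isnan(r)])
    n = r.size
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    # Robust centre and scale so that a few extreme cells do not
    # distort the standardisation.
    med = np.median(r)
    scale = 1.4826 * np.median(np.abs(r - med))
    empirical = (r - med) / scale
    return theoretical, empirical


# Usage on simulated heavy-tailed data (Student t with 2 degrees of freedom).
rng = np.random.default_rng(0)
theoretical, empirical = qq_points(rng.standard_t(df=2, size=2000))
```

Plotting `empirical` against `theoretical` gives the Q-Q display; points bending away from the diagonal at both ends indicate tails heavier than normal.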
