Comparative Study
Proc Natl Acad Sci U S A. 2007 Apr 3;104(14):5959-64.
doi: 10.1073/pnas.0701068104. Epub 2007 Mar 27.

Metagene projection for cross-platform, cross-species characterization of global transcriptional states

Pablo Tamayo et al. Proc Natl Acad Sci U S A. 2007.

Abstract

The high dimensionality of global transcription profiles, in which the expression levels of 20,000 genes are measured in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here, we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia and lung cancer data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means to detect and remove sample contamination.
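
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea. It assumes that metagenes are obtained by a non-negative matrix factorization of a "model" data set (here with simple multiplicative updates) and that "test" samples are mapped into the same metagene space with the Moore-Penrose pseudoinverse of the metagene matrix. The function names `nmf` and `project`, the parameter values, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nmf(A, k, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (genes x samples) as W @ H.

    Assumption: plain multiplicative updates for the Frobenius objective.
    W is genes x k (metagene definitions), H is k x samples (metagene levels).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(W, A_new):
    """Map new samples (rows in the same gene order as W) into metagene space.

    Assumption: projection via the pseudoinverse of W; the result may
    contain small negative values because no constraint is enforced here.
    """
    return np.linalg.pinv(W) @ A_new

# Synthetic stand-in for expression data: 200 genes driven by 3 hidden programs.
rng = np.random.default_rng(1)
W_true = rng.random((200, 3))
A_model = W_true @ rng.random((3, 30))  # 30 hypothetical "model" samples
A_test = W_true @ rng.random((3, 12))   # 12 hypothetical "test" samples

W, H_model = nmf(A_model, k=3)          # learn metagenes on the model set
H_test = project(W, A_test)             # describe test samples with the same metagenes
print(H_model.shape, H_test.shape)
```

In this sketch each sample is described by 3 numbers instead of 200, which is the kind of dimension reduction the abstract refers to. Real data from different platforms or species would also need gene matching and normalization before projection, which is not shown here.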

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic of the metagene projection methodology.
Fig. 2.
Heat maps of metagene projection of leukemia samples. These heat maps of the HM and HT matrices show the metagene expression levels for each sample. Each factor clearly corresponds to the same leukemia subtype in both the model (Left) and test (Right) sets.
Fig. 3.
Hierarchical clustering of the leukemia model and test samples. (A) Clustering of the merged test and model data sets after metagene projection, i.e., columns of the merged HM and HT matrices. (B) Clustering of merged model and test sets normalized but without projection. For clarity, some dendrogram vertical lines have been truncated in A; for full dendrograms see SI Fig. 7.
Fig. 4.
Leukemia subclasses metagene projection. Heat maps of the model (A) and test (B) sets after metagene projection show consistent representation of subtype structure across technology platform and laboratory group. SI Text contains a detailed description of the different leukemia subtypes shown here.
Fig. 5.
Metagene projection of the lung cancer data set. Heat maps showing projection of model and test data sets into four-metagene space F1-F4 (A) and three-metagene space F1-F3 after numerical removal of normal component F4 and reconstruction of model (B). AD, adenocarcinoma; SQ, squamous; C, cell lines; NL, normal lung.
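
The Fig. 5 caption mentions numerically removing a component and reconstructing the model. One way to picture such a step, assuming a factorization of the data into a metagene matrix `W` (genes x metagenes) and a level matrix `H` (metagenes x samples) is already available, is to rebuild the data from every metagene except the unwanted one. The helper `remove_component` and the toy matrices below are hypothetical and are not the authors' procedure.

```python
import numpy as np

def remove_component(W, H, drop):
    """Reconstruct genes x samples data from all metagenes except index `drop`."""
    keep = [i for i in range(W.shape[1]) if i != drop]
    return W[:, keep] @ H[keep, :]

# Toy example: 3 genes, 2 metagenes, 2 samples (made-up numbers).
W_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
H_toy = np.array([[2.0, 0.0],
                  [1.0, 3.0]])

# Drop the second metagene (index 1), e.g. a hypothetical "normal tissue" component.
A_clean = remove_component(W_toy, H_toy, drop=1)
print(A_clean)
```

Only the contribution of the first metagene remains in `A_clean`; the second sample, which was made up entirely of the dropped component, reconstructs to zeros.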
