BMC Genomics. 2006 Dec 20;7:319.

doi: 10.1186/1471-2164-7-319.

Systematic interpretation of microarray data using experiment annotations

Kurt Fellenberg¹, Christian H Busold, Olaf Witt, Andrea Bauer, Boris Beckmann, Nicole C Hauser, Marcus Frohme, Stefan Winter, Jürgen Dippon, Jörg D Hoheisel

PMID: 17181856
PMCID: PMC1774576
DOI: 10.1186/1471-2164-7-319

Kurt Fellenberg et al. BMC Genomics. 2006.

Affiliation

¹ Department of Functional Genome Analysis, German Cancer Research Center, PO 101949, D-69009 Heidelberg, Germany. k.fellenberg@dkfz.de

Abstract

Background: To date, microarray data have mostly been assessed in the context of only one or a few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data when they are available in a statistically accessible format.

Results: We provide means to preprocess these additional data and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well suited for mapping such extracted traits. It visualizes associations both among and between the traits, the experiments they annotate, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single-channel) and two-channel data, from model organisms such as yeast and Drosophila through to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design.

Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.
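To make the mapping step concrete, here is a minimal sketch of correspondence analysis (CA) in its textbook form, computed from the singular value decomposition of standardized residuals. It is not the authors' implementation: the function name `correspondence_analysis` and the toy table are assumptions made for illustration, and the input is assumed to be a non-negative matrix with genes in rows and measurements (or traits) in columns.

```python
import numpy as np

def correspondence_analysis(table):
    """Textbook CA of a non-negative matrix (illustrative helper, not from the paper).

    Returns row principal coordinates, column principal coordinates,
    column standard coordinates, and the share of inertia per axis.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_principal = (U * sv) / np.sqrt(r)[:, None]
    col_principal = (Vt.T * sv) / np.sqrt(c)[:, None]
    col_standard = Vt.T / np.sqrt(c)[:, None]
    inertia = sv ** 2                  # principal inertias
    return row_principal, col_principal, col_standard, inertia / inertia.sum()

# Hypothetical toy intensities: 4 genes x 3 measurements
toy = [[10, 2, 1],
       [3, 8, 2],
       [1, 3, 9],
       [4, 4, 4]]
rows, cols, cols_std, explained = correspondence_analysis(toy)
```

Plotting the first two columns of `rows` and `cols` in one diagram gives the kind of joint map of genes and experiments (or traits) that the abstract describes; `explained` reports how much of the total inertia each axis accounts for.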

Figures

**Figure 1**
**Experiment annotations, annotation values, and measurements**. A set of experiment annotations (upper left box) describes an experimental context, here a microarray study of pancreas cancer samples. Each experiment annotation (e.g. tumor type) can take several annotation values (upper right box). Each of these annotation values (e.g. serous cystadenoma) annotates a set of microarray measurements. Each measurement is assigned to exactly one annotation value per experiment annotation. In this manner, each experiment annotation represents a possible grouping of the measurements, with each annotation value representing a distinct group. If this grouping corresponds to the transcription patterns observed, the experiment annotation is relevant for the experimental context under study.
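The relationship described in this caption can be sketched as a small data structure. The measurement identifiers and most annotation values below are hypothetical; only "tumor type", "serous cystadenoma", and "alcohol consumption" are names that appear in the text. The helper names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical toy data: measurement id -> {experiment annotation: annotation value}
annotations = {
    "hyb01": {"tumor type": "serous cystadenoma", "alcohol consumption": "none"},
    "hyb02": {"tumor type": "ductal adenocarcinoma", "alcohol consumption": "present"},
    "hyb03": {"tumor type": "serous cystadenoma", "alcohol consumption": "present"},
}

def group_by_annotation(annotations):
    """Return annotation -> annotation value -> set of measurements."""
    groups = defaultdict(lambda: defaultdict(set))
    for measurement, values in annotations.items():
        for annotation, value in values.items():
            groups[annotation][value].add(measurement)
    return groups

def is_grouping(groups, annotation, measurements):
    """True if every measurement falls into exactly one value group of the annotation."""
    sets = list(groups[annotation].values())
    covered = set().union(*sets)
    no_overlap = sum(len(s) for s in sets) == len(covered)
    return no_overlap and covered == set(measurements)

groups = group_by_annotation(annotations)
```

Here `groups["tumor type"]` splits the three measurements into two groups, which is the "possible grouping" that each experiment annotation represents.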

**Figure 2**
**Overview of the pancreas cancer data**. Human biopsies are characterized in terms of 26 (out of 93) experiment annotation values that have been selected for reproducibly corresponding to major variances in transcription. These traits have been subdivided into four different clusters (red, blue, pink, and green) by cutting the hierarchical clustering tree (panel c) at less than 20% of their total variance. Thickness of lines in the clustering tree corresponds to the number of hybridizations annotated with at least one of the traits of the corresponding cluster. The thickness of the horizontal yellow lines corresponds to the number of measurements possessing the listed trait or, in the case of a feature cluster, at least one of the comprised traits but none of the traits of the cluster to merge with next. A grey line stands for the empty set, indicating that, in terms of annotated measurements (e.g. of the green cluster), this cluster is completely included in the cluster to merge with. The thickness of the vertical line indicates the cardinality of the intersection (number of measurements having at least one trait from each of the two clusters). Whereas the line thickness is proportional to the number of measurements relative to the total number of measurements in the dataset, the percentages written next to the vertical lines denote the cardinality of an intersection relative only to the cardinality of the union of the two clusters to be merged. The annotation values are also shown by CA (panel a), with genes plotted as grey dots and traits as boxes color-coded as above. The plot reveals that the difference between the first two and the second two clusters corresponds to many differential genes and makes up 75% of the total variance among the traits (panel b).
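The set quantities in this caption can be illustrated with a short sketch. It assumes each trait cluster is represented by the set of measurements annotated with at least one of its traits; the function names and the toy sets are hypothetical.

```python
def overlap_percent(cluster_a, cluster_b):
    """Intersection size relative to union size, in percent (the value next to a vertical line)."""
    union = cluster_a | cluster_b
    if not union:
        return 0.0
    return 100.0 * len(cluster_a & cluster_b) / len(union)

def exclusive_measurements(cluster, merge_partner):
    """Measurements with a trait of `cluster` but none of `merge_partner` (horizontal line)."""
    return cluster - merge_partner

def relative_thickness(measurements, total):
    """Number of measurements relative to the whole dataset (line thickness)."""
    return len(measurements) / total

# Hypothetical measurement ids for two trait clusters
pink = {1, 2, 3, 4}
green = {3, 4}
```

With these toy sets, `exclusive_measurements(green, pink)` is empty, which corresponds to the grey line: the green cluster is completely included in the cluster it merges with.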

**Figure 3**
**Trait-cluster ranges**. The cluster centroids of the experiment annotation values of Fig. 2 have been projected by CA, the first two principal axes explaining almost the entire variance among these (upper right corner). Genes are depicted as grey dots, hybridization measurements (plotted without mass) as empty grey boxes, and cluster centroids as filled boxes, color-coded as in Fig. 2. Around each centroid, a circle encloses 80% of the measurements annotated with at least one of the traits belonging to the particular cluster. Lines to the cluster centroids in standard coordinates [23] indicate, for the genes, the direction of highest association with a certain cluster. Some of these genes are encircled in black, tagged with a gene name, and further referred to in the text.
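One plausible way to obtain such a circle is to take the smallest radius around the centroid that covers the requested fraction of points. The caption does not state how the circles were constructed, so the sketch below is an assumption, and `enclosing_radius` is a hypothetical helper name.

```python
import numpy as np

def enclosing_radius(points, centroid, fraction=0.8):
    """Smallest radius around `centroid` that contains at least `fraction` of `points`."""
    pts = np.asarray(points, dtype=float)
    center = np.asarray(centroid, dtype=float)
    distances = np.sort(np.linalg.norm(pts - center, axis=1))
    k = int(np.ceil(fraction * len(distances)))   # number of points to enclose
    return distances[k - 1]

# Hypothetical 2D map coordinates of ten measurements, centroid at the origin
toy_points = [(float(i), 0.0) for i in range(1, 11)]
radius = enclosing_radius(toy_points, (0.0, 0.0))
```

In a CA map, `points` would be the coordinates of the measurements annotated with at least one trait of a cluster, and `centroid` the projected cluster centroid.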

**Figure 4**
**Alcohol consumption**. The annotation values of the experiment annotation 'alcohol consumption' (solid boxes) have been projected by CA. Elements are drawn as in Fig. 3. Unlike in Fig. 3, measurements (empty boxes) are color-coded. Each measurement corresponds to only one annotation value, because the map is limited to one experiment annotation. We reversed the direction of the abscissa to maintain the orientation of tumors versus healthy tissues of the previous figures. We changed the color code, however, to acknowledge the fact that present alcohol consumption alone does not represent the entire pink trait cluster of Fig. 2, for example.

**Figure 5**
**Systematic interpretation**. For systematic interpretation of large datasets, the comprised variance can be divisively split up in a "top-down", "by-trait" assessment shown in panel a: after regarding the variance between a small number of trait clusters (Figs. 2 and 3), the variance within each cluster is analyzed separately in the same way (Figs. 6 and 7 in Additional file 1) until analyses consist of single traits. Thus, the top-down approach proceeds from the predominant variance to more subtle changes, answering the question of which traits are transcriptionally different and which are similar. In contrast, an agglomerative "bottom-up" approach will focus on a few traits initially (panel b). These may stem from a single annotation (aspect, parameter) of special interest (e.g. alcohol consumption, Fig. 4). In further steps, the most interesting annotation (not necessarily representing large variance) is combined with other aspects to visualize their interaction.

Cited by

Dual-color proteomic profiling of complex samples with a microarray of 810 cancer-related antibodies.
Schröder C, Jacob A, Tonack S, Radon TP, Sill M, Zucknick M, Rüffer S, Costello E, Neoptolemos JP, Crnogorac-Jurcevic T, Bauer A, Fellenberg K, Hoheisel JD. Mol Cell Proteomics. 2010 Jun;9(6):1271-80. doi: 10.1074/mcp.M900419-MCP200. Epub 2010 Feb 16. PMID: 20164060.
Identification of differentially expressed subnetworks based on multivariate ANOVA.
Hwang T, Park T. BMC Bioinformatics. 2009 Apr 30;10:128. doi: 10.1186/1471-2105-10-128. PMID: 19405941.
Exploring the transcription factor activity in high-throughput gene expression data using RLQ analysis.
Baty F, Rüdiger J, Miglino N, Kern L, Borger P, Brutsche M. BMC Bioinformatics. 2013 Jun 6;14:178. doi: 10.1186/1471-2105-14-178. PMID: 23742070.
Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.
Celton M, Malpertuy A, Lelandais G, de Brevern AG. BMC Genomics. 2010 Jan 7;11:15. doi: 10.1186/1471-2164-11-15. PMID: 20056002.

References

1. Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/projects/geo
2. ArrayExpress http://www.ebi.ac.uk/arrayexpress
3. Microarray Gene Expression Data Society http://www.mged.org
4. Bassett D, Jr, Eisen M, Boguski M. Gene expression informatics–it's all in your mine. Nat Genet. 1999;21:51–5. doi: 10.1038/4478.
5. Fellenberg K, Hauser N, Brors B, Hoheisel J, Vingron M. Microarray data warehouse allowing for inclusion of experiment annotations in statistical analysis. Bioinformatics. 2002;18:423–33. doi: 10.1093/bioinformatics/18.3.423.
