BMC Genomics. 2006 Dec 20:7:319.
doi: 10.1186/1471-2164-7-319.

Systematic interpretation of microarray data using experiment annotations

Kurt Fellenberg et al. BMC Genomics.

Abstract

Background: Up to now, microarray data have mostly been assessed in the context of only one or a few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data when they are available in a statistically accessible format.

Results: We provide means to preprocess these additional data and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis (CA) particularly well suited for mapping such extracted traits. It visualizes associations both among and between the traits, the experiments they annotate, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single-channel) and two-channel data, ranging from model organisms such as yeast and Drosophila to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design.

Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.

Figures

Figure 1
Experiment annotations, annotation values, and measurements. A set of experiment annotations (upper left box) describes an experimental context, here a microarray study of pancreas cancer samples. Each experiment annotation (e.g. tumor type) can take several annotation values (upper right box). Each of these annotation values (e.g. serous cystadenoma) annotates a set of microarray measurements. Each measurement is assigned exactly one annotation value per experiment annotation. In this manner, each experiment annotation represents a possible grouping of the measurements, with each annotation value representing a distinct group. If this grouping corresponds to the transcription patterns observed, the experiment annotation is relevant for the experimental context under study.
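The grouping described in this caption can be illustrated with a minimal Python sketch. The measurement names, the second tumor type, and the alcohol consumption values are hypothetical placeholders, not data from the study; the sketch only assumes that every measurement carries exactly one value per experiment annotation.

```python
from collections import defaultdict

# Hypothetical measurements: each one has exactly one annotation value
# per experiment annotation (names and values are illustrative only).
measurements = {
    "hyb_01": {"tumor type": "serous cystadenoma", "alcohol consumption": "none"},
    "hyb_02": {"tumor type": "serous cystadenoma", "alcohol consumption": "present"},
    "hyb_03": {"tumor type": "ductal adenocarcinoma", "alcohol consumption": "present"},
}


def group_by_annotation(measurements, annotation):
    """Group measurement names by their value for one experiment annotation."""
    groups = defaultdict(set)
    for name, values in measurements.items():
        groups[values[annotation]].add(name)
    return dict(groups)


# Each experiment annotation yields one possible grouping of the measurements.
groups = group_by_annotation(measurements, "tumor type")
for value, members in sorted(groups.items()):
    print(value, sorted(members))
```

Because each measurement has exactly one value per annotation, the groups of a single annotation never overlap and together cover all measurements.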
Figure 2
Overview of the pancreas cancer data. Human biopsies are characterized in terms of 26 (out of 93) experiment annotation values that have been selected for reproducibly corresponding to major variances in transcription. These traits have been subdivided into four different clusters (red, blue, pink, and green) by cutting the hierarchical clustering tree (panel c) at less than 20% of their total variance. Thickness of lines in the clustering tree corresponds to the number of hybridizations annotated with at least one of the traits of the corresponding cluster. The thickness of the horizontal yellow lines corresponds to the number of measurements possessing the listed trait or, in the case of a feature cluster, at least one of its traits but none of the traits of the cluster it merges with next. A grey line stands for the empty set, indicating that, in terms of annotated measurements (e.g. of the green cluster), this cluster is completely included in the cluster it merges with. The thickness of the vertical line indicates the cardinality of the intersection (number of measurements having at least one trait out of either cluster). Whereas the line thickness is proportional to the number of measurements relative to the total number of measurements in the dataset, the percentages written next to the vertical lines denote the cardinality of an intersection relative only to the cardinality of the union of the two clusters being merged. The annotation values are also shown by CA (panel a), with genes plotted as grey dots and traits as boxes color-coded as above. The plot reveals that the difference between the first two and the second two clusters corresponds to many differential genes and makes up to 75% of the total variance among the traits (panel b).
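The percentages next to the vertical lines, as described above, amount to the size of an intersection divided by the size of the union of two sets of measurements. A minimal sketch follows, assuming each cluster is represented as the set of measurements annotated with at least one of its traits; the cluster contents and the function name are hypothetical.

```python
def overlap_percentage(cluster_a, cluster_b):
    """Intersection size relative to union size of two measurement sets, in percent."""
    union = cluster_a | cluster_b
    if not union:
        return 0.0
    return 100.0 * len(cluster_a & cluster_b) / len(union)


# Hypothetical measurement sets for two trait clusters about to be merged.
pink = {"m1", "m2", "m3"}
green = {"m2", "m3"}

percentage = overlap_percentage(pink, green)

# Measurements with a trait of one cluster but none of the other cluster;
# an empty set corresponds to a grey line in the figure.
green_only = green - pink
pink_only = pink - green
```

In this toy case `green_only` is empty, which mirrors the situation where one cluster is completely included in the cluster it merges with.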
Figure 3
Trait-cluster ranges. The cluster centroids of the experiment annotation values of Fig. 2 have been projected by CA, the first two principal axes explaining almost the entire variance among these (upper right corner). Genes are depicted as grey dots, hybridization measurements (plotted without mass) as grey empty boxes, cluster centroids as filled boxes, color-coded as in Fig. 2. Around each centroid, a circle encloses 80% of the measurements annotated with at least one of the traits belonging to the particular cluster. Lines to the cluster centroids in standard coordinates [23] indicate the direction of highest association with a certain cluster for the genes. Some of these are encircled in black, tagged by a gene name and further referred to in the text.
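For readers unfamiliar with the coordinates mentioned in this caption, the following is a minimal sketch of textbook correspondence analysis using NumPy's singular value decomposition. It is not the authors' implementation: the function name, the toy table, and the choice to put rows (standing in for genes) in principal coordinates and columns (standing in for cluster centroids) in standard coordinates are assumptions made for illustration. All row and column sums of the table must be positive.

```python
import numpy as np


def correspondence_analysis(table):
    """Textbook CA of a non-negative table with positive row and column sums.

    Returns row principal coordinates, column standard coordinates, and the
    fraction of total inertia (variance) explained by each axis.
    """
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_principal = (U * sv) / np.sqrt(r)[:, None]
    col_standard = Vt.T / np.sqrt(c)[:, None]
    explained = sv ** 2 / np.sum(sv ** 2)
    return row_principal, col_standard, explained


# Hypothetical table: 4 rows (e.g. genes) by 3 columns (e.g. cluster centroids).
table = [
    [20.0, 5.0, 3.0],
    [4.0, 18.0, 6.0],
    [3.0, 7.0, 22.0],
    [10.0, 9.0, 11.0],
]
rows, cols, explained = correspondence_analysis(table)
print(explained[:2].sum())  # share of inertia captured by the first two axes
```

Plotting the first two columns of `rows` and `cols` gives a two-dimensional map; the sum of the first two entries of `explained` indicates how much of the total inertia that map represents.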
Figure 4
Alcohol consumption. The annotation values of the experiment annotation 'alcohol consumption' (solid boxes) have been projected by CA. Elements are drawn as in Fig. 3. Unlike in Fig. 3, measurements (empty boxes) are color-coded. Each measurement corresponds to only one annotation value, because the map is limited to one experiment annotation. We reversed the direction of the abscissa to maintain the orientation of tumors versus healthy tissues of the previous figures. We changed the color code, however, to acknowledge the fact that present alcohol consumption alone does not represent the entire pink trait cluster of Fig. 2, for example.
Figure 5
Systematic interpretation. For systematic interpretation of large datasets, the variance they contain can be split divisively in a "top-down", "by-trait" assessment shown in panel a: after regarding the variance between a small number of trait clusters (Figs. 2 and 3), the variance within each cluster is analyzed separately in the same way (Figs. 6 and 7 in Additional file 1) until analyses consist of single traits. Thus, the top-down approach proceeds from the predominant variance to more subtle changes, answering the question of which traits are transcriptionally different and which are similar. In contrast, an agglomerative "bottom-up" approach focuses on a few traits initially (panel b). These may stem from a single annotation (aspect, parameter) of special interest (e.g. alcohol consumption, Fig. 4). In further steps, the most interesting annotation (not necessarily representing large variance) is combined with other aspects to visualize their interaction.

References

    1. Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/projects/geo
    2. ArrayExpress http://www.ebi.ac.uk/arrayexpress
    3. Microarray Gene Expression Data Society http://www.mged.org
    4. Bassett D, Jr, Eisen M, Boguski M. Gene expression informatics–it's all in your mine. Nat Genet. 1999;21:51–5. doi: 10.1038/4478.
    5. Fellenberg K, Hauser N, Brors B, Hoheisel J, Vingron M. Microarray data warehouse allowing for inclusion of experiment annotations in statistical analysis. Bioinformatics. 2002;18:423–33. doi: 10.1093/bioinformatics/18.3.423.