Bioinformatics. 2009 Jun 15;25(12):i145-53.
doi: 10.1093/bioinformatics/btp215.

Probabilistic retrieval and visualization of biologically relevant microarray experiments


José Caldas et al. Bioinformatics. 2009.

Abstract

Motivation: As ArrayExpress and other repositories of genome-wide experiments reach a mature size, it becomes increasingly meaningful to search for experiments related to a given study. We introduce methods that base this search on measurement data instead of the more customary annotation data. The goal is to retrieve experiments in which the same biological processes are activated, either because the experiments target the same biological question or because of as-yet-unknown relationships.

Results: We use a combination of existing and new probabilistic machine learning techniques to extract information about the biological processes differentially activated in each experiment, to retrieve earlier experiments where the same processes are activated and to visualize and interpret the retrieval results. Case studies on a subset of ArrayExpress show that, with a sufficient amount of data, our method indeed finds experiments relevant to particular biological questions. Results can be interpreted in terms of biological processes using the visualization techniques.

Availability: The code is available from http://www.cis.hut.fi/projects/mi/software/ismb09.
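The retrieval step summarized in the Results can be illustrated with a minimal sketch in Python with NumPy. It assumes that each stored experiment is summarized by topic proportions `theta`, that each topic is a probability distribution `phi` over gene sets, and that a query experiment is represented by counts of its differentially activated gene sets. Stored experiments are then ranked by the query likelihood log p(query | experiment). These names, this data representation and this scoring rule are illustrative assumptions, not necessarily the exact relevance measure used in the paper or in the released code.

```python
import numpy as np


def query_log_likelihood(query_counts, theta, phi):
    """Score stored experiments by log p(query | experiment) under a topic model.

    Assumption (illustrative): relevance is the query likelihood, not the paper's confirmed measure.
    query_counts: length-V counts of gene-set "words" active in the query experiment.
    theta: (D, K) topic proportions of the D stored experiments (rows sum to 1).
    phi: (K, V) distribution over gene sets for each of the K topics (rows sum to 1).
    Returns a length-D array of log-likelihoods.
    """
    query_counts = np.asarray(query_counts, dtype=float)
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # p(w | d) = sum_k theta[d, k] * phi[k, w] for every experiment d and gene set w
    word_probs = theta @ phi  # shape (D, V)
    # Count-weighted sum of log p(w | d); the floor keeps log() finite when p(w | d) = 0
    return np.log(np.maximum(word_probs, 1e-300)) @ query_counts


def rank_experiments(query_counts, theta, phi):
    """Return experiment indices ordered from most to least relevant to the query."""
    scores = query_log_likelihood(query_counts, theta, phi)
    return np.argsort(-scores, kind="stable")


# Hypothetical toy collection: 2 topics over 3 gene sets, 3 stored experiments
phi = [[0.8, 0.1, 0.1],
       [0.1, 0.1, 0.8]]
theta = [[0.9, 0.1],
         [0.1, 0.9],
         [0.5, 0.5]]
query = [3, 0, 0]  # the query activates only gene set 0
print(rank_experiments(query, theta, phi))  # → [0 2 1]
```

Experiment 0, which puts most of its weight on the topic that favors gene set 0, ranks first; experiment 2, an even mixture, comes second. Scoring whole experiments through shared topics, rather than by direct gene-set overlap, is what lets such a model relate experiments whose activated processes are similar without being identical.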


Figures

Fig. 1.
Visualization of the topic model. A subset of 13 topics, 211 gene sets and 105 experiments is shown. For details and a discussion see the text.
Fig. 2.
The experiment collection visualized as glyphs on a plane. Topic colors in all glyphs match the topic colors in Figure 1. (A) NeRV projection of the 105 experiments, each shown as a glyph. (B) The slices of each glyph show the distribution of topics in the experiment. The experiment labels are, from left to right: asthma, Barrett's esophagus and high-stage neuroblastoma. (C) Enlarged region of (A) in which glyphs have additionally been scaled according to their relevance to the query, the ‘malignant melanoma’ experiment, which is shown in the center. A detailed description of this experiment is included in Section 3.
Fig. 3.
(A) Average precision of the top 10 results for cancer queries. Queries are sorted by the average precision obtained with the topic model. Error bars represent the 99% confidence interval of the random permutation results. (B) Interpolated average precision at the 11 standard recall levels (given as percentages). The solid line corresponds to our method; the dashed line corresponds to the baseline.
Fig. 4.
NeRV projection of the 105 experiments, showing the outcome of querying the model with a melanoma experiment. Both glyph size and color saturation encode the relevance of each experiment to the query: the bigger the glyph and the more saturated its red, the higher the relevance. The query itself is represented by the biggest glyph.
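Fig. 3 reports average precision over the top 10 results, a random-permutation confidence interval and interpolated precision at 11 standard recall levels. The sketch below uses the textbook information-retrieval definitions of these measures. Two details are assumptions because the caption does not state them: AP@10 is normalized by the number of relevant items found in the top 10, and the permutation baseline shuffles the whole ranked list and keeps the central 99% of the resulting AP@10 values. All function names are hypothetical.

```python
import random


def average_precision_at_k(relevant, k=10):
    """Mean of precision@i over the ranks i <= k holding a relevant item (0.0 if none).

    Assumption: normalized by the number of relevant items retrieved in the top k.
    """
    hits = 0
    precisions = []
    for rank, is_rel in enumerate(relevant[:k], start=1):
        if is_rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def interpolated_precision_11pt(relevant, n_relevant_total):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one ranked list.

    The interpolated precision at recall level r is the highest precision reached
    at any rank whose recall is at least r (0.0 if recall r is never reached).
    """
    if n_relevant_total <= 0:
        raise ValueError("n_relevant_total must be positive")
    hits = 0
    points = []  # (recall, precision) after each rank
    for rank, is_rel in enumerate(relevant, start=1):
        hits += 1 if is_rel else 0
        points.append((hits / n_relevant_total, hits / rank))
    levels = [i / 10 for i in range(11)]
    curve = []
    for level in levels:
        # Small tolerance guards against floating-point rounding in the comparison
        reached = [p for rec, p in points if rec >= level - 1e-12]
        curve.append(max(reached) if reached else 0.0)
    return levels, curve


def permutation_interval(relevant, k=10, n_perm=1000, coverage=0.99, seed=0):
    """Empirical central interval of AP@k when the ranking is replaced by random shuffles.

    Assumption: the baseline permutes the whole ranked list of relevance labels.
    """
    rng = random.Random(seed)
    items = list(relevant)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(items)
        scores.append(average_precision_at_k(items, k))
    scores.sort()
    tail = (1 - coverage) / 2
    lower = scores[int(tail * n_perm)]
    upper = scores[min(n_perm - 1, int((1 - tail) * n_perm))]
    return lower, upper


ranking = [True, False, True, False, False]  # hypothetical relevance labels in rank order
print(average_precision_at_k(ranking))  # equals (1/1 + 2/3) / 2
print(interpolated_precision_11pt(ranking, n_relevant_total=2)[1])
print(permutation_interval(ranking))
```

Averaging the 11-point curves of many queries, level by level, gives a summary curve of the kind plotted in Fig. 3B. Comparing a query's AP@10 against its permutation interval indicates whether the ranking does better than chance for that query.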
