Nat Methods. 2015 Mar;12(3):211-4, 3 p following 214.
doi: 10.1038/nmeth.3249. Epub 2015 Jan 12.

Targeted exploration and analysis of large cross-platform human transcriptomic compendia

Qian Zhu et al. Nat Methods. 2015 Mar.

Abstract

We present SEEK (search-based exploration of expression compendia; http://seek.princeton.edu/), a query-based search engine for very large transcriptomic data collections, including thousands of human data sets from many different microarray and high-throughput sequencing platforms. SEEK uses a query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify genes, pathways and processes co-regulated with the query. SEEK provides multigene query searching with iterative metadata-based search refinement and extensive visualization-based analysis options.
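
The idea of weighting data sets by their relevance to the query can be illustrated with a small sketch. This is not the published SEEK algorithm: it is a simplified stand-in that assumes a data set is relevant when each held-out query gene correlates with the remaining query genes, and it uses Pearson correlation as the co-expression measure. The function names, the toy compendium, and the candidate genes `GENE_X` and `GENE_Y` are all hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def dataset_weight(dataset, query):
    """Leave-one-out relevance (assumed scoring, not SEEK's own):
    hold out each query gene in turn and average its correlation
    to the remaining query genes; negative averages are clipped to 0."""
    present = [g for g in query if g in dataset]
    if len(present) < 2:
        return 0.0
    scores = []
    for held_out in present:
        rest = [g for g in present if g != held_out]
        scores.append(mean(pearson(dataset[held_out], dataset[g]) for g in rest))
    return max(0.0, mean(scores))

def search(compendium, query):
    """Weight every data set, then score each non-query gene by its
    weighted mean correlation to the query across data sets."""
    weights = {name: dataset_weight(ds, query) for name, ds in compendium.items()}
    candidates = {g for ds in compendium.values() for g in ds} - set(query)
    gene_scores = {}
    for gene in candidates:
        num, den = 0.0, 0.0
        for name, ds in compendium.items():
            w = weights[name]
            present = [q for q in query if q in ds]
            if w == 0 or gene not in ds or not present:
                continue
            num += w * mean(pearson(ds[gene], ds[q]) for q in present)
            den += w
        if den > 0:
            gene_scores[gene] = num / den
    return weights, gene_scores

# Toy compendium with invented expression values (4 conditions per data set).
compendium = {
    "ds_a": {"GLI1": [1, 2, 3, 4], "GLI2": [2, 3, 4, 5], "PTCH1": [2, 4, 6, 8],
             "GENE_X": [1, 2, 3, 5], "GENE_Y": [4, 3, 2, 1]},
    "ds_b": {"GLI1": [1, 2, 3, 4], "GLI2": [4, 3, 2, 1], "PTCH1": [1, 3, 2, 4],
             "GENE_X": [4, 1, 3, 2], "GENE_Y": [1, 2, 3, 4]},
}
query = ["GLI1", "GLI2", "PTCH1"]
weights, scores = search(compendium, query)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the query genes move together only in `ds_a`, so `ds_b` receives zero weight and does not influence the gene ranking.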

Figures

Figure 1. The SEEK system overview and systematic functional evaluation
(a) System overview. Users begin by defining a query gene set of interest; SEEK accommodates gene sets as small as 1–2 genes and as large as 100 genes (step 1). The SEEK search engine searches the entire compendium and returns the genes co-expressed with the query together with the most relevant data sets (steps 2, 3). The web user interface visualizes gene co-expression across data sets (step 4) and lets users iteratively refine their search (Fig. 2) and further analyze the results through the condition-specific view (step 5). The condition-specific view allows users to check possible associations with the measured outcomes in order to interpret the co-expressed genes (Supplementary Note 3).
(b) Gene retrieval evaluations across 995 diverse GO biological process terms for SEEK, MEM, Gene recommender, and meta-data set correlation algorithms (Supplementary Note 1). Queries of diverse sizes (2–20 genes) were selected randomly among each term’s genes to evaluate the precision of retrieving the remaining genes in each term. Individual term performances (Supplementary Data 2) and additional detailed comparative evaluations (Supplementary Figs. 1, 2) are provided.
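
The hold-out evaluation described in panel (b) can be sketched as follows. This is only an illustration of the general idea (sample a query from a term's genes, then measure how many of the remaining genes appear near the top of a ranked result list). The exact precision measure used in the paper is not stated here, so precision at a cutoff `k` is an assumption, and the function names and gene identifiers are hypothetical.

```python
import random

def split_term(term_genes, query_size, rng):
    """Randomly pick `query_size` genes from a term as the query;
    the remaining genes of the term become the retrieval targets."""
    genes = sorted(term_genes)
    query = rng.sample(genes, query_size)
    held_out = set(genes) - set(query)
    return query, held_out

def precision_at_k(ranked_genes, held_out, k):
    """Fraction of the top-k ranked genes that are held-out term members."""
    top = ranked_genes[:k]
    if not top:
        return 0.0
    return sum(g in held_out for g in top) / len(top)

# Hypothetical term with six member genes and a query of size 2.
rng = random.Random(0)
term = {"g1", "g2", "g3", "g4", "g5", "g6"}
query, held_out = split_term(term, query_size=2, rng=rng)

# A made-up ranked result list; in practice this would come from a search method.
example_ranking = ["g3", "x1", "g4", "x2"]
example_precision = precision_at_k(example_ranking, {"g3", "g4", "g5"}, k=4)
```

Repeating the split for several query sizes and averaging the precision per term would give one score per term and method, which is the kind of comparison the caption describes.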
Figure 2. Search results for the Hedgehog (Hh) query (GLI1, GLI2, PTCH1) and search refinement
(a) Data sets prioritized and genes retrieved for the query, shown in the expression view of the main result page. The result is retrieved for the Hh query after a global compendium search. The top-ranked data sets (1) and the co-expressed gene list (2) are indicated. Conditions in each data set are hierarchically clustered in real time according to the expression values of the top genes retrieved from the search (3). The expression heat map of the genes in one of the data sets is shown in (4).
(b) Illustration of the search refinement function. Refine Search lets users narrow the scope of their search using a broad set of selection criteria, including tissue, cell type, or disease categories, platforms, or the rank of data sets from the initial search (Supplementary Note 3).
(c) The final results after limiting the search scope to brain data sets. In this case, brain-specific co-expression is evident, with higher co-expression scores to the query and better grouping of conditions than in the initial search. SEEK also has alternative view modes, such as the co-expression view and the condition-specific view (Supplementary Note 3).
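
The refinement step in panel (b) amounts to filtering the data sets by metadata before searching again. A minimal sketch, assuming each data set carries a tissue label, a platform label, and its rank from the initial search; the `DataSetMeta` record, its field names, and the example values are hypothetical and do not reflect SEEK's actual metadata schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSetMeta:
    name: str
    tissue: str
    platform: str
    rank: int  # rank of the data set in the initial search (1 = most relevant)

def refine(datasets, tissue=None, platform=None, max_rank=None):
    """Keep only data sets matching every criterion that was supplied."""
    kept = []
    for d in datasets:
        if tissue is not None and d.tissue != tissue:
            continue
        if platform is not None and d.platform != platform:
            continue
        if max_rank is not None and d.rank > max_rank:
            continue
        kept.append(d)
    return kept

# Invented metadata for four data sets.
initial = [
    DataSetMeta("ds1", "lung", "microarray", 1),
    DataSetMeta("ds2", "brain", "microarray", 2),
    DataSetMeta("ds3", "brain", "rna-seq", 3),
    DataSetMeta("ds4", "liver", "rna-seq", 4),
]
brain_only = [d.name for d in refine(initial, tissue="brain")]
top_brain = [d.name for d in refine(initial, tissue="brain", max_rank=2)]
```

The refined subset would then be passed back to the search, which corresponds to the brain-restricted result shown in panel (c).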
