Review
Neuroinformatics. 2007 Fall;5(3):161-75.
doi: 10.1007/s12021-007-0012-5.

Sharing and reusing gene expression profiling data in neuroscience

Xiang Wan et al. Neuroinformatics. 2007 Fall.

Abstract

As public availability of gene expression profiling data increases, it is natural to ask how these data can be used by neuroscientists. Here we review the public availability of high-throughput expression data in neuroscience, how these data have been reused, and the tools that have been developed to facilitate reuse. There is increasing interest in making expression data reuse a routine part of the neuroscience toolkit, but there are a number of challenges. Data must become more readily available in public databases; efforts to encourage investigators to make data available are important, as is education on the benefits of public data release. Once released, data must be better annotated. Techniques and tools for data reuse are also in need of improvement. Integration of expression profiling data with neuroscience-specific resources such as anatomical atlases will further increase the value of expression data.

Figures

Figure 1
Conceptualization of data selection for re-use. Criteria that are too stringent or too lax make comparisons difficult.
Figure 2
Trends in publications on expression profiling. We searched PubMed using the query “Gene Expression Profiling $M $Y[publication date]”, where $M was either “cancer” or “brain” and $Y was a year (1998–2006), and also retrieved the total number of PubMed entries for each year. A. Raw numbers, showing that profiling papers retrieved with the keyword “cancer” were consistently much more numerous than those retrieved with “brain”. B. Data normalized to the number of publications in 2006, showing the similarity of the growth curves. Data for all PubMed entries are shown for comparison: publications on profiling outpace the growth of PubMed by a wide margin.
Figure 3
Trends in submissions to GEO. We used the GEO web interface to identify experiment series submissions in each year, using the same keywords that were used for the PubMed analysis in Figure 2 (e.g., “GSE[Entry Type] AND 2002[Publication Date] AND cancer”). Values are expressed as the fraction of all GEO series submissions. The growth curve of GEO overall is shown in arbitrary units for comparison. Submissions with the keyword “brain” follow a similar trend to those with “cancer”, but are consistently fewer in number.
Figure 4
Re-analysis of mouse brain data from Sandberg et al. (2000). An example of how re-analysis of existing data can uncover previously unrecognized patterns. Pavlidis et al. (2001) used analysis of variance to identify genes whose expression is regionalized in the brain, in this case in the midbrain compared to five other regions, in two mouse strains. Comparison with the original analysis showed that it had identified only a subset of these genes (marked by bullets). The heatmap shows relative expression levels, where white represents higher levels. Reproduced from Pavlidis et al. (2001) with permission.
Figure 5
Recurring expression patterns yield higher-quality functional inferences. Each curve is a cumulative distribution of gene similarity as reflected in shared GO terms. Curves further to the right represent coexpression patterns that recur in more data sets and exhibit higher functional similarity. Redrawn using data from Lee et al. (2004).
Figure 6
Screenshot of coexpression results from Gemma. Users enter a query gene of interest (PARK7 in this case) and are provided with a list of genes that are reproducibly coexpressed in multiple studies. The “support” is the number of data sets in which the pattern is found. “GO overlap” reflects the existing state of knowledge about the relatedness of the query gene to the result gene. The “Exps” column illustrates which data sets, of those searched, provide the support. A black line is shown for each data set where the pattern is found, giving a visual cue to which data sets are contributing most to the overall result. In this example, 57 data sets were searched, yielding 1788 patterns involving PARK7. Of these, 19 positive and 2 negative correlation patterns were reproduced in at least 3 of the data sets.
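
The notion of “support” in the Figure 6 caption can be illustrated with a minimal sketch. This is not Gemma's implementation; the function name `count_support`, the data set labels, and the gene names are hypothetical, and the toy input is invented purely for illustration. It assumes each data set has already been reduced to a list of (gene, sign) pairs describing positive or negative coexpression with the query gene.

```python
from collections import Counter

def count_support(links_by_dataset, min_support=3):
    """Count, for each (gene, sign) link, how many data sets contain it.

    Returns only the links found in at least `min_support` data sets.
    Illustrative only: names and input layout are assumptions.
    """
    support = Counter()
    for links in links_by_dataset.values():
        # Convert to a set so one data set counts at most once per link.
        support.update(set(links))
    return {link: n for link, n in support.items() if n >= min_support}

# Hypothetical toy input: coexpression links with a query gene, per data set.
links_by_dataset = {
    "ds1": [("GENE_A", "+"), ("GENE_B", "-")],
    "ds2": [("GENE_A", "+"), ("GENE_C", "+")],
    "ds3": [("GENE_A", "+"), ("GENE_B", "-")],
    "ds4": [("GENE_B", "-")],
}

reproduced = count_support(links_by_dataset, min_support=3)
for (gene, sign), n in sorted(reproduced.items()):
    print(f"{gene} ({sign}): support = {n}")
```

The threshold of 3 mirrors the “at least 3 of the data sets” cutoff mentioned in the caption; how the per-data-set links are derived is left out of the sketch.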
