Review

Neuroinformatics. 2007 Fall;5(3):161-75.

doi: 10.1007/s12021-007-0012-5.

Sharing and reusing gene expression profiling data in neuroscience

Xiang Wan¹, Paul Pavlidis

Affiliation

¹ Department of Psychiatry, UBC Bioinformatics Centre, University of British Columbia, 177 Michael Smith Laboratories 2185 East Mall, Vancouver, BC V6T1Z4, Canada.

PMID: 17917127
PMCID: PMC2980754
DOI: 10.1007/s12021-007-0012-5

Xiang Wan et al. Neuroinformatics. 2007 Fall.

Abstract

As public availability of gene expression profiling data increases, it is natural to ask how these data can be used by neuroscientists. Here we review the public availability of high-throughput expression data in neuroscience and how it has been reused, and tools that have been developed to facilitate reuse. There is increasing interest in making expression data reuse a routine part of the neuroscience tool-kit, but there are a number of challenges. Data must become more readily available in public databases; efforts to encourage investigators to make data available are important, as is education on the benefits of public data release. Once released, data must be better-annotated. Techniques and tools for data reuse are also in need of improvement. Integration of expression profiling data with neuroscience-specific resources such as anatomical atlases will further increase the value of expression data.

Figures

**Figure 1**
Conceptualization of data selection for re-use. Criteria that are too stringent or too lax make comparisons difficult.

**Figure 2**
Trends in publications on expression profiling. We searched PubMed for entries using the search criteria “Gene Expression Profiling $M $Y[publication date]”, where $M was either “cancer” or “brain” and $Y was a year (1998–2006); or the total number of PubMed entries by year. A. Raw numbers showing that profiling papers accessible with the keyword “cancer” were consistently much more numerous than for “brain”. B. Data normalized by the number of publications in 2006, showing the similarity of the growth curves. Data for all PubMed entries are shown for comparison: submissions about profiling outpace the growth of PubMed by a wide margin.
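The search template and the panel B normalization described in this caption can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the counts are placeholders rather than the plotted values, and no request is sent to PubMed.

```python
from typing import Dict


def build_query(topic: str, year: int) -> str:
    """Fill the caption's template: $M is the topic keyword, $Y the year."""
    return f"Gene Expression Profiling {topic} {year}[publication date]"


def normalize_to_reference(counts: Dict[int, int], ref_year: int = 2006) -> Dict[int, float]:
    """Divide each yearly count by the count in the reference year (as in panel B)."""
    ref = counts[ref_year]
    if ref == 0:
        raise ValueError("reference-year count must be non-zero")
    return {year: n / ref for year, n in sorted(counts.items())}


# Placeholder counts for illustration only; NOT the values shown in the figure.
example_counts = {2004: 50, 2005: 80, 2006: 100}

queries = [build_query(topic, 2006) for topic in ("cancer", "brain")]
normalized = normalize_to_reference(example_counts)

print(queries)
print(normalized)
```

Dividing every series by its own 2006 value puts "cancer", "brain", and all of PubMed on a common scale, so the shapes of the growth curves can be compared even though the raw counts differ widely.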

**Figure 3**
Trends in submissions to GEO. We used the GEO web interface to identify experiment series submissions in each year, using the same keywords that were used for the PubMed analysis in Figure 2 (e.g., “GSE[Entry Type] AND 2002[Publication Date] AND cancer”). Values are expressed as the fraction of all GEO series submissions. The growth curve of GEO overall is shown in arbitrary units for comparison. Submissions with the keyword “brain” follow a similar trend to “cancer” but with consistently smaller numbers of submissions.

**Figure 4**
Re-analysis of mouse brain data from Sandberg et al. (2000). An example of how re-analysis of existing data can uncover previously unrecognized patterns. Pavlidis et al. (2001) identified genes showing brain-regionalization of expression using analysis of variance, in this case in the midbrain compared to five other regions, in two mouse strains. A comparison to the existing analysis showed that only a subset of these genes had been identified (marked by bullets). The heatmap shows relative expression levels, where white represents higher levels. Reproduced from Pavlidis et al. (2001) with permission.

**Figure 5**
Recurring expression patterns yield higher-quality functional inferences. Each curve is a cumulative distribution of gene similarity as reflected in shared GO terms. Curves further to the right represent coexpression patterns that recur in more data sets and exhibit higher functional similarity. Redrawn using data from Lee et al. (2004).
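One way to read the curves in this caption is as empirical cumulative distributions of a per-pair similarity score. The sketch below assumes the score is simply the number of GO terms two genes share. The caption does not specify the exact measure, so that choice, the function names, and the term labels are all illustrative assumptions.

```python
from typing import List, Set, Tuple


def shared_go_terms(go_a: Set[str], go_b: Set[str]) -> int:
    """Assumed similarity score: number of GO terms annotated to both genes."""
    return len(go_a & go_b)


def ecdf(values: List[int]) -> List[Tuple[int, float]]:
    """Return (x, fraction of values <= x) for each distinct x, in ascending order."""
    n = len(values)
    ordered = sorted(values)
    points = []
    for i, x in enumerate(ordered, start=1):
        # Emit a point only at the last occurrence of each distinct value.
        if i == n or ordered[i] != x:
            points.append((x, i / n))
    return points


# Placeholder annotations (made-up term labels, not real GO identifiers).
pair_scores = [
    shared_go_terms({"term1", "term2"}, {"term2", "term3"}),
    shared_go_terms({"term1"}, {"term4"}),
    shared_go_terms({"term1", "term2", "term5"}, {"term1", "term2", "term5"}),
    shared_go_terms({"term2"}, {"term2", "term6"}),
]

curve = ecdf(pair_scores)
print(curve)
```

Computing one such curve per support level (pairs seen in one data set, two data sets, and so on) would give a family of curves that can be compared in the way the caption describes.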

**Figure 6**
Screenshot of coexpression results from Gemma. Users enter a query gene of interest (PARK7 in this case) and are provided with a list of genes that are reproducibly coexpressed in multiple studies. The “support” is the number of data sets in which the pattern is found. “GO overlap” reflects the existing state of knowledge about the relatedness of the query gene to the result gene. The “Exps” column illustrates which data sets, of those searched, provide the support. A black line is shown for each data set where the pattern is found, giving a visual cue to which data sets are contributing most to the overall result. In this example, 57 data sets were searched, yielding 1788 patterns involving PARK7. Of these, 19 positive and 2 negative correlation patterns were reproduced in at least three of the data sets.
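The “support” count described in this caption can be sketched as a tally over data sets. This is an illustration under stated assumptions, not Gemma's implementation: the function names are hypothetical, the gene names are placeholders, and each data set is assumed to have already been reduced to a set of (result gene, correlation sign) links for the query gene.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A link is (result gene, sign of its correlation with the query gene).
Link = Tuple[str, str]


def count_support(links_per_dataset: List[Set[Link]]) -> Counter:
    """Count, for each link, the number of data sets in which it appears."""
    support: Counter = Counter()
    for links in links_per_dataset:
        # Each data set is a set, so it adds at most one to any link's count.
        support.update(links)
    return support


def reproducible_links(links_per_dataset: List[Set[Link]], min_support: int = 3) -> Dict[Link, int]:
    """Keep only links found in at least `min_support` data sets."""
    support = count_support(links_per_dataset)
    return {link: n for link, n in support.items() if n >= min_support}


# Placeholder links for a query gene; gene names are made up for illustration.
datasets = [
    {("GENE_A", "+"), ("GENE_B", "-")},
    {("GENE_A", "+"), ("GENE_C", "+")},
    {("GENE_A", "+"), ("GENE_B", "-")},
    {("GENE_B", "-"), ("GENE_C", "+")},
]

result = reproducible_links(datasets, min_support=3)
print(result)
```

The threshold of three data sets mirrors the example in the caption. Raising it trades the number of reported links for stronger evidence of reproducibility.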

Comment in

Software development vis à vis collaboration in interdisciplinary research.
Herskovits EH. Neuroinformatics. 2007 Fall;5(3):176-7. doi: 10.1007/s12021-007-0008-1. PMID: 17917128. No abstract available.

Cited by

Data reuse and the open data citation advantage.
Piwowar HA, Vision TJ. PeerJ. 2013 Oct 1;1:e175. doi: 10.7717/peerj.175. eCollection 2013. PMID: 24109559. Free PMC article.
Data publishing and scientific journals: the future of the scientific paper in a world of shared data.
De Schutter E. Neuroinformatics. 2010 Oct;8(3):151-3. doi: 10.1007/s12021-010-9084-8. PMID: 20835853. Review. No abstract available.
Quantitative investigations of axonal and dendritic arbors: development, structure, function, and pathology.
Parekh R, Ascoli GA. Neuroscientist. 2015 Jun;21(3):241-54. doi: 10.1177/1073858414540216. Epub 2014 Jun 27. PMID: 24972604. Free PMC article. Review.
The reuse of public datasets in the life sciences: potential risks and rewards.
Sielemann K, Hafner A, Pucker B. PeerJ. 2020 Sep 22;8:e9954. doi: 10.7717/peerj.9954. eCollection 2020. PMID: 33024631. Free PMC article.
Recommendations for the collection and use of multiplexed functional data for clinical variant interpretation.
Gelman H, Dines JN, Berg J, Berger AH, Brnich S, Hisama FM, James RG, Rubin AF, Shendure J, Shirts B, Fowler DM, Starita LM; Brotman Baty Institute Mutational Scanning Working Group. Genome Med. 2019 Dec 20;11(1):85. doi: 10.1186/s13073-019-0698-7. PMID: 31862013. Free PMC article.

References

1. Aarnio V, Paananen J, Wong G. Analysis of microarray studies performed in the neurosciences. J Mol Neurosci. 2005;27:261–268.
2. Assou S, Le Carrour T, Tondeur S, Strom S, Gabelle A, Marty S, Nadal L, Pantesco V, Reme T, Hugnot JP, et al. A meta-analysis of human embryonic stem cells transcriptome integrated into a web-based expression atlas. Stem Cells. 2007;25:961–973.
3. Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 2005;33:5914–5923.
4. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. 2007;35:D760–D765.
5. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007;35:D301–D303.
