Exploring the functional landscape of gene expression: directed search of large microarray compendia
- PMID: 17724061
- DOI: 10.1093/bioinformatics/btm403
Abstract
Motivation: The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium.
Results: We have collected Saccharomyces cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and we designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context; based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. Our method exhibits an average increase in accuracy of 273% compared to previous mega-clustering approaches when recapitulating known biology. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. Our methodology provides the ability for biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community.
Availability: Our query-driven search engine, called SPELL, is available at http://function.princeton.edu/SPELL.
Supplementary information: Several additional data files, figures and discussions are available at http://function.princeton.edu/SPELL/supplement.
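The query-driven search described in the Results can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration rather than SPELL's published procedure: a dataset's weight is taken to be the mean pairwise Pearson correlation among the query genes (floored at zero), and a candidate gene's score is the weight-averaged mean correlation to the query genes. The function names (`pearson`, `dataset_weight`, `search`), the gene labels, and the toy expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def dataset_weight(dataset, query):
    """Assumed relevance weight: mean pairwise correlation among the
    query genes in this dataset, floored at zero."""
    pairs = [pearson(dataset[a], dataset[b]) for a, b in combinations(query, 2)]
    return max(sum(pairs) / len(pairs), 0.0)


def search(datasets, query):
    """Rank non-query genes by weighted mean correlation to the query set.

    datasets: list of dicts mapping gene name -> list of expression values;
              every dataset is assumed to contain the same genes.
    query:    list of at least two gene names.
    """
    weights = [dataset_weight(d, query) for d in datasets]
    total = sum(weights)
    scores = {}
    for gene in datasets[0]:
        if gene in query:
            continue
        weighted = 0.0
        for d, w in zip(datasets, weights):
            if w == 0.0:
                continue  # datasets judged irrelevant contribute nothing
            mean_corr = sum(pearson(d[gene], d[q]) for q in query) / len(query)
            weighted += w * mean_corr
        scores[gene] = weighted / total if total > 0 else 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, weights


# Toy compendium (made-up values): the query genes q1 and q2 move together
# in the first dataset and in opposite directions in the second.
relevant = {
    "q1": [1, 2, 3, 4, 5, 6],
    "q2": [2, 4, 6, 8, 10, 13],
    "a": [1, 2, 3, 4, 5, 7],
    "b": [1, -1, 1, -1, 1, -1],
    "c": [6, 5, 4, 3, 2, 1],
}
irrelevant = {
    "q1": [1, 2, 3, 4],
    "q2": [4, 3, 2, 1],
    "a": [1, 3, 2, 4],
    "b": [1, 2, 3, 4],
    "c": [2, 1, 4, 3],
}

ranked, weights = search([relevant, irrelevant], ["q1", "q2"])
print(ranked, weights)
```

In this toy case the second dataset receives zero weight because the query genes are anti-correlated in it, so the ranking is driven entirely by the first dataset. Flooring the weight at zero is one simple design choice among several; a real system might instead transform or rescale correlations before combining them.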