Genes, themes and microarrays: using information retrieval for large-scale gene analysis
- PMID: 10977093
Abstract
The immense volume of data produced by DNA microarray experiments, together with the growing number of publications reporting gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering does reveal potentially meaningful relationships among genes, but it cannot explain the underlying biological mechanisms. To address this problem, we have developed a new approach that uses the literature to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes. A comparison of the results with well-established yeast gene functions demonstrates the effectiveness of our approach.
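The two steps named in the abstract (a similarity-based search in document space, then a translation of abstract-level relationships into gene-level connections) can be illustrated with a minimal sketch. This is not the authors' algorithm: TF-IDF weighting with cosine similarity stands in for their theme-finding method, Jaccard overlap of retrieved document sets stands in for their gene-linking step, and the gene names, abstracts, `kernels` mapping, and parameter `k` are all hypothetical toy values.

```python
import math
from collections import Counter

# Toy "abstracts", already lower-cased and stripped of stop words (assumption).
abstracts = [
    "genea kinase regulates cell cycle progression",             # 0: kernel for geneA
    "geneb kinase controls cell cycle checkpoints",              # 1: kernel for geneB
    "genec transporter mediates glucose membrane uptake",        # 2: kernel for geneC
    "kinase mutants arrest cell cycle checkpoints",              # 3
    "glucose transporter mutants show reduced membrane uptake",  # 4
    "cell cycle progression requires kinase signalling",         # 5
]
# Hypothetical mapping from each gene to one representative abstract.
kernels = {"geneA": 0, "geneB": 1, "geneC": 2}


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_documents(vectors, kernel, k):
    """Kernel document plus its top-k neighbours with non-zero similarity."""
    scored = [
        (cosine(vectors[kernel], vec), i)
        for i, vec in enumerate(vectors)
        if i != kernel
    ]
    scored.sort(reverse=True)
    return {kernel} | {i for score, i in scored[:k] if score > 0.0}


def gene_similarity(docs_a, docs_b):
    """Jaccard overlap of the document sets retrieved for two genes."""
    return len(docs_a & docs_b) / len(docs_a | docs_b)


vectors = tfidf_vectors(abstracts)
related = {gene: related_documents(vectors, idx, k=3) for gene, idx in kernels.items()}

for a, b in [("geneA", "geneB"), ("geneA", "geneC")]:
    print(a, b, gene_similarity(related[a], related[b]))
```

In this toy corpus the two cell-cycle genes retrieve overlapping abstracts while the transporter gene does not, which is the kind of content-based signal the abstract describes. A real corpus would need tokenization, stop-word removal, and a tuned choice of `k`.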