Mining gene expression databases for association rules
- PMID: 12499296
- DOI: 10.1093/bioinformatics/19.1.79
Abstract
Motivation: Global gene expression profiling, at both the transcript level and the protein level, can be a valuable tool for understanding genes, biological networks, and cellular states. As increasingly large gene expression data sets become available, data mining techniques can be applied to identify patterns of interest in the data. Association rules, widely used in market basket analysis, can also be applied to expression data. They can reveal biologically relevant associations between different genes or between environmental effects and gene expression. An association rule has the form LHS --> RHS, where LHS and RHS are disjoint sets of items, and the RHS set is likely to occur whenever the LHS set occurs. Items in gene expression data can include genes that are highly expressed or repressed, as well as relevant facts describing the cellular environment of the genes (e.g. the diagnosis of a tumor sample from which a profile was obtained).
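The rule form LHS --> RHS described above can be illustrated with a small sketch. It assumes each expression profile is turned into a set of items by thresholding log ratios, and it scores a rule by the standard support and confidence measures from market basket analysis. The gene names, the cutoffs of +1.0 and -1.0, and the function names are hypothetical choices made for illustration; the abstract does not state the thresholds or the scoring used by the authors.

```python
from typing import Dict, FrozenSet, List

# Hypothetical cutoffs: the abstract does not specify how "highly expressed"
# or "repressed" is decided, so these values are assumptions.
UP_CUTOFF = 1.0
DOWN_CUTOFF = -1.0


def profile_to_items(log_ratios: Dict[str, float]) -> FrozenSet[str]:
    """Turn one expression profile into a set of items such as 'GENE_A:up'."""
    items = set()
    for gene, value in log_ratios.items():
        if value >= UP_CUTOFF:
            items.add(f"{gene}:up")
        elif value <= DOWN_CUTOFF:
            items.add(f"{gene}:down")
    return frozenset(items)


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of profiles that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(RHS occurs | LHS occurs) for the rule LHS --> RHS."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0.0:
        return 0.0
    return support(lhs | rhs, transactions) / lhs_support


# Four made-up profiles over three made-up genes (illustrative data only).
profiles = [
    {"GENE_A": 1.5, "GENE_B": 1.2, "GENE_C": 0.1},
    {"GENE_A": 1.1, "GENE_B": 1.4, "GENE_C": -1.3},
    {"GENE_A": 1.3, "GENE_B": 0.2, "GENE_C": 0.0},
    {"GENE_A": 0.0, "GENE_B": 0.1, "GENE_C": -1.6},
]
transactions = [profile_to_items(p) for p in profiles]

lhs = frozenset({"GENE_A:up"})
rhs = frozenset({"GENE_B:up"})
rule_support = support(lhs | rhs, transactions)
rule_confidence = confidence(lhs, rhs, transactions)
```

In this toy data the rule {GENE_A:up} --> {GENE_B:up} holds in two of the three profiles where GENE_A is up. Environmental facts, such as a sample's diagnosis, could be added to each item set in the same way as the gene items.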
Results: We demonstrate an algorithm for efficiently mining association rules from gene expression data, using the data set from Hughes et al. (2000, Cell, 102, 109-126) of 300 expression profiles for yeast. Using the algorithm, we find numerous rules in the data. A cursory analysis of some of these rules reveals many associations between certain genes; many of these make sense biologically, and others suggest new hypotheses that may warrant further investigation. In a data set derived from the yeast data set, but with the expression values for each transcript randomly shifted with respect to the experiments, no rules were found, indicating that almost all of the rules mined from the actual data set are unlikely to have occurred by chance.
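The randomized control described above can be sketched as follows. This is one possible reading of "randomly shifted with respect to the experiments": each transcript's row of values is rotated by its own random offset, which keeps that transcript's value distribution intact while breaking its alignment with the other transcripts. The function name, the use of a circular rotation, and the toy matrix are assumptions for illustration, not the authors' documented procedure.

```python
import random
from typing import Dict, List


def shift_each_transcript(matrix: Dict[str, List[float]],
                          rng: random.Random) -> Dict[str, List[float]]:
    """Rotate each transcript's values across experiments by a random offset.

    `matrix` maps a transcript name to its expression values, one per
    experiment. Each row gets an independent offset, so co-expression
    between transcripts is disrupted while per-transcript values are kept.
    """
    shifted = {}
    for gene, values in matrix.items():
        n = len(values)
        offset = rng.randrange(n) if n else 0
        shifted[gene] = values[offset:] + values[:offset]
    return shifted


# Made-up matrix: 3 transcripts measured in 5 experiments.
expression = {
    "GENE_A": [1.5, 1.1, 1.3, 0.0, -0.2],
    "GENE_B": [1.2, 1.4, 0.2, 0.1, -0.1],
    "GENE_C": [0.1, -1.3, 0.0, -1.6, 0.3],
}

rng = random.Random(0)  # fixed seed so the control is reproducible
control = shift_each_transcript(expression, rng)
```

Running the same rule-mining step on `control` as on the real data then gives a baseline for how many rules arise by chance alone.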
Availability: An implementation of the algorithm using Microsoft SQL Server with Access 2000 is available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/assoc_rules.zip. Our results from mining the yeast data set are available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/yeast_results.zip.