Identification of regulatory elements using a feature selection method
- PMID: 12217908
- DOI: 10.1093/bioinformatics/18.9.1167
Abstract
Motivation: Many methods have been described to identify regulatory motifs in the transcription control regions of genes that exhibit similar patterns of gene expression across a variety of experimental conditions. Here we focus on a single experimental condition and use gene expression data to identify sequence motifs associated with genes that are activated under this condition. We use a linear model with two-way interactions to model gene expression as a function of sequence features (words) present in presumptive transcription control regions. The most relevant features are selected by a feature selection method called stepwise selection with Monte Carlo cross-validation. We apply this method to a publicly available dataset of the yeast Saccharomyces cerevisiae, focussing on the 800 base pairs immediately upstream of each gene's translation start site (the upstream control region, UCR).
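The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' software. It assumes that each gene's UCR is a plain A/C/G/T string, that features are overlapping counts of every length-k word on one strand, and that `y` holds one expression value per gene under the single condition. The function names `word_counts`, `design`, `mccv_error` and `forward_stepwise`, the split count, the test fraction, and the rule that an interaction may enter only between words already selected as main effects are all hypothetical choices. It also uses forward-only selection, a simplification of full stepwise selection (which can drop terms as well as add them).

```python
import itertools

import numpy as np


def word_counts(seqs, k):
    """Count overlapping occurrences of every length-k DNA word in each sequence."""
    words = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    X = np.zeros((len(seqs), len(words)))
    for r, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            w = s[j:j + k]
            if w in index:  # skip words containing N or other symbols
                X[r, index[w]] += 1
    return words, X


def design(X, terms):
    """Intercept plus one column per term; (i,) is a main effect, (i, j) a two-way interaction."""
    cols = [np.ones(X.shape[0])]
    for t in terms:
        cols.append(np.prod(X[:, list(t)], axis=1))
    return np.column_stack(cols)


def mccv_error(X, y, terms, n_splits=50, test_frac=0.2, seed=0):
    """Monte Carlo cross-validation: mean test MSE over repeated random train/test splits.

    A fixed seed gives every candidate model the same splits, so comparisons are paired.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    D = design(X, terms)
    errs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta, *_ = np.linalg.lstsq(D[train], y[train], rcond=None)
        errs.append(np.mean((y[test] - D[test] @ beta) ** 2))
    return float(np.mean(errs))


def forward_stepwise(X, y, max_terms=5, seed=0):
    """Greedily add the term that most lowers the MCCV error; stop when none helps."""
    selected = []
    best_err = mccv_error(X, y, selected, seed=seed)
    while len(selected) < max_terms:
        mains = sorted(t[0] for t in selected if len(t) == 1)
        candidates = [(i,) for i in range(X.shape[1]) if (i,) not in selected]
        # Interactions are only offered between words already chosen as main effects.
        candidates += [c for c in itertools.combinations(mains, 2) if c not in selected]
        if not candidates:
            break
        err, best = min((mccv_error(X, y, selected + [c], seed=seed), c) for c in candidates)
        if err >= best_err:
            break
        selected.append(best)
        best_err = err
    return selected, best_err
```

In practice the candidate set of all words up to some length is large, so exhaustively scoring every candidate at each step, as this sketch does, would need screening or a cheaper search to scale to 800 bp UCRs across the yeast genome.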
Results: We successfully identify regulatory motifs that are known to be active under the experimental conditions analyzed, and find additional significant sequences that may represent novel regulatory motifs. We also discuss a complementary method that uses gene expression data from a single microarray experiment and allows averaging over a variety of experimental conditions, as an alternative to motif-finding methods that act on clusters of co-expressed genes.
Availability: The software is available upon request from the first author or may be downloaded from http://www.stat.berkeley.edu/~sunduz.
Contact: keles@stat.berkeley.edu
Similar articles
- Regulatory motif finding by logic regression. Bioinformatics. 2004 Nov 1;20(16):2799-811. doi: 10.1093/bioinformatics/bth333. Epub 2004 May 27. PMID: 15166027
- Identification of DNA regulatory motifs using Bayesian variable selection. Bioinformatics. 2004 Nov 1;20(16):2553-61. doi: 10.1093/bioinformatics/bth282. Epub 2004 Apr 29. PMID: 15117754
- Efficiently finding regulatory elements using correlation with gene expression. J Bioinform Comput Biol. 2004 Jun;2(2):273-88. doi: 10.1142/s0219720004000612. PMID: 15297982
- CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics. 2003 Sep 22;19(14):1787-99. doi: 10.1093/bioinformatics/btg232. PMID: 14512350
- Regulatory sequence analysis: application to the interpretation of gene expression. Eur Neuropsychopharmacol. 2001 Dec;11(6):399-411. doi: 10.1016/s0924-977x(01)00117-1. PMID: 11704417. Review.
Cited by
- Bayesian variable selection for gene expression modeling with regulatory motif binding sites in neuroinflammatory events. Neuroinformatics. 2006 Winter;4(1):95-117. doi: 10.1385/NI:4:1:95. PMID: 16595861
- Practical strategies for discovering regulatory DNA sequence motifs. PLoS Comput Biol. 2006 Apr;2(4):e36. doi: 10.1371/journal.pcbi.0020036. PMID: 16683017
- Genome-wide investigation of light and carbon signaling interactions in Arabidopsis. Genome Biol. 2004;5(2):R10. doi: 10.1186/gb-2004-5-2-r10. Epub 2004 Jan 27. PMID: 14759260
- Prediction of synergistic transcription factors by function conservation. Genome Biol. 2007;8(12):R257. doi: 10.1186/gb-2007-8-12-r257. PMID: 18053230
- Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol. 2024 Apr 3;7(1):407. doi: 10.1038/s42003-024-06093-w. PMID: 38570615