Predicting gene expression from sequence
- PMID: 15084257
- DOI: 10.1016/s0092-8674(04)00304-6
Abstract
We describe a systematic genome-wide approach for learning the complex combinatorial code underlying gene expression. Our probabilistic approach identifies local DNA-sequence elements and the positional and combinatorial constraints that determine their context-dependent role in transcriptional regulation. The inferred regulatory rules correctly predict expression patterns for 73% of genes in Saccharomyces cerevisiae, utilizing microarray expression data and sequences in the 800 bp upstream of genes. Application to Caenorhabditis elegans identifies predictive regulatory elements and combinatorial rules that control the phased temporal expression of transcription factors, histones, and germline specific genes. Successful prediction requires diverse and complex rules utilizing AND, OR, and NOT logic, with significant constraints on motif strength, orientation, and relative position. This system generates a large number of mechanistic hypotheses for focused experimental validation, and establishes a predictive dynamical framework for understanding cellular behavior from genomic sequence.
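To make the idea of a combinatorial rule concrete, here is a minimal sketch of how one rule using AND and NOT logic with a relative-position constraint might be evaluated on an upstream sequence. It is an illustration only, not the authors' probabilistic method. The motif strings, the 50 bp spacing limit, and the function names are all hypothetical placeholders. Exact string matching stands in for the graded motif-strength scoring the abstract mentions.

```python
# Hypothetical motifs and spacing limit, chosen only for illustration.
MOTIF_A = "GATTACA"
MOTIF_B = "CCGTTG"
MOTIF_C = "TTTAAAGG"
MAX_GAP = 50  # assumed maximum distance (bp) between motif A and motif B starts

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(upstream, motif, either_strand=True):
    """Return sorted start positions of exact motif matches.

    With either_strand=True the reverse complement is also searched,
    so orientation is ignored; set it to False to require one orientation.
    """
    patterns = {motif}
    if either_strand:
        patterns.add(reverse_complement(motif))
    hits = set()
    for pattern in patterns:
        start = upstream.find(pattern)
        while start != -1:
            hits.add(start)
            start = upstream.find(pattern, start + 1)
    return sorted(hits)


def rule_predicts_expression(upstream):
    """Evaluate the toy rule: (A AND B within MAX_GAP bp) AND NOT C."""
    a_hits = find_motif(upstream, MOTIF_A)
    b_hits = find_motif(upstream, MOTIF_B)
    c_hits = find_motif(upstream, MOTIF_C)
    a_near_b = any(abs(i - j) <= MAX_GAP for i in a_hits for j in b_hits)
    return a_near_b and not c_hits


if __name__ == "__main__":
    example = "T" * 10 + MOTIF_A + "T" * 20 + MOTIF_B + "T" * 10
    print(rule_predicts_expression(example))
```

In this sketch an OR rule would simply combine two such predicates with `or`. A model closer to the one described would replace the boolean hits with position weight matrix scores and learn the rules from expression data rather than hard-coding them.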