Predicting genetic regulatory response using classification
- PMID: 15262804
- DOI: 10.1093/bioinformatics/bth923
Abstract
Motivation: Studying gene regulatory mechanisms in simple model organisms through analysis of high-throughput genomic data has emerged as a central problem in computational biology. Most approaches in the literature have focused either on finding a few strong regulatory patterns or on learning descriptive models from training data. However, these approaches are not yet adequate for making accurate predictions about which genes will be up- or down-regulated in new or held-out experiments. By introducing a predictive methodology for this problem, we can use powerful tools from machine learning and assess the statistical significance of our predictions.
Results: We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ('motifs') in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in the experiment ('parents'). Thus, our learning task integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments along with motif profile data from regulatory sequences. We convert the regression task of predicting real-valued gene expression measurements to a classification task of predicting +1 and -1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. The learning algorithm employed is boosting with a margin-based generalization of decision trees, alternating decision trees. This large-margin classifier is sufficiently flexible to allow complex logical functions, yet sufficiently simple to give insight into the combinatorial mechanisms of gene regulation. We observe encouraging prediction accuracy on experiments based on the Gasch S. cerevisiae dataset, and we show that we can accurately predict up- and down-regulation on held-out experiments. We also show how to extract significant regulators, motifs and motif-regulator pairs from the learned models for various stress responses. Our method thus provides predictive hypotheses, suggests biological experiments, and provides interpretable insight into the structure of genetic regulatory networks.
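The setup described above can be illustrated with a minimal sketch. It is not the authors' MLJava implementation: it swaps in plain AdaBoost over single-feature decision stumps in place of alternating decision trees, and the noise threshold of 1.0, the names `motifA`, `motifB`, `TF1`, `g1`, `g2`, `e1`, `e2`, and all expression values are hypothetical, chosen only for illustration. Each example is a gene-experiment pair, each feature is "motif present in the gene AND parent in a given state in the experiment", and measurements within the assumed noise band are dropped.

```python
import math

def discretize(log_ratio, threshold=1.0):
    """Map a log expression ratio to +1 (up), -1 (down) or 0 (within noise).
    The threshold is an assumed value, not one taken from the paper."""
    if log_ratio > threshold:
        return 1
    if log_ratio < -threshold:
        return -1
    return 0

def pair_features(motifs, parent_states):
    """Active features: motif m in the gene AND parent p in non-baseline state s."""
    return {(m, p, s) for m in motifs for p, s in parent_states.items() if s != 0}

def stump(feature, sign, x):
    """Predict `sign` when the feature is active in x, and -sign otherwise."""
    return sign if feature in x else -sign

def train_adaboost(examples, labels, rounds=5):
    """Plain AdaBoost over single-feature stumps (a stand-in for ADTs)."""
    n = len(examples)
    weights = [1.0 / n] * n
    features = sorted(set().union(*examples))
    model = []
    for _ in range(rounds):
        best = None
        for f in features:
            for sign in (1, -1):
                err = sum(w for x, y, w in zip(examples, labels, weights)
                          if stump(f, sign, x) != y)
                if best is None or err < best[0]:
                    best = (err, f, sign)
        err, f, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, sign))
        # Up-weight misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(f, sign, x))
                   for x, y, w in zip(examples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump(f, sign, x) for alpha, f, sign in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: motif sets per gene, parent states per experiment.
genes = {"g1": {"motifA"}, "g2": {"motifB"}}
experiments = {"e1": {"TF1": 1}, "e2": {"TF1": -1}}
log_ratios = {("g1", "e1"): 2.3, ("g1", "e2"): -1.8,
              ("g2", "e1"): 0.2, ("g2", "e2"): -1.5}

examples, labels = [], []
for (gene, exp), value in log_ratios.items():
    label = discretize(value)
    if label != 0:  # discard measurements inside the noise band
        examples.append(pair_features(genes[gene], experiments[exp]))
        labels.append(label)

model = train_adaboost(examples, labels)
```

Calling `predict(model, pair_features({"motifA"}, {"TF1": 1}))` then scores an unseen gene-experiment pair. Because each stump tests one motif-parent pair, the selected features can be read off `model` directly, which mirrors, in a much simpler form, the interpretability argument made for alternating decision trees.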
Availability: The MLJava package is available upon request to the authors.
Supplementary information: Additional results are available from http://www.cs.columbia.edu/compbio/geneclass