Iterative signature algorithm for the analysis of large-scale gene expression data
- PMID: 12689096
- DOI: 10.1103/PhysRevE.67.031902
Abstract
We present an approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. We establish an efficient algorithm that searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of singular value decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification because of the threshold. This result is confirmed by numerical analyses based on in silico expression data. We briefly discuss results obtained by applying our algorithm to expression data from the yeast Saccharomyces cerevisiae.
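The iteration described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names (`isa`, `threshold`, `unit`, `zscore`), the hard threshold measured in standard deviations of the projected scores, the use of one matrix for both linear maps, and the toy matrix are all hypothetical. Passing `None` for both thresholds removes the threshold step, so the loop reduces to power iteration on E Eᵀ and converges to the leading left singular vector, mirroring the SVD special case.

```python
import math


def zscore(values):
    """Shift values to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def threshold(scores, t):
    """Hypothetical hard threshold: keep a score only if its z-score exceeds t
    in absolute value, otherwise set it to 0. t=None skips thresholding."""
    if t is None:
        return list(scores)
    z = zscore(scores)
    return [s if abs(zi) > t else 0.0 for s, zi in zip(scores, z)]


def unit(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def isa(E, g, t_genes, t_conds, max_iter=100, tol=1e-9):
    """Alternate gene -> condition and condition -> gene linear maps, each
    followed by a threshold, until the gene vector stops changing.

    E is a genes x conditions matrix (list of rows), assumed already normalized;
    g is the seed gene vector. Returns (gene_scores, condition_scores)."""
    n_genes, n_conds = len(E), len(E[0])
    g = unit(g)
    c = [0.0] * n_conds
    for _ in range(max_iter):
        # Genes -> conditions: c_j = sum_i E[i][j] * g[i], then threshold.
        c = unit(threshold([sum(E[i][j] * g[i] for i in range(n_genes))
                            for j in range(n_conds)], t_conds))
        # Conditions -> genes: g_i = sum_j E[i][j] * c[j], then threshold.
        g_new = unit(threshold([sum(E[i][j] * c[j] for j in range(n_conds))
                                for i in range(n_genes)], t_genes))
        if max(abs(a - b) for a, b in zip(g, g_new)) < tol:
            return g_new, c
        g = g_new
    return g, c


# Made-up toy data: genes 0-3 are strongly expressed in conditions 0-2;
# every other entry is small deterministic "noise" in [-0.2, 0.2].
E = [[3.0 if (i < 4 and j < 3) else 0.1 * ((7 * i + 3 * j) % 5 - 2)
      for j in range(8)] for i in range(12)]
seed = [1.0] + [0.0] * 11  # start from gene 0 alone

g, c = isa(E, seed, t_genes=1.0, t_conds=1.0)
print([i for i, x in enumerate(g) if x != 0.0])  # → [0, 1, 2, 3]
print([j for j, x in enumerate(c) if x != 0.0])  # → [0, 1, 2]

# No threshold: plain power iteration on E E^T (the SVD special case).
g_svd, c_svd = isa(E, seed, t_genes=None, t_conds=None)
```

With thresholds of one standard deviation, the single seed gene recruits the other co-expressed genes and their conditions, and the iteration stops at a fixed point; without thresholds, genes generally keep nonzero scores, as in SVD. The hard cut-off and the single shared matrix are simplifications; the normalization and threshold function used in the paper may differ.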
Similar articles
- Defining transcription modules using large-scale gene expression data. Bioinformatics. 2004 Sep 1;20(13):1993-2003. doi: 10.1093/bioinformatics/bth166. Epub 2004 Mar 25. PMID: 15044247
- Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002 Aug;31(4):370-7. doi: 10.1038/ng941. Epub 2002 Jul 22. PMID: 12134151
- Dynamic models of gene expression and classification. Funct Integr Genomics. 2001 Mar;1(4):269-78. doi: 10.1007/s101420000035. PMID: 11793246
- Computational discovery of gene modules and regulatory networks. Nat Biotechnol. 2003 Nov;21(11):1337-42. doi: 10.1038/nbt890. Epub 2003 Oct 12. PMID: 14555958
- Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics. 2004 Mar 18;5:31. doi: 10.1186/1471-2105-5-31. PMID: 15113405. Free PMC article.
Cited by
- Bayesian generalized biclustering analysis via adaptive structured shrinkage. Biostatistics. 2020 Jul 1;21(3):610-624. doi: 10.1093/biostatistics/kxy081. PMID: 30596887. Free PMC article.
- Multi-species integrative biclustering. Genome Biol. 2010;11(9):R96. doi: 10.1186/gb-2010-11-9-r96. Epub 2010 Sep 29. PMID: 20920250. Free PMC article.
- Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae. BMC Bioinformatics. 2006 Mar 21;7:165. doi: 10.1186/1471-2105-7-165. PMID: 16551355. Free PMC article.
- Automatic layout and visualization of biclusters. Algorithms Mol Biol. 2006 Sep 4;1:15. doi: 10.1186/1748-7188-1-15. PMID: 16952321. Free PMC article.
- Enrichment constrained time-dependent clustering analysis for finding meaningful temporal transcription modules. Bioinformatics. 2009 Jun 15;25(12):1521-7. doi: 10.1093/bioinformatics/btp235. Epub 2009 Apr 7. PMID: 19351618. Free PMC article.