Discovering local structure in gene expression data: the order-preserving submatrix problem
- PMID: 12935334
- DOI: 10.1089/10665270360688075
Abstract
This paper concerns the discovery of patterns in gene expression matrices, in which each element gives the expression level of a given gene in a given experiment. Most existing methods for pattern discovery in such matrices are based on clustering genes by comparing their expression levels in all experiments, or clustering experiments by comparing their expression levels for all genes. Our work goes beyond such global approaches by looking for local patterns that manifest themselves when we focus simultaneously on a subset G of the genes and a subset T of the experiments. Specifically, we look for order-preserving submatrices (OPSMs), in which the expression levels of all genes induce the same linear ordering of the experiments (we show that the OPSM search problem is NP-hard in the worst case). Such a pattern might arise, for example, if the experiments in T represent distinct stages in the progress of a disease or in a cellular process and the expression levels of all genes in G vary across the stages in the same way. We define a probabilistic model in which an OPSM is hidden within an otherwise random matrix. Guided by this model, we develop an efficient algorithm for finding the hidden OPSM in the random matrix. In data generated according to the model, the algorithm recovers the hidden OPSM with a very high success rate. Application of the methods to breast cancer data seems to reveal significant local patterns.
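The OPSM definition in the abstract can be illustrated with a small check. The following is a minimal sketch, not the paper's search algorithm: it only verifies whether a given pair (G, T) forms an order-preserving submatrix. The function names `column_order` and `is_order_preserving`, the toy matrix `expr`, and the assumption that expression values within a row are distinct (no ties) are all illustrative choices, not taken from the paper.

```python
from typing import Sequence, Tuple


def column_order(row: Sequence[float], experiments: Sequence[int]) -> Tuple[int, ...]:
    """Return the chosen experiments sorted by increasing expression in this row.

    Assumes the values in the chosen experiments are distinct (no ties).
    """
    return tuple(sorted(experiments, key=lambda t: row[t]))


def is_order_preserving(
    matrix: Sequence[Sequence[float]],
    genes: Sequence[int],
    experiments: Sequence[int],
) -> bool:
    """True if every gene in `genes` induces the same ordering of `experiments`."""
    orders = {column_order(matrix[g], experiments) for g in genes}
    return len(orders) <= 1


# Hypothetical toy matrix: rows are genes, columns are experiments.
expr = [
    [0.1, 0.9, 0.5, 0.3],  # gene 0
    [2.0, 7.0, 4.0, 9.0],  # gene 1
    [5.0, 1.0, 3.0, 2.0],  # gene 2
    [1.0, 8.0, 6.0, 0.0],  # gene 3
]

# Genes 0, 1, 3 all rank experiments 0 < 2 < 1, so (G, T) is an OPSM.
opsm_found = is_order_preserving(expr, genes=[0, 1, 3], experiments=[0, 1, 2])

# Adding gene 2 breaks the pattern, since it ranks the experiments 1 < 2 < 0.
opsm_broken = is_order_preserving(expr, genes=[0, 1, 2, 3], experiments=[0, 1, 2])
```

Verifying a candidate (G, T) is cheap; the hard part, which the abstract states is NP-hard in the worst case, is searching over all subsets of genes and experiments for a large OPSM.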
Similar articles
- On mining micro-array data by Order-Preserving Submatrix. Int J Bioinform Res Appl. 2007;3(1):42-64. doi: 10.1504/IJBRA.2007.011834. PMID: 18048172
- A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences. Comput Math Methods Med. 2015;2015:680434. doi: 10.1155/2015/680434. Epub 2015 May 28. PMID: 26161131 Free PMC article.
- Biclustering in gene expression data by tendency. Proc IEEE Comput Syst Bioinform Conf. 2004:182-93. doi: 10.1109/csb.2004.1332431. PMID: 16448012
- Clustering and re-clustering for pattern discovery in gene expression data. J Bioinform Comput Biol. 2005 Apr;3(2):281-301. doi: 10.1142/s0219720005001053. PMID: 15852506
- Microarrays--identifying molecular portraits for prostate tumors with different Gleason patterns. Methods Mol Med. 2008;141:131-51. doi: 10.1007/978-1-60327-148-6_8. PMID: 18453088 Review.
Cited by
- Identification of bicluster regions in a binary matrix and its applications. PLoS One. 2013 Aug 5;8(8):e71680. doi: 10.1371/journal.pone.0071680. Print 2013. PMID: 23940779 Free PMC article.
- Pairwise gene GO-based measures for biclustering of high-dimensional expression data. BioData Min. 2018 Mar 27;11:4. doi: 10.1186/s13040-018-0165-9. eCollection 2018. PMID: 29610579 Free PMC article.
- TriRNSC: triclustering of gene expression microarray data using restricted neighbourhood search. IET Syst Biol. 2020 Dec;14(6):323-333. doi: 10.1049/iet-syb.2020.0024. PMID: 33399096 Free PMC article.
- Cross-Activity Analysis of CRISPR/Cas9 Editing in Gene Families of Solanum lycopersicum Detected by Long-Read Sequencing. Curr Issues Mol Biol. 2025 Jul 2;47(7):507. doi: 10.3390/cimb47070507. PMID: 40728976 Free PMC article.
- Biclustering data analysis: a comprehensive survey. Brief Bioinform. 2024 May 23;25(4):bbae342. doi: 10.1093/bib/bbae342. PMID: 39007596 Free PMC article. Review.