Bioinformatics. 2009 Jun 15;25(12):1521-7.

doi: 10.1093/bioinformatics/btp235. Epub 2009 Apr 7.

Enrichment constrained time-dependent clustering analysis for finding meaningful temporal transcription modules

Jia Meng¹, Shou-Jiang Gao, Yufei Huang

PMID: 19351618
PMCID: PMC2687989
DOI: 10.1093/bioinformatics/btp235

Jia Meng et al. Bioinformatics. 2009.

Affiliation

¹ Department of ECE, University of Texas at San Antonio, Texas, USA.

Abstract

Motivation: Clustering is a popular data exploration technique widely used in microarray data analysis. When applied to time-series data, however, most conventional clustering algorithms use either one-way clustering methods, which fail to consider the heterogeneity of the temporal domain, or two-way clustering methods, which do not take into account the time dependency between samples; both produce less informative results. Furthermore, enrichment analysis is often performed independently of, and after, clustering. Such practice, though capable of revealing biologically significant clusters, cannot guide the clustering to produce biologically significant results.

Results: We present a new enrichment constrained framework (ECF) coupled with a time-dependent iterative signature algorithm (TDISA), which, by applying a sliding time window to incorporate the time dependency of samples and imposing an enrichment constraint on the clustering parameters, allows supervised identification of temporal transcription modules (TTMs) that are biologically meaningful. Rigorous mathematical definitions of the TTM and of the enrichment constraint framework are also provided; these serve as objective functions for retrieving biologically significant modules. We applied the enrichment constrained time-dependent iterative signature algorithm (ECTDISA) to human gene expression time-series data of Kaposi's sarcoma-associated herpesvirus (KSHV) infection of human primary endothelial cells; the results not only confirm known biological facts but also reveal new insight into the molecular mechanism of KSHV infection.
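As a rough illustration of the sliding time window idea, the sketch below enumerates runs of consecutive time points and averages one gene's profile over each run. It is not the authors' TDISA; the function names `sliding_windows` and `window_means`, the window width, and the toy profile are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def sliding_windows(n_timepoints: int, width: int) -> List[Tuple[int, ...]]:
    """Return every run of `width` consecutive time-point indices."""
    if width < 1 or width > n_timepoints:
        raise ValueError("width must be between 1 and n_timepoints")
    return [tuple(range(s, s + width)) for s in range(n_timepoints - width + 1)]


def window_means(profile: Sequence[float], width: int) -> List[float]:
    """Average one gene's expression profile over each sliding window."""
    windows = sliding_windows(len(profile), width)
    return [sum(profile[i] for i in w) / width for w in windows]


# Hypothetical profile: the gene is active at time points 2 to 4 only.
profile = [0.0, 0.2, 1.8, 2.0, 1.9, 0.1]
means = window_means(profile, 3)
# Index of the window with the highest mean expression.
best_start = max(range(len(means)), key=means.__getitem__)
```

Because only contiguous windows are considered, a candidate condition set always respects the time ordering of the samples, which is the property a plain two-way clustering lacks.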

Availability: Data and Matlab code are available at http://engineering.utsa.edu/~yfhuang/ECTDISA.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Plot of enrichment score versus parameters and illustration of the 2D grid search for the optimal parameters. The horizontal axes represent (τ_T, τ_G), and the vertical axis represents the enrichment score S(M(τ_T, τ_G)). In the 2D grid search, starting from the same initial input gene set, a different module is identified for each parameter pair (τ_T, τ_G) on the grid, and the optimal module M* is the one with the largest enrichment score.
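The 2D grid search described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: `find_module` and `score` are hypothetical callables standing in for the module search M(τ_T, τ_G) and the enrichment score S, and the toy data, condition scores, and Jaccard-overlap score are invented for the example.

```python
from itertools import product
from statistics import mean


def grid_search(find_module, score, tau_t_grid, tau_g_grid):
    """Return (best score, best (tau_t, tau_g), best module) over the grid."""
    best = None
    for tau_t, tau_g in product(tau_t_grid, tau_g_grid):
        module = find_module(tau_t, tau_g)
        s = score(module)
        if best is None or s > best[0]:
            best = (s, (tau_t, tau_g), module)
    return best


# Toy stand-ins (assumptions): expression per gene and a score per condition.
data = {"g1": [2.0, 2.1, 0.1], "g2": [1.9, 2.2, 0.0],
        "g3": [0.1, 0.2, 0.1], "g4": [1.0, 1.1, 0.9]}
cond_score = [1.0, 1.0, 0.2]
category = frozenset({"g1", "g2"})  # hypothetical annotated gene category


def toy_find_module(tau_t, tau_g):
    """Keep conditions scoring >= tau_t, then genes whose mean is >= tau_g."""
    conds = [j for j, c in enumerate(cond_score) if c >= tau_t]
    if not conds:
        return frozenset()
    return frozenset(g for g, x in data.items()
                     if mean(x[j] for j in conds) >= tau_g)


def toy_score(module):
    """Jaccard overlap with the category, used here in place of S."""
    return len(module & category) / len(module | category)


best_score, best_params, best_module = grid_search(
    toy_find_module, toy_score, [0.5, 1.5], [0.5, 1.5, 2.5])
```

On this toy grid the thresholds (0.5, 1.5) recover exactly the annotated category, so they are returned as the optimum; with a real enrichment score the same loop would simply swap in a different `score` callable.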

**Fig. 2.**
Data generation. (a) Five TTMs are randomly generated. (b) Expression data are generated accordingly. (c) The data are reshuffled to obtain the simulated data.
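The three steps in this caption could be simulated along the following lines. This is only a sketch: the matrix size, module size, unit signal shift, Gaussian noise model, and the choice to shuffle gene order only (leaving time points in order) are assumptions, since the caption does not give these details.

```python
import random


def simulate(n_genes, n_times, n_modules, module_genes, module_width,
             noise_sd, seed=0):
    """Generate modules, noisy expression data, and a shuffled gene order."""
    rng = random.Random(seed)
    # (a) Each module is a random gene set over consecutive time points.
    modules = []
    for _ in range(n_modules):
        genes = sorted(rng.sample(range(n_genes), module_genes))
        start = rng.randrange(n_times - module_width + 1)
        modules.append((genes, list(range(start, start + module_width))))
    # (b) Gaussian background noise plus a unit shift inside each module.
    data = [[rng.gauss(0.0, noise_sd) for _ in range(n_times)]
            for _ in range(n_genes)]
    for genes, times in modules:
        for g in genes:
            for t in times:
                data[g][t] += 1.0
    # (c) Shuffle the gene (row) order to hide the embedded structure.
    order = list(range(n_genes))
    rng.shuffle(order)
    shuffled = [data[g] for g in order]
    return modules, shuffled, order


modules, shuffled, order = simulate(
    n_genes=20, n_times=10, n_modules=5, module_genes=4,
    module_width=3, noise_sd=0.3, seed=1)
```

Returning `order` alongside the shuffled matrix keeps the ground truth recoverable, so a clustering result can later be compared against the embedded modules.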

**Fig. 3.**
Simulated annotation database. (a) Ideal annotation database: each annotated gene category contains exactly the same genes as the embedded module. (b) Incomplete database: only some of the genes in an embedded module are annotated. (c) Incomplete and noisy database: the annotated gene category and the corresponding embedded module only partly overlap, meaning that not all genes are annotated and some annotations may be inaccurate or unsuitable for the specific data. This is the common case.
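The three database types in this caption can be mimicked with a small helper. The sketch below is an assumption-laden illustration: the function `simulate_category`, its `keep_fraction` and `n_noise` parameters, and the gene names are hypothetical and are not taken from the paper.

```python
import random


def simulate_category(module, all_genes, keep_fraction=1.0, n_noise=0, seed=0):
    """Build an annotated category from a module.

    keep_fraction < 1 drops module genes (incomplete annotation);
    n_noise > 0 adds genes from outside the module (noisy annotation).
    """
    rng = random.Random(seed)
    module = sorted(module)
    n_keep = round(keep_fraction * len(module))
    kept = rng.sample(module, n_keep)
    outside = sorted(set(all_genes) - set(module))
    noise = rng.sample(outside, n_noise)
    return set(kept) | set(noise)


genes = [f"g{i}" for i in range(10)]
module = {"g0", "g1", "g2", "g3"}

ideal = simulate_category(module, genes)                      # case (a)
incomplete = simulate_category(module, genes, 0.5)            # case (b)
noisy = simulate_category(module, genes, 0.5, n_noise=2)      # case (c)
```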

**Fig. 4.**
Performance vs. noise standard deviation. Performance of TDISA and K-means in terms of A score P_A and C score P_C at different noise levels. ECTDISA and ECISA perform much better than ISA and K-means.

**Fig. 5.**
Impact of prior biological knowledge. With a noise SD of 0.3, the two plots show the performance of the four algorithms when the annotation file is incomplete, i.e. not all genes are annotated. The horizontal axis represents the percentage of annotated genes. The plots show that, although more complete annotation helps to identify modules more accurately, ECTDISA still performs very well without complete annotations.

**Fig. 6.**
Impact of the consistency between prior knowledge and module. The consistent rate on the horizontal axis is defined as the percentage of common genes shared by the embedded module and the corresponding annotated gene category. The figure shows the performance of ECTDISA when the annotated gene category is not consistent with the embedded module, i.e. not all genes involved in an annotated pathway behave similarly. The simulation results show that ECTDISA is very robust to such inconsistency in the annotation files.
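The consistent rate defined in this caption amounts to a simple set computation. The sketch below assumes the percentage is taken relative to the size of the embedded module; the caption does not state the denominator, so that choice, like the function name `consistent_rate`, is an assumption.

```python
def consistent_rate(module, category):
    """Percentage of module genes that also appear in the annotated category."""
    module, category = set(module), set(category)
    if not module:
        raise ValueError("module must contain at least one gene")
    return 100.0 * len(module & category) / len(module)


# Hypothetical example: two of the four module genes are annotated.
rate = consistent_rate({"a", "b", "c", "d"}, {"c", "d", "e"})
```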

**Fig. 7.**
Selected modules (M2, 5, 7, 11, 16, 21, 22, 35, 37, 38, 45, 48).

**Fig. 8.**
Correlation of modules. The figure shows the two-way K-means clustering results of the correlations.

Cited by

Regulatory Snapshots: integrative mining of regulatory modules from expression time series and regulatory networks.
Gonçalves JP, Aires RS, Francisco AP, Madeira SC. PLoS One. 2012;7(5):e35977. doi: 10.1371/journal.pone.0035977. Epub 2012 May 1. PMID: 22563474. Free PMC article.
Robust inference of the context specific structure and temporal dynamics of gene regulatory network.
Meng J, Lu M, Chen Y, Gao SJ, Huang Y. BMC Genomics. 2010 Dec 1;11 Suppl 3(Suppl 3):S11. doi: 10.1186/1471-2164-11-S3-S11. PMID: 21143778. Free PMC article.
A hierarchical Bayesian model for flexible module discovery in three-way time-series data.
Amar D, Yekutieli D, Maron-Katz A, Hendler T, Shamir R. Bioinformatics. 2015 Jun 15;31(12):i17-26. doi: 10.1093/bioinformatics/btv228. PMID: 26072479. Free PMC article.
Function-based discovery of significant transcriptional temporal patterns in insulin stimulated muscle cells.
Di Camillo B, Irving BA, Schimke J, Sanavia T, Toffolo G, Cobelli C, Nair KS. PLoS One. 2012;7(3):e32391. doi: 10.1371/journal.pone.0032391. Epub 2012 Mar 1. PMID: 22396763. Free PMC article.
REW-ISA V2: A Biclustering Method Fusing Homologous Information for Analyzing and Mining Epi-Transcriptome Data.
Zhang L, Chen S, Ma J, Liu Z, Liu H. Front Genet. 2021 May 28;12:654820. doi: 10.3389/fgene.2021.654820. eCollection 2021. PMID: 34122508. Free PMC article.
