BMC Bioinformatics. 2010 Jan 30;11:68.

A temporal precedence based clustering method for gene expression microarray data

Ritesh Krishna¹, Chang-Tsun Li, Vicky Buchanan-Wollaston

PMID: 20113513
PMCID: PMC2841598
DOI: 10.1186/1471-2105-11-68

Ritesh Krishna et al. BMC Bioinformatics. 2010.

Affiliation

¹ Department of Computer Science, Warwick University, Coventry CV4 7AL, UK.

Abstract

Background: Time-course microarray experiments produce data that can help in understanding the underlying dynamics of the system under study. Clustering is an important stage in microarray data analysis, in which the data are grouped according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures, which may not be suitable for temporal microarray data, where the sequential nature of time is important. We present a Granger causality based technique for clustering temporal microarray gene expression data. It measures the interdependence between two time-series by statistically testing whether one time-series can be used to forecast the other.
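As an illustration of the pairwise test described above, the following is a minimal sketch of a lag-1 Granger causality F statistic in plain Python. It is not the authors' implementation: the function names (`solve`, `rss`, `granger_f`), the lag of 1, and the toy series are assumptions made for this example. It compares a restricted autoregression of `y` on its own past with a full model that also includes the past of `x`; a p-value would then be read from an F distribution with `lag` and `n - 2*lag - 1` degrees of freedom, a step omitted here.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f(x, y, lag=1):
    """F statistic for the hypothesis 'past values of x help forecast y'."""
    target = y[lag:]
    restricted, full = [], []
    for t in range(lag, len(y)):
        own = [y[t - k] for k in range(1, lag + 1)]      # past of y only
        other = [x[t - k] for k in range(1, lag + 1)]    # past of x
        restricted.append([1.0] + own)
        full.append([1.0] + own + other)
    rss_r, rss_u = rss(restricted, target), rss(full, target)
    dof = len(target) - 2 * lag - 1
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Toy series (hypothetical data): y follows x with a delay of one time point.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 50)]

f_xy = granger_f(x, y)  # x -> y: expected to be large
f_yx = granger_f(y, x)  # y -> x: expected to be small
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

The asymmetry between the two statistics is what distinguishes temporal precedence from a symmetric measure such as correlation or Euclidean distance. Real time-course microarray series are usually much shorter than this toy series, so the usable lag and the power of the test are correspondingly limited.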

Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is then analyzed using a graph-theoretic technique to detect highly connected components, which represent candidate biological modules. We test our approach on synthesized datasets and on real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results against the existing biological literature. We also report structural properties of the association network that are commonly desired in biological systems.
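The step from association matrix to modules can be sketched as follows. This is a simplified stand-in, not the graph-theoretic technique used in the paper: it extracts plain connected components by breadth-first search, whereas the abstract refers to highly connected components. The function name `connected_modules`, the gene labels, and the binary matrix are hypothetical; an entry `assoc[i][j] = 1` is assumed to mean that the test found gene `i` to precede gene `j`.

```python
from collections import deque

def connected_modules(assoc, names):
    """Group genes into connected components of the association graph."""
    n = len(assoc)
    # Treat an association in either direction as an undirected edge.
    neighbours = [
        [j for j in range(n) if j != i and (assoc[i][j] or assoc[j][i])]
        for i in range(n)
    ]
    seen, modules = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search from the unvisited gene
            node = queue.popleft()
            members.append(names[node])
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        modules.append(sorted(members))
    return modules

# Hypothetical binary association matrix for six genes.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
assoc = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
modules = connected_modules(assoc, genes)
print(modules)
```

In this toy matrix the genes fall into separate island-like groups, and `g6`, which has no significant association with any other gene, ends up as a singleton. A stricter criterion for "highly connected" would further split or discard loosely linked groups.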

Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple to implement and is statistically traceable at each step. It can produce sets of functionally related genes, which can then be used for reverse-engineering of gene circuits.

Figures

**Figure 1**
**Inferred network for Dataset 1**. The network structure inferred after applying the Granger causality test to synthetic dataset 1.

**Figure 2**
**Inferred network for Dataset 2**. The network structure inferred after applying the Granger causality test to synthetic dataset 2.

**Figure 3**
**Inferred network for Dataset 3**. The network structure inferred after applying the Granger causality test to synthetic dataset 3.

**Figure 4**
**Simulation results with Dataset 1, 2 and 3 integrated into one system**. The association graph obtained after applying the Granger causality test to the combined dataset is represented in the form of an association matrix. Three distinct island-like modules are visible in the graph, each module representing one dataset.

**Figure 5**
**Temporal profiles of genes selected for smaller dataset for Arabidopsis**. The temporal profiles of the genes selected to constitute the smaller Arabidopsis dataset are shown. A) Genes annotated for circadian activity. B) Genes annotated for death. C) Gene annotated for ageing.

**Figure 6**
**Degree sorted network structure**. The association graph obtained after applying the Granger causality test is displayed in a degree-sorted manner.

**Figure 7**
**Extracted subgraphs indicating potential modules of interest in the smaller dataset**. The biological functions performed by the modules in the respective panels are A) circadian rhythm, B) immune and defense response, C) circadian rhythm and D) aging. The GO annotations for the genes can be seen in Table 1.

**Figure 8**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 1**. The genes belonging to the *Response to stress* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 9**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 2**. The genes belonging to the *Cytoplasmic part* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 10**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 3**. The genes belonging to the *Response to stimulus* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 11**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 4**. The genes belonging to the *Response to abiotic stimulus* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 12**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 5**. The genes belonging to the *Catalytic activity* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 13**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 6**. The genes belonging to the *Response to stress* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 14**
**Extracted subgraph indicating potential module of interest in the bigger dataset - Set 7**. The genes belonging to the *Cell part* category are highlighted in yellow. The GO annotations of the highlighted genes are presented in Table 2.

**Figure 15**
**Structural properties of association network obtained for bigger dataset**. A) A power-law-like node degree distribution. B) Distribution of the number of partners shared between a pair of nodes. C) Closeness centrality of all the nodes. D) Plot of the topological coefficient.

**Figure 16**
**Correlation matrix for smaller Arabidopsis dataset**. The association matrix obtained using Pearson correlation for the smaller Arabidopsis dataset is shown. The strengths of interactions between genes are quantified according to the color-map presented in the figure.

**Figure 17**
**Distance matrix for smaller Arabidopsis dataset**. The association matrix obtained using Euclidean distance for the smaller Arabidopsis dataset is shown. The strengths of interactions between genes are quantified according to the color-map presented in the figure.

**Figure 18**
**Subgraphs obtained by using correlation as a measure of association in the smaller Arabidopsis dataset**. Two subgraphs of potential interest were detected when the correlation coefficient was used to establish association between genes in the smaller Arabidopsis dataset. The GO annotations of the recognised genes are presented in Table 5.

Cited by

Functional clustering of time series gene expression data by Granger causality.
Fujita A, Severino P, Kojima K, Sato JR, Patriota AG, Miyano S. BMC Syst Biol. 2012 Oct 30;6:137. doi: 10.1186/1752-0509-6-137. PMID: 23107425.
Dysregulated cellular redox status during hyperammonemia causes mitochondrial dysfunction and senescence by inhibiting sirtuin-mediated deacetylation.
Mishra S, Welch N, Karthikeyan M, Bellar A, Musich R, Singh SS, Zhang D, Sekar J, Attaway AH, Chelluboyina AK, Lorkowski SW, Roychowdhury S, Li L, Willard B, Smith JD, Hoppel CL, Vachharajani V, Kumar A, Dasarathy S. Aging Cell. 2023 Jul;22(7):e13852. doi: 10.1111/acel.13852. Epub 2023 Apr 26. PMID: 37101412.
Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering.
Ghandhi SA, Sinha A, Markatou M, Amundson SA. BMC Genomics. 2011 Jan 4;12:2. doi: 10.1186/1471-2164-12-2. PMID: 21205307.

Grants and funding

BB/F005806/1, Biotechnology and Biological Sciences Research Council, United Kingdom
