A temporal precedence based clustering method for gene expression microarray data

Ritesh Krishna et al. BMC Bioinformatics. 2010 Jan 30;11:68.
doi: 10.1186/1471-2105-11-68.

Abstract

Background: Time-course microarray experiments produce data that can help in understanding the underlying dynamics of a biological system. Clustering is an important stage of microarray data analysis, in which the data are grouped according to shared characteristics. Most clustering techniques are based on distance or visual similarity measures, which may not be suitable for clustering temporal microarray data, where the sequential order of time points matters. We present a Granger causality based technique for clustering temporal microarray gene expression data. It measures the interdependence between two time series by statistically testing whether one time series is useful for forecasting the other.
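The pairwise test described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single lag order `lag` chosen by the caller, compares a restricted autoregression of `y` on its own past with an unrestricted one that also includes the past of `x`, and turns the standard F statistic into a p-value. The names `granger_test` and `f_sf` are hypothetical, and the small least-squares and incomplete-beta helpers stand in for a numerical library.

```python
import math


def _solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _rss(rows, y):
    """Residual sum of squares of the ordinary least-squares fit of y on the design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)  # normal equations
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def _betai(a, b, x, tiny=1e-300):
    """Regularized incomplete beta I_x(a, b) via a continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):  # symmetry I_x(a, b) = 1 - I_{1-x}(b, a) aids convergence
        return 1.0 - _betai(b, a, 1.0 - x)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:
            break
    return math.exp(log_front) * h / a


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) of an F(d1, d2) distribution."""
    return 1.0 if f <= 0.0 else _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def granger_test(x, y, lag=1):
    """F-test of whether the past of x improves forecasts of y beyond y's own past.

    Restricted model:    y_t ~ 1 + y_{t-1}, ..., y_{t-lag}
    Unrestricted model:  y_t ~ 1 + y_{t-1}, ..., y_{t-lag} + x_{t-1}, ..., x_{t-lag}
    Returns (F statistic, p-value); a small p-value suggests x Granger-causes y.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    restricted, unrestricted = [], []
    for t in range(lag, len(y)):
        y_past = [y[t - k] for k in range(1, lag + 1)]
        x_past = [x[t - k] for k in range(1, lag + 1)]
        restricted.append([1.0] + y_past)
        unrestricted.append([1.0] + y_past + x_past)
    target = y[lag:]
    df_den = len(target) - (2 * lag + 1)  # observations minus unrestricted parameters
    if df_den <= 0:
        raise ValueError("series too short for the chosen lag")
    rss_r, rss_u = _rss(restricted, target), _rss(unrestricted, target)
    f_stat = ((rss_r - rss_u) / lag) / (rss_u / df_den)
    return f_stat, f_sf(f_stat, lag, df_den)
```

The test is directional: `granger_test(x, y)` asks whether `x` helps forecast `y`, while `granger_test(y, x)` asks the reverse question and can give a different answer, which is what lets the association matrix record temporal precedence rather than mere similarity.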

Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is then analyzed with a graph-theoretic technique to detect highly connected components representing biologically interesting modules. We test our approach on synthetic datasets and on real biological datasets obtained for Arabidopsis thaliana, and assess its effectiveness by comparing the results with the existing biological literature. We also report structural properties of the association network that are commonly expected of biological systems.
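One way to turn pairwise test results into an association matrix and then into candidate modules is sketched below. It assumes a precomputed matrix `pvals` in which `pvals[i][j]` is the p-value for "gene i Granger-causes gene j", and a hypothetical significance level `alpha`. The abstract does not name the graph-theoretic technique, so the module step here is a plainly substituted stand-in: a k-core decomposition (repeatedly removing genes with fewer than `k` neighbours) followed by connected components. Both function names are hypothetical.

```python
def association_matrix(pvals, alpha=0.05):
    """0/1 matrix with A[i][j] = 1 when gene i Granger-causes gene j at level alpha.

    pvals[i][j] is assumed to hold the p-value of the test "gene i -> gene j".
    """
    n = len(pvals)
    return [[1 if i != j and pvals[i][j] < alpha else 0 for j in range(n)]
            for i in range(n)]


def k_core_modules(adj, k=2):
    """Stand-in module detector: keep the k-core of the undirected graph (every
    kept node has >= k kept neighbours), then return its connected components
    as sorted lists of node indices."""
    n = len(adj)
    nbrs = [{j for j in range(n) if j != i and (adj[i][j] or adj[j][i])}
            for i in range(n)]
    alive = set(range(n))
    pruned = True
    while pruned:  # peel off nodes with fewer than k surviving neighbours
        pruned = False
        for v in sorted(alive):
            if len(nbrs[v] & alive) < k:
                alive.discard(v)
                pruned = True
    modules, seen = [], set()
    for start in sorted(alive):  # depth-first search for connected components
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in nbrs[v] & alive:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        modules.append(sorted(comp))
    return modules
```

For example, two tightly linked triangles of genes plus one gene attached by a single association yield the two triangles as modules, with the weakly attached gene pruned away. Note that with n genes the matrix involves n(n - 1) directed tests; this sketch applies no multiple-testing correction.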

Conclusions: Our experiments on synthetic and real microarray datasets show that our approach produces encouraging results. The method is simple to implement and statistically traceable at each step. It can produce sets of functionally related genes that can be used further for reverse-engineering gene circuits.


Figures

Figure 1
Inferred network for Dataset 1. The network structure inferred by applying the Granger causality test to synthetic Dataset 1.
Figure 2
Inferred network for Dataset 2. The network structure inferred by applying the Granger causality test to synthetic Dataset 2.
Figure 3
Inferred network for Dataset 3. The network structure inferred by applying the Granger causality test to synthetic Dataset 3.
Figure 4
Simulation results with Datasets 1, 2 and 3 integrated into one system. The association graph obtained by applying the Granger causality test to the combined dataset is shown as an association matrix. Three distinct, island-like modules are visible in the graph, each corresponding to one of the datasets.
Figure 5
Temporal profiles of genes selected for the smaller Arabidopsis dataset. The temporal profiles of the genes chosen to constitute the smaller Arabidopsis dataset are shown: (A) genes annotated for circadian activity, (B) genes annotated for death and (C) genes annotated for ageing.
Figure 6
Degree-sorted network structure. The association graph obtained by applying the Granger causality test, displayed with genes sorted by degree.
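A degree-sorted view like this one can be produced by reordering the rows and columns of the association matrix. This sketch assumes that "degree" means the total number of incoming plus outgoing associations of a gene and breaks ties by original index; both are assumptions, since the caption does not say. The function name `degree_sorted` is hypothetical.

```python
def degree_sorted(adj):
    """Return (order, permuted) where genes are sorted by total degree, highest first.

    adj is a square 0/1 association matrix; degree = out-links (row) + in-links (column).
    """
    n = len(adj)
    degree = [sum(adj[i]) + sum(adj[j][i] for j in range(n)) for i in range(n)]
    order = sorted(range(n), key=lambda i: (-degree[i], i))  # ties broken by index
    permuted = [[adj[i][j] for j in order] for i in order]  # same order for rows and columns
    return order, permuted
```

Permuting rows and columns with the same order keeps every association intact while pushing hub genes to the top-left corner of the plotted matrix.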
Figure 7
Extracted subgraphs indicating potential modules of interest in the smaller dataset. The biological functions performed by the modules in the respective panels are (A) circadian rhythm, (B) immune and defense response, (C) circadian rhythm and (D) aging. The GO annotations for the genes are given in Table 1.
Figure 8
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 1). Genes in the "Response to stress" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 9
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 2). Genes in the "Cytoplasmic part" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 10
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 3). Genes in the "Response to stimulus" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 11
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 4). Genes in the "Response to abiotic stimulus" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 12
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 5). Genes in the "Catalytic activity" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 13
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 6). Genes in the "Response to stress" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 14
Extracted subgraph indicating a potential module of interest in the bigger dataset (Set 7). Genes in the "Cell part" category are highlighted in yellow; their GO annotations are presented in Table 2.
Figure 15
Structural properties of the association network obtained for the bigger dataset. (A) Node degree distribution, which follows a power-law-like distribution. (B) Distribution of the number of partners shared between pairs of nodes. (C) Closeness centrality of all nodes. (D) Topological coefficients of the nodes.
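The four quantities in this figure can be computed on the undirected version of the association network, as sketched below. The definitions follow those commonly used by network-analysis tools such as Cytoscape's NetworkAnalyzer: closeness centrality as the reciprocal of the mean shortest-path length, and the topological coefficient of a node n as the mean of J(n, m) / k_n over all nodes m that share at least one neighbour with n, where J(n, m) counts shared neighbours plus one if n and m are directly linked. Whether the paper used exactly these definitions is an assumption, and all function names are hypothetical.

```python
from collections import Counter, deque


def neighbour_sets(adj):
    """Undirected neighbour sets of a directed 0/1 association matrix."""
    n = len(adj)
    return [{j for j in range(n) if j != i and (adj[i][j] or adj[j][i])} for i in range(n)]


def degree_distribution(nb):
    """Counter: degree k -> number of nodes with exactly k neighbours."""
    return Counter(len(s) for s in nb)


def shared_partner_distribution(nb):
    """Counter: s -> number of node pairs sharing exactly s >= 1 neighbours."""
    n = len(nb)
    counts = Counter(len(nb[i] & nb[j]) for i in range(n) for j in range(i + 1, n))
    counts.pop(0, None)  # ignore pairs with no common neighbour
    return counts


def closeness_centrality(nb):
    """Reciprocal of the mean shortest-path length from each node to the nodes it reaches."""
    result = []
    for s in range(len(nb)):
        dist, queue = {s: 0}, deque([s])
        while queue:  # breadth-first search on the unweighted graph
            v = queue.popleft()
            for w in nb[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        lengths = [d for v, d in dist.items() if v != s]
        result.append(len(lengths) / sum(lengths) if lengths else 0.0)
    return result


def topological_coefficient(nb):
    """Topological coefficient per node with degree > 1 (see definition in the lead-in)."""
    result = {}
    for v, nv in enumerate(nb):
        if len(nv) < 2:
            continue
        partners = set().union(*(nb[u] for u in nv)) - {v}  # nodes sharing a neighbour with v
        if partners:
            js = [len(nv & nb[m]) + (1 if m in nv else 0) for m in partners]
            result[v] = sum(js) / len(js) / len(nv)
    return result
```

On a four-gene cycle, for instance, every gene has degree 2, closeness 0.75 and topological coefficient 1.0, and the two diagonal pairs each share two partners.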
Figure 16
Correlation matrix for the smaller Arabidopsis dataset. The association matrix obtained using Pearson correlation for the smaller Arabidopsis dataset is shown. The strength of association between genes is encoded by the color map shown in the figure.
Figure 17
Distance matrix for the smaller Arabidopsis dataset. The association matrix obtained using Euclidean distance for the smaller Arabidopsis dataset is shown. The strength of association between genes is encoded by the color map shown in the figure.
Figure 18
Subgraphs obtained by using correlation as a measure of association in the smaller Arabidopsis dataset. Two subgraphs of potential interest were detected when the correlation coefficient was used to establish associations between genes in the smaller Arabidopsis dataset. The GO annotations of the recognised genes are presented in Table 5.
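The correlation- and distance-based baselines in Figures 16 to 18 can be sketched as follows. The threshold that turns correlations into edges is a hypothetical parameter (the captions do not state one), and linking genes on the absolute value of r is likewise an assumption. All function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_matrix(profiles, measure):
    """Symmetric gene-by-gene matrix of measure(profile_i, profile_j)."""
    n = len(profiles)
    return [[measure(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def correlation_graph(profiles, threshold=0.9):
    """0/1 adjacency linking genes whose |r| >= threshold (threshold is a hypothetical choice)."""
    r = pairwise_matrix(profiles, pearson)
    n = len(profiles)
    return [[1 if i != j and abs(r[i][j]) >= threshold else 0 for j in range(n)]
            for i in range(n)]
```

Both measures compare profiles point by point, so applying the same reordering of time points to two profiles leaves r and the distance unchanged. This is why such measures cannot capture the temporal precedence that the Granger causality test exploits.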
