Computing gene expression data with a knowledge-based gene clustering approach

Bruce A Rosa, Sookyung Oh, Beronda L Montgomery, Jin Chen, Wensheng Qin

PMID: 21968910
PMCID: PMC3180043

Int J Biochem Mol Biol. 2010;1(1):51-68. Epub 2010 Jun 15.

Abstract

Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach, which identifies and retains important functional genes before filtering on variability and fold-change differences, was used to study light regulation. Two clustering methods were used to cluster the filtered datasets, and the clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified, and the genes in this common cluster were ranked by their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1,134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than either of the two individual clustering methods. In addition, the filtering method increased the proportion of light-responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. Because these common cluster lists are relatively short compared to gene groups generated through typical clustering methods or coexpression networks, they narrow the search for novel functional genes while increasing the likelihood that the candidates are biologically relevant.
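The common-cluster step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each clustering method yields a gene-to-cluster-label mapping, that coexpression is measured by Pearson correlation, and that ranking uses the signed correlation; the abstract does not state these details. The gene names (`KEY`, `g1`, ...), expression values, and function names are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / (sd_x * sd_y)


def common_cluster(key_gene, clusters_a, clusters_b):
    """Genes that share a cluster with key_gene under BOTH clustering methods.

    clusters_a and clusters_b map gene -> cluster label (assumed format).
    """
    with_key_a = {g for g, c in clusters_a.items() if c == clusters_a[key_gene]}
    with_key_b = {g for g, c in clusters_b.items() if c == clusters_b[key_gene]}
    return (with_key_a & with_key_b) - {key_gene}


def rank_by_coexpression(key_gene, genes, expression):
    """Sort genes by Pearson correlation with key_gene, highest first."""
    scored = [(g, pearson(expression[key_gene], expression[g])) for g in genes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: four conditions per gene.
expression = {
    "KEY": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.0],
    "g2": [1.0, 3.0, 2.0, 4.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 1.0, 2.0, 2.0],
}
clusters_a = {"KEY": 0, "g1": 0, "g2": 0, "g3": 1, "g4": 0}  # method A labels
clusters_b = {"KEY": 2, "g1": 2, "g2": 2, "g3": 2, "g4": 5}  # method B labels

common = common_cluster("KEY", clusters_a, clusters_b)
ranked = rank_by_coexpression("KEY", common, expression)
for gene, r in ranked:
    print(f"{gene}\t{r:.3f}")
```

In this toy example, `g3` is dropped because method A places it in a different cluster from the key gene, and `g4` is dropped because method B does. Only genes supported by both methods remain to be ranked, which is how the intersection shortens the candidate list.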

Figures

**Figure 1.**
A flowchart describing the network filtering procedure used. “AT numbers” are gene identifiers provided by TAIR. Grey rectangles (numbered) represent major steps in the analysis, and correspond to the steps in the methods section of this paper. Black rectangles represent groups of probes or genes which were removed from the dataset. White rectangles represent groups of genes or probes which are retained in the dataset, but are subject to subsequent filtering steps. Dotted rectangles represent groups of genes which are retained in the final dataset and are not subject to further filtering.

**Figure 2.**
The frequency distribution of probe intensities for the three datasets. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.

**Figure 3.**
The frequency distribution of fold differences for the three datasets. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.

**Figure 4.**
The frequency distribution of Pearson correlations between each gene pair in each dataset. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.

**Figure 5.**
The frequency distribution of gene connectivity in each of the three datasets. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.

Cited by

Genomic Clustering of differential DNA methylated regions (epimutations) associated with the epigenetic transgenerational inheritance of disease and phenotypic variation.
Haque MM, Nilsson EE, Holder LB, Skinner MK. BMC Genomics. 2016 Jun 1;17:418. doi: 10.1186/s12864-016-2748-5. PMID: 27245821.
Downstream effectors of light- and phytochrome-dependent regulation of hypocotyl elongation in Arabidopsis thaliana.
Oh S, Warnasooriya SN, Montgomery BL. Plant Mol Biol. 2013 Apr;81(6):627-40. doi: 10.1007/s11103-013-0029-0. Epub 2013 Mar 1. PMID: 23456246.
Phytochrome-induced SIG2 expression contributes to photoregulation of phytochrome signalling and photomorphogenesis in Arabidopsis thaliana.
Oh S, Montgomery BL. J Exp Bot. 2013 Dec;64(18):5457-72. doi: 10.1093/jxb/ert308. Epub 2013 Sep 27. PMID: 24078666.
