Int J Biochem Mol Biol. 2010;1(1):51-68.
Epub 2010 Jun 15.

Computing gene expression data with a knowledge-based gene clustering approach

Bruce A Rosa et al. Int J Biochem Mol Biol. 2010.

Abstract

Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not difficult, interpreting the datasets and extracting information from them is challenging. In this study, a knowledge-based approach that identifies and retains important functional genes before filtering on variability and fold change differences was used to study light regulation. Two clustering methods were applied to the filtered datasets, and the clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified, and the genes in this common cluster were ranked by their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1,134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than either of the two individual clustering methods. In addition, the filtering method increased the proportion of light responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. Because these common cluster lists are much shorter than gene groups generated through typical clustering methods or coexpression networks, they narrow the search for novel functional genes while increasing the likelihood that the candidates are biologically relevant.
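The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two clustering methods or the filter thresholds, so the cluster lists below are toy stand-ins for the output of two unspecified methods, and the function names, the `min_fold` cutoff, the use of Pearson correlation as the coexpression measure, and all gene labels and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def filter_genes(expression, protected, min_fold=2.0):
    """Keep protected (known functional) genes unconditionally; keep the
    rest only if max/min expression reaches min_fold (assumed cutoff)."""
    kept = {}
    for gene, values in expression.items():
        lo, hi = min(values), max(values)
        fold = hi / lo if lo > 0 else float("inf")
        if gene in protected or fold >= min_fold:
            kept[gene] = values
    return kept

def common_cluster(key_gene, clusters_a, clusters_b):
    """Intersect the key gene's cluster from each method, excluding the key gene."""
    a = next((set(c) for c in clusters_a if key_gene in c), set())
    b = next((set(c) for c in clusters_b if key_gene in c), set())
    return (a & b) - {key_gene}

def rank_by_coexpression(key_gene, genes, expression):
    """Sort genes by Pearson correlation with the key gene, highest first."""
    scores = {g: pearson(expression[key_gene], expression[g]) for g in genes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: every label and value here is invented for illustration.
expression = {
    "KEY":   [1, 2, 3, 4],
    "G1":    [2, 4, 6, 8],
    "G2":    [1, 3, 2, 5],
    "G3":    [4, 3, 2, 1],
    "G4":    [1, 1, 2, 1],
    "FLAT":  [3, 3, 3, 3],   # no variation, dropped by the filter
    "KNOWN": [5, 5, 5, 5],   # no variation, but protected by prior knowledge
}
filtered = filter_genes(expression, protected={"KNOWN"})

# Hypothetical cluster assignments from two different clustering methods.
clusters_a = [["KEY", "G1", "G2", "G3"], ["G4", "KNOWN"]]
clusters_b = [["KEY", "G1", "G2"], ["G3", "G4", "KNOWN"]]

common = common_cluster("KEY", clusters_a, clusters_b)
for gene, r in rank_by_coexpression("KEY", common, filtered):
    print(gene, round(r, 3))
```

Intersecting the two clusters trades recall for precision: a gene must be grouped with the key gene by both methods to survive, which is consistent with the short candidate lists the abstract reports.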

Figures

Figure 1.
A flowchart describing the network filtering procedure used. “AT numbers” are gene identifiers provided by TAIR. Grey rectangles (numbered) represent major steps in the analysis, and correspond to the steps in the methods section of this paper. Black rectangles represent groups of probes or genes which were removed from the dataset. White rectangles represent groups of genes or probes which are retained in the dataset, but are subject to subsequent filtering steps. Dotted rectangles represent groups of genes which are retained in the final dataset and are not subject to further filtering.
Figure 2.
The frequency distribution of probe intensities for the three datasets. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.
Figure 3.
The frequency distribution of fold differences for the three datasets. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.
Figure 4.
The frequency distribution of Pearson correlations between each gene pair in each dataset. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.
Figure 5.
The frequency distribution of gene connectivity in each of the three datasets. Black diamonds and the black line represent genes from the WT-Leaf dataset, grey circles and the grey line represent genes from the WT-Whole dataset, and stars and the dashed line represent genes from the Leaf-Whole dataset.
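
The quantities plotted in Figures 4 and 5 can be sketched as follows. This is an illustration only: the captions do not define connectivity, so the code assumes it is the number of other genes whose absolute Pearson correlation with a given gene meets a cutoff. The `threshold` value, the function names, and the toy profiles are all hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def pairwise_correlations(expression):
    """Correlation for every unordered gene pair (the values behind Figure 4)."""
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }

def connectivity(expression, threshold=0.9):
    """Count, per gene, the partners with |r| >= threshold (assumed definition)."""
    degree = {gene: 0 for gene in expression}
    for (g1, g2), r in pairwise_correlations(expression).items():
        if abs(r) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree

# Toy profiles, invented for illustration.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 5],
}
correlations = pairwise_correlations(profiles)
degrees = connectivity(profiles)
print(degrees)
```

A frequency distribution like those in the figures would then be a histogram of `correlations.values()` or `degrees.values()` for each dataset.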
