BMC Bioinformatics. 2007 Nov 2;8:427.
doi: 10.1186/1471-2105-8-427.

Extracting gene expression patterns and identifying co-expressed genes from microarray data reveals biologically responsive processes

Jeff W Chou et al. BMC Bioinformatics.

Abstract

Background: A common observation in the analysis of gene expression data is that many genes display similarity in their expression patterns and therefore appear to be co-regulated. However, the variation associated with microarray data and the complexity of the experimental designs make the acquisition of co-expressed genes a challenge. We developed a novel method for Extracting microarray gene expression Patterns and Identifying co-expressed Genes, designated as EPIG. The approach utilizes the underlying structure of gene expression data to extract patterns and identify co-expressed genes that are responsive to experimental conditions.

Results: Through evaluation of the correlations among profiles, the magnitude of variation in gene expression profiles, and profile signal-to-noise ratios, EPIG extracts a set of patterns representing co-expressed genes. The method is shown to work well with a simulated data set and with microarray data obtained from time-series studies of dauer recovery and L1 starvation in C. elegans and of ultraviolet (UV) or ionizing radiation (IR)-induced DNA damage in diploid human fibroblasts. With the simulated data set, EPIG extracted the appropriate number of patterns, which were more stable and homogeneous than the patterns determined using the CLICK or CAST clustering algorithms. However, CLICK performed better than EPIG and CAST with respect to the average correlation between clusters/patterns of the simulated data. With real biological data, EPIG extracted more dauer-specific patterns than CLICK. Furthermore, analysis of the IR/UV data revealed 18 unique patterns; of approximately 17,000 genes, EPIG identified 2661 as significantly expressed and categorized them to the patterns. The time-dependent patterns displayed both similar and dissimilar responses between IR and UV treatments. Gene Ontology analysis applied to each pattern-related subset of co-expressed genes revealed underlying biological processes affected by IR- and/or UV-induced DNA damage.

Conclusion: EPIG competed with CLICK and performed better than CAST in extracting patterns from simulated data. EPIG extracted more biologically informative patterns and co-expressed genes from both C. elegans and IR/UV-treated human fibroblasts. Gene Ontology analysis of the genes in the patterns extracted by EPIG revealed several key biological categories related to p53-dependent cell cycle control in the IR/UV data. Among them were mitotic cell cycle, DNA replication, DNA repair, cell cycle checkpoint, and G0-like status transition. EPIG can be applied to data sets from a variety of experimental designs.
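
The abstract names three quantities that EPIG evaluates per profile: correlation with other profiles, magnitude of variation, and signal-to-noise ratio (SNR). The abstract does not define them, so the following is only a minimal Python sketch under stated assumptions, not the EPIG implementation: magnitude is taken as the largest absolute group mean, and SNR as that magnitude divided by a pooled within-group standard deviation. The function name `profile_scores` and the demo values are hypothetical.

```python
import numpy as np


def profile_scores(profiles, groups):
    """Score a genes x samples array of log-ratio expression values.

    Returns (corr, magnitude, snr). The magnitude and SNR definitions are
    illustrative assumptions, not the definitions used by EPIG.
    """
    profiles = np.asarray(profiles, dtype=float)
    labels, inverse = np.unique(np.asarray(groups), return_inverse=True)

    # Mean of each gene within each experimental group: genes x n_groups.
    group_means = np.stack(
        [profiles[:, inverse == k].mean(axis=1) for k in range(labels.size)],
        axis=1,
    )

    # Pooled within-group standard deviation per gene (assumed noise estimate).
    residuals = profiles - group_means[:, inverse]
    dof = profiles.shape[1] - labels.size
    noise = np.sqrt((residuals ** 2).sum(axis=1) / dof)

    # Assumed magnitude: largest absolute group mean.
    magnitude = np.abs(group_means).max(axis=1)
    snr = magnitude / noise

    # Pearson correlation between the group-mean profiles of every gene pair.
    corr = np.corrcoef(group_means)
    return corr, magnitude, snr


# Hypothetical demo: three genes, four groups, two samples per group.
demo_groups = [0, 0, 1, 1, 2, 2, 3, 3]
demo = np.array([
    [0.0, 0.2, 1.0, 1.2, 2.0, 2.2, 3.0, 3.2],          # rising
    [0.0, -0.2, -1.0, -1.2, -2.0, -2.2, -3.0, -3.2],   # falling mirror image
    [0.0, 0.2, 3.0, 3.2, 0.0, 0.2, 3.0, 3.2],          # alternating
])
corr, magnitude, snr = profile_scores(demo, demo_groups)
print(np.round(corr, 2))
print(np.round(magnitude, 2), np.round(snr, 2))
```

Genes whose correlation, magnitude, and SNR all exceed chosen thresholds would be the natural candidates for pattern membership in a scheme of this kind; the thresholds themselves are not given in the abstract.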

Figures

Figure 1
Six probability distribution profiles. Plot of the six probability distribution profiles in terms of the mean values and standard deviations given in Table 1. In each of the panels (A) to (F), four data points are marked as crosses. From left to right, the four data points correspond to inter-groups 1 to 4, respectively. The labels on the vertical axis indicate the mean values of the data points. The vertical bars show the standard deviation of 0.4 around each mean value.
Figure 2
Plot of the first three components of a PCA using 90 simulated profiles. The six clusters, A to F, labelled in different colors, correspond to distributions A to F in Table 1. Each cluster consists of 15 generated profiles. The first three principal components (PCs) captured 84.3% of the variability in the data. The x-axis is PC1, the y-axis PC2, and the z-axis PC3.
Figure 3
Patterns of the simulated data extracted by EPIG and CLICK. The four inter-groups (red, green, blue and black) from left to right in each pattern correspond to inter-groups 1 to 4 shown in Table 1. A) The patterns extracted by EPIG, labelled 1 to 5, correspond to distributions A to E, respectively. All profiles were categorized to their respective pattern. B) The pattern extracted by CLICK for Cluster 1, with 32 profiles assigned to it, appears to have emerged from both distributions C and D in Table 1. The patterns for Clusters 2 and 3 correspond to distributions A and B in Table 1. These two clusters have 16 and 15 profiles assigned, respectively.
Figure 4
The patterns extracted by EPIG from the combined UV- and IR-treated data. In each of the patterns, 1 to 18, the first half (open circles) is UV-treated and the second half (solid circles) is IR-treated. For each treatment, there were three individual cell lines, F1-HTERT, F3-HTERT and F10-HTERT, positioned from left to right. Each cell line consisted of eight data points with four different treatment conditions, i.e., sham-treatment and 2, 6, and 24 h post-treatment, colored red, green, blue and magenta, respectively. The vertical axes, with zero at the middle, show the changes in gene expression (log2 intensity) relative to the sham-treated controls.
Figure 5
Heat Map of the 2661 genes selected by EPIG. From top to bottom are the 2661 genes selected by EPIG listed in an order from Pattern 1 to 18. The left half is UV-treated and the right half is IR-treated. For each treatment, three individual cell lines, F1-HTERT, F3-HTERT and F10-HTERT, are positioned from left to right. Each cell line consisted of four different treatment conditions, sham-treatment, 2, 6, and 24 h post-treatment from left to right. Red and green colors correspond to up and down regulation, respectively, with a darker color denoting less differential expression.
Figure 6
Optimization of the Mt value. Cluster size threshold Mt (the horizontal axis) versus the average of the patterns' SNR (A) and the number of extracted patterns (B).
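
The simulated data set described in the captions for Figures 1 and 2 (six distributions, four inter-groups, a standard deviation of 0.4, 15 profiles per distribution, projected onto three principal components) can be approximated with a short Python sketch. The inter-group means in `GROUP_MEANS` and the number of replicates per inter-group are hypothetical placeholders, because Table 1 is not reproduced on this page; the explained variance this sketch prints will therefore not match the 84.3% reported for the original data.

```python
import numpy as np

# Hypothetical inter-group means for distributions A-F (one row each, one
# column per inter-group). The real values are in the paper's Table 1.
GROUP_MEANS = np.array([
    [0.0, 1.0, 2.0, 1.0],
    [0.0, -1.0, -2.0, -1.0],
    [0.0, 1.0, 0.0, -1.0],
    [0.0, 2.0, 2.0, 0.0],
    [0.0, -2.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
])
N_PROFILES = 15    # profiles per distribution (from the Figure 2 caption)
N_REPLICATES = 5   # samples per inter-group: an assumption
SD = 0.4           # standard deviation (from the Figure 1 caption)


def simulate(rng):
    """Draw N_PROFILES noisy profiles around each row of GROUP_MEANS."""
    blocks = []
    for means in GROUP_MEANS:
        centre = np.repeat(means, N_REPLICATES)
        noise = rng.normal(0.0, SD, size=(N_PROFILES, centre.size))
        blocks.append(centre + noise)
    labels = np.repeat(np.arange(len(GROUP_MEANS)), N_PROFILES)
    return np.vstack(blocks), labels


def pca(data, n_components=3):
    """PCA via SVD of the column-centred data matrix."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]


rng = np.random.default_rng(42)
data, labels = simulate(rng)
scores, explained = pca(data)
print(data.shape, scores.shape)
print("variance captured by first 3 PCs:", explained.sum())
```

Plotting the three columns of `scores` colored by `labels` would give a view comparable in layout to Figure 2, with the caveat that the cluster positions depend entirely on the placeholder means.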
