BMC Bioinformatics. 2010 Oct 13;11:510.
doi: 10.1186/1471-2105-11-510.

Functional analysis: evaluation of response intensities--tailoring ANOVA for lists of expression subsets

Fabrice Berger et al. BMC Bioinformatics. 2010.

Abstract

Background: Microarray data are frequently used to characterize the expression profile of a whole genome and to compare the characteristics of that genome under several conditions. Geneset analysis methods have been described previously for analyzing, at the same time, the expression values of several genes related by known biological criteria (metabolic pathway, pathology signature, co-regulation by a common factor, etc.). The cost of these methods allows more values to be used to help discover the underlying biological mechanisms.

Results: As several methods assume different null hypotheses, we propose to reformulate the main question that biologists seek to answer. To determine which genesets are associated with expression values that differ between two experiments, we focused on three ad hoc criteria: expression levels, the direction of individual gene expression changes (up or down regulation), and correlations between genes. We introduce the FAERI methodology, tailored from a two-way ANOVA to examine these criteria. The significance of the results was evaluated according to the self-contained null hypothesis, using label sampling or by inferring the null distribution from normally distributed random data. Evaluations performed on simulated data revealed that FAERI outperforms currently available methods for each type of set tested. We then applied the FAERI method to analyze three real-world datasets on hypoxia response. FAERI was able to detect more genesets than other methodologies, and the genesets selected were coherent with current knowledge of cellular response to hypoxia. Moreover, the genesets selected by FAERI were confirmed when the analysis was repeated on two additional related datasets.

Conclusions: The expression values of genesets are associated with several biological effects. The underlying mathematical structure of the genesets allows for analysis of data from several genes at the same time. Focusing on expression levels, the direction of the expression changes, and correlations, we showed that two-step data reduction allowed us to significantly improve the performance of geneset analysis using a modified two-way ANOVA procedure, and to detect genesets that current methods fail to detect.
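
The label-sampling evaluation mentioned in the Results can be illustrated with a minimal sketch. It is not the FAERI implementation: the statistic below (mean absolute difference between condition means) is a simple stand-in chosen for illustration, and the function names, toy data, and number of permutations are all assumptions. What it does show is the self-contained idea: only the sample labels are shuffled, so every probeset in the set is relabelled the same way and the correlations between genes are preserved.

```python
import random
from statistics import mean

def set_statistic(rows, labels):
    """Stand-in geneset statistic (an assumption, not the FAERI F statistic):
    mean over probesets of |mean(condition 1) - mean(condition 0)|."""
    diffs = []
    for row in rows:
        a = [x for x, g in zip(row, labels) if g == 0]
        b = [x for x, g in zip(row, labels) if g == 1]
        diffs.append(abs(mean(b) - mean(a)))
    return mean(diffs)

def label_permutation_pvalue(rows, labels, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling sample labels (self-contained null).
    The same shuffled labels are applied to every probeset in the set."""
    rng = random.Random(seed)
    observed = set_statistic(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(rows, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy geneset: 3 probesets x 8 samples (4 control, 4 treated).
rows = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    [2.0, 2.1, 1.9, 2.2, 4.0, 4.1, 3.8, 4.2],
    [0.5, 0.7, 0.6, 0.4, 2.5, 2.6, 2.4, 2.7],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p_value = label_permutation_pvalue(rows, labels)
print(p_value)
```

With only eight samples there are few distinct labelings, so the smallest attainable p-value is bounded well above zero; larger designs give finer resolution.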

Figures

Figure 1
Illustration of the effect of Z-value data reduction on a variance analysis with two classification criteria. The p-values associated with the effect of the condition studied and with the interaction are compared before and after Z reduction. After this step, the effect associated with the probeset is null (data not shown). Prior to Z reduction, the probesets are expressed at variable levels. After reduction, the expression level is standardized for all of the probesets and their individual contributions are balanced during the variance analysis. The genesets analyzed are distributed differently and reveal a more pronounced effect of the condition and/or of the interaction between the condition and the probesets. Put differently, this step reveals both the strength and the variability of the individual responses.
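
Why the probeset effect becomes null after Z reduction can be checked with a small sketch. It assumes that Z reduction means standardizing each probeset across all of its samples (subtract the probeset mean, divide by its standard deviation); the use of the population standard deviation, the function names, and the toy data are assumptions for illustration. Once every probeset has mean zero, the probeset main-effect sum of squares in a balanced two-way layout vanishes.

```python
from statistics import mean, pstdev

def z_reduce(rows):
    """Standardize each probeset (row) across all of its samples.
    Assumes no row is constant (pstdev would be zero)."""
    out = []
    for row in rows:
        m, s = mean(row), pstdev(row)
        out.append([(x - m) / s for x in row])
    return out

def probeset_sum_of_squares(rows):
    """Main-effect sum of squares for the probeset factor (balanced design):
    samples per probeset times squared deviation of its mean from the grand mean."""
    grand = mean(x for row in rows for x in row)
    return sum(len(row) * (mean(row) - grand) ** 2 for row in rows)

# Hypothetical toy data: 3 probesets x 4 samples (2 control, 2 treated).
expr = [
    [2.0, 2.2, 3.0, 3.1],
    [8.0, 8.1, 7.0, 7.2],
    [5.0, 5.3, 5.1, 5.2],
]
z = z_reduce(expr)
ss_before = probeset_sum_of_squares(expr)  # large: probesets sit at different levels
ss_after = probeset_sum_of_squares(z)      # numerically zero after standardization
print(ss_before, ss_after)
```

The variation that remains after this step is attributable only to the condition and to the condition-by-probeset interaction, which is what the figure compares.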
Figure 2
Illustration of the distribution of the F statistic plotted against geneset size, using the ANOVA-2 procedure on the initial expression data (left panel), on the standardized data (center) and on standardized and unidirectional data (right, FAERI procedure). The graphs in the upper part are generated from random data and show that the directional reduction step induces dependence on the number of members in the geneset. The graphs in the lower part show results obtained from real data (E-GEOD-7479), and illustrate the impact of standardizing the data relative to each probeset, as well as the dependence on the number of members following the directional reduction step.
Figure 3
Illustration of the logarithm of the p-values obtained by ANOVA-2 (left), FAERI based on random data (center) or permutations (right), versus the number of members in the geneset (real dataset E-GEOD-7479). The graphs presented in the center and on the right show that both procedures for evaluating the significance of the FAERI test give p-values that depend on geneset size.
Figure 4
Comparison of the logarithm of the p-values obtained by FAERI based on random data or permutations. The left graph shows the comparison of the p-values obtained during analysis of simulated data. The right graph shows results obtained when analyzing real data (E-GEOD-7479), illustrating that the null distributions evaluated by the two procedures differ in the case of real data, but that some genesets nonetheless have similar p-values under both procedures (points along the diagonal).
