PLoS One. 2014 Mar 21;9(3):e91840.
doi: 10.1371/journal.pone.0091840. eCollection 2014.

Selection of higher order regression models in the analysis of multi-factorial transcription data

Olivia Prazeres da Costa et al. PLoS One.

Abstract

Introduction: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ.

Results: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes.

Conclusions: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
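The classification of interactions as neutral, alleviating, or aggravating can be sketched for two binary factors as follows. This is a minimal illustration, not the authors' implementation: it assumes 0/1 coding of both factors, log2 expression values, a saturated second order model (whose least-squares estimates equal differences of cell means), and a hypothetical tolerance `tol` in place of the significance testing used in the paper. The function names are invented for this example.

```python
from statistics import mean


def second_order_effects(cells):
    """Estimate main and interaction effects from a 2x2 design.

    `cells` maps (h, g) with h, g in {0, 1} to a list of log2 expression
    values. For a saturated model the least-squares estimates are simple
    differences of the four cell means.
    """
    m = {key: mean(values) for key, values in cells.items()}
    beta_h = m[(1, 0)] - m[(0, 0)]                         # first factor alone
    beta_g = m[(0, 1)] - m[(0, 0)]                         # second factor alone
    beta_hg = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]  # deviation from additivity
    return beta_h, beta_g, beta_hg


def classify_interaction(beta_h, beta_g, beta_hg, tol=0.1):
    """Label an interaction by comparing observed and additive joint effects.

    `tol` is a hypothetical cutoff (assumption); the paper judges effects by
    size and significance, which this sketch does not reproduce.
    """
    if abs(beta_hg) <= tol:
        return "neutral"
    expected = beta_h + beta_g        # joint effect under the additive model
    observed = expected + beta_hg     # joint effect including the interaction
    return "aggravating" if abs(observed) > abs(expected) else "alleviating"


# Made-up values for one gene, for illustration only.
cells = {
    (0, 0): [5.0, 5.2],
    (1, 0): [6.0, 6.2],
    (0, 1): [7.0, 7.2],
    (1, 1): [8.9, 9.1],
}
label = classify_interaction(*second_order_effects(cells))
```

Comparing absolute joint effects is one way to read "weaker" and "stronger than expected"; it is a simplification and does not treat sign changes separately.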

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Interaction effects calculated by multiple linear regression.
Schematic visualization of interaction effects in second order linear regression models. The linear regression model includes two main covariates (strain H and stimulation Γ) and their interaction covariate H∶Γ. Each main covariate can assume two values (H: C57BL/6 or BALB/c; Γ: IFN-γ stimulation or no stimulation). The arrows indicate the estimated effects β. The pink and turquoise arrows reflect aggravating and alleviating interaction effects, respectively, as deviations from the additive model. A second order linear model can dissect the effects arising from two perturbations and their interaction by looking at the magnitude and significance of its regression covariates. Most importantly, the interaction covariate can indicate either an alleviating (weaker than expected from the single intervention effects) or an aggravating (stronger than expected) interaction.
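Written out, a second order model of this kind can take the following form. The 0/1 indicator coding and the symbols y, x_H, x_Γ, and ε are assumptions made for illustration; the caption itself only names the covariates and the effects β.

```latex
\begin{aligned}
y &= \beta_0 + \beta_H x_H + \beta_\Gamma x_\Gamma
     + \beta_{H:\Gamma}\, x_H x_\Gamma + \varepsilon,
     \qquad x_H, x_\Gamma \in \{0, 1\}, \\
\mathbb{E}[y \mid x_H = 1, x_\Gamma = 1] - \mathbb{E}[y \mid x_H = 0, x_\Gamma = 0]
  &= \underbrace{\beta_H + \beta_\Gamma}_{\text{additive expectation}}
     + \underbrace{\beta_{H:\Gamma}}_{\text{interaction}}.
\end{aligned}
```

Under this coding, β_{H∶Γ} = 0 corresponds to the additive model, and a nonzero β_{H∶Γ} is the deviation that the pink and turquoise arrows depict.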
Figure 2. Schematic visualization for the interpretation of the eruption plot.
The eruption plot compares the results of two models. Its arrows can differ in size and direction, and this scheme helps to interpret them. Effect size is displayed along the x-axis and significance along the y-axis. The red area shows the region of interest (ROI).
Figure 3. Eruption plot.
A: Effect size is displayed along the x-axis on a log2 scale and the y-axis shows the negative log10 p-value. The vertical blue lines indicate 1.5-fold up- and down-regulation and the horizontal blue line indicates a significance of 0.05 after Bonferroni adjustment. They bound the regions of biological interest (ROI), which are characterized by a sufficiently high effect and a sufficiently low p-value; that is, biologically interesting effects are found in the top left and top right segments of the plot. Each gene is represented by an arrow comparing the effect size and significance estimate of a covariate (the interaction covariate H∶Γ in this case) between Model 1 (arrow tail) and Model 2 (arrow head). The details of Models 1 and 2 are given in Table 2. Black and grey arrows represent genes completely contained within ROI and excluded completely from ROI, respectively. Red and blue arrows represent genes that are located within ROI solely in Model 1 and Model 2, respectively. B: Density plot of the p-values of Model 1 (red) and Model 2 (green). The dashed lines indicate the median of each density.
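The arrow colouring described in this caption can be sketched as a small classification rule. This is an illustrative reading, not the authors' code: it assumes that each arrow end is given as a pair of log2 effect estimate and unadjusted p-value, that the Bonferroni adjustment is applied by dividing 0.05 by a number of tests `n_tests` (a hypothetical parameter), and that only the two end points of an arrow are checked against the ROI.

```python
import math


def in_roi(log2_effect, p_value, n_tests, fold=1.5, alpha=0.05):
    """Return True if a point lies in the region of interest (ROI).

    The ROI requires at least `fold`-fold up- or down-regulation and a
    p-value below the Bonferroni-adjusted level alpha / n_tests.
    """
    effect_ok = abs(log2_effect) >= math.log2(fold)
    significant = p_value <= alpha / n_tests
    return effect_ok and significant


def arrow_colour(tail, head, n_tests):
    """Colour an arrow from its Model 1 end (tail) and Model 2 end (head).

    `tail` and `head` are (log2_effect, p_value) pairs.
    """
    tail_in = in_roi(*tail, n_tests)
    head_in = in_roi(*head, n_tests)
    if tail_in and head_in:
        return "black"   # in ROI under both models
    if tail_in:
        return "red"     # in ROI under Model 1 only
    if head_in:
        return "blue"    # in ROI under Model 2 only
    return "grey"        # outside ROI under both models


# Made-up estimates for one gene, with 1000 tests assumed.
colour = arrow_colour(tail=(0.3, 1e-3), head=(1.0, 1e-6), n_tests=1000)
```

Checking only the end points is a simplification: an arrow that starts in the top left segment and ends in the top right segment would be coloured black here although it crosses the region outside the ROI.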
Figure 4. Cluster and pathway analysis.
A: The effect estimates of Model 3 were subjected to a hierarchical cluster analysis. Rows show the genes with a significant global effect (F-test p-value <0.05 after FDR correction and at least one of the covariates having +/−1.5 fold change). The three columns are the covariates H, Γ, and H∶Γ. The column H (strain) shows differences between C57BL/6 and BALB/c, with up-regulation shown in red and down-regulation shown in green. The column Γ shows up-regulation in BALB/c in red and down-regulation upon IFN-γ stimulation in green. The third column helps to distinguish alleviating and aggravating effects. Aggravating effects are represented in pink and alleviating effects in turquoise. P-values are plotted separately in a heatmap. The order of the genes is given by the effect estimate clustering. P-values are given on a −log10 scale starting from 0 and are displayed in colors ranging from blue to white. B: Results of a pathway enrichment analysis of cluster 6, shown as a bar plot. The direction of regulation of the genes of cluster 6 is indicated by the color bar. Gene Ontology ‘Biological Process’ terms and KEGG pathway categories (p<0.01) are sorted from bottom (most significant) to top. To reduce redundancy, similar terms are represented by the most significant and specific term. For a complete list of functional annotations see Table S2. The right side shows the results of a TFBS analysis of this gene cluster. The two most significantly represented TFBS are given by the name of the transcription factor, the motif, and the p-value.
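The gene filter applied before clustering in panel A can be sketched as below. The caption does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the log2 scale of the effect estimates. The clustering step itself is not shown.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(f_test_p, log2_effects, alpha=0.05, fold=1.5):
    """Return indices of genes passing both filters.

    A gene passes if its adjusted global F-test p-value is below `alpha`
    and at least one covariate effect reaches `fold`-fold change in
    either direction.
    """
    adjusted = benjamini_hochberg(f_test_p)
    cutoff = math.log2(fold)
    return [
        gene
        for gene, (q, effects) in enumerate(zip(adjusted, log2_effects))
        if q < alpha and any(abs(e) >= cutoff for e in effects)
    ]


# Made-up p-values and (H, Γ, H:Γ) effect estimates for four genes.
p = [0.001, 0.002, 0.03, 0.5]
effects = [(0.1, 0.9, -0.2), (0.2, 0.3, -0.1), (-0.7, 0.0, 0.1), (2.0, 1.5, 0.0)]
selected = select_genes(p, effects)
```

In this toy input the second gene fails the fold-change filter and the fourth fails the significance filter, so only the first and third genes would be passed on to clustering.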
