Bioinformatics. 2013 Jun 15;29(12):1541-52.
doi: 10.1093/bioinformatics/btt186. Epub 2013 Apr 22.

A multi-layer inference approach to reconstruct condition-specific genes and their regulation

Ming Wu et al. Bioinformatics. 2013.

Abstract

An important topic in systems biology is the reverse engineering of regulatory mechanisms through reconstruction of context-dependent gene networks. A major challenge is to identify the genes and regulatory interactions specific to a condition or phenotype, given that regulatory processes are highly connected such that a specific response is typically accompanied by numerous collateral effects. In this study, we design a multi-layer approach that is able to reconstruct condition-specific genes and their regulation through an integrative analysis of large-scale information on gene expression, protein interaction and transcriptional regulation (transcription factor-target gene relationships). We establish the accuracy of our methodology against synthetic datasets, as well as a yeast dataset. We then extend the framework to higher eukaryotic systems, including human breast cancer and Arabidopsis thaliana cold acclimation. Our study identified TACSTD2 (TROP2) as a target gene for human breast cancer and discovered its regulation by the transcription factors CREB and NFκB. We also predict that KIF2C is a target gene for ER-/HER2- breast cancer and is positively regulated by E2F1. The predictions were further confirmed through experimental studies.

Availability: The implementation and detailed protocol of the layer approach are available at http://www.egr.msu.edu/changroup/Protocols/Three-layer%20approach%20to%20reconstruct%20condition.html.

Figures

Fig. 1.
Application of ReliefF on the integrated dataset with multiple conditions. (A) The data are a simulated dataset with five conditions (conditions A–E) plus a reference condition (ref). The goal is to identify the genes that change uniquely for a given condition (condition A). The traditional approach compares condition A with the reference condition, whereas we suggest applying ReliefF on condition A against all the other conditions that are available. (B) The ROC curves for identification of specific genes for the condition of interest (condition A). We compare two approaches, the traditional t-test and the ReliefF algorithm, under two scenarios: using only condition A and the reference control, and integrating all available conditions and comparing them against condition A. The AUC (area under the curve) ranks as follows: ReliefF with multiple conditions > ReliefF with two conditions > t-test with two conditions > t-test with multiple conditions. (C–E) The ‘MEGA’ yeast microarray dataset: the samples are plotted in 2D with their first two principal components. The condition ‘AltCarb’ is the condition of interest. (C) A traditional treated/untreated analysis. (D) We plot other conditions, such as hyperoxide stimulation, heat stress, etc. The samples of these conditions are similar to samples in the ‘AltCarb’ condition as compared with the same ‘untreated’ reference samples; thus, the gene lists that are identified could be similar. (E) The integrated dataset provides better coverage of the sample space, and ReliefF compares ‘AltCarb’ samples with all other samples in different conditions to achieve better specificity. Nearest neighbors of the AltCarb condition used in the ReliefF procedure are shown in green. The Principal Component Analysis (PCA) plot and ReliefF are performed in MATLAB with custom code. (F) The score of the yeast genes provided by the ReliefF analysis correlates with the importance or relevance of the gene to the specific condition
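The idea in panel (A), scoring genes for condition A against every other available condition rather than against the reference alone, can be illustrated with a minimal sketch. This is not the authors' MATLAB code: it is a simplified two-class ReliefF written in plain Python, and the function name `relieff_scores`, the neighbour count `k` and the toy expression values are all assumptions made for illustration.

```python
def relieff_scores(samples, labels, k=2):
    """Simplified two-class ReliefF (illustrative sketch only).

    samples: one list of gene values per sample
    labels:  1 for the condition of interest, 0 for every other sample
    Returns one weight per gene; higher suggests more condition-specific.
    """
    n, n_genes = len(samples), len(samples[0])
    # Range of each gene, used to scale differences into [0, 1].
    spans = []
    for g in range(n_genes):
        column = [s[g] for s in samples]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(g, a, b):
        return abs(a[g] - b[g]) / spans[g]

    def distance(a, b):
        return sum(diff(g, a, b) for g in range(n_genes))

    def nearest(i, same_class):
        pool = [j for j in range(n)
                if j != i and (labels[j] == labels[i]) == same_class]
        pool.sort(key=lambda j: distance(samples[i], samples[j]))
        return pool[:k]

    weights = [0.0] * n_genes
    for i in range(n):
        hits, misses = nearest(i, True), nearest(i, False)
        if not hits or not misses:
            continue  # a sample needs both neighbour types to be scored
        for g in range(n_genes):
            miss = sum(diff(g, samples[i], samples[j]) for j in misses) / len(misses)
            hit = sum(diff(g, samples[i], samples[j]) for j in hits) / len(hits)
            weights[g] += (miss - hit) / n
    return weights


# Toy data (hypothetical values). Columns: gene 0 changes only in condition A,
# gene 1 changes in A, B and C (a collateral effect), gene 2 is unrelated noise.
condition_a = [[5.0, 5.0, 2.0], [5.1, 5.1, 3.0], [4.9, 4.9, 2.5]]
condition_b = [[1.0, 5.0, 2.1], [1.1, 5.1, 2.9], [0.9, 4.9, 2.4]]
condition_c = [[1.0, 5.1, 2.2], [1.1, 4.9, 3.1], [0.9, 5.0, 2.6]]
reference = [[1.0, 1.0, 2.0], [1.1, 1.1, 3.0], [0.9, 0.9, 2.5]]

samples = condition_a + condition_b + condition_c + reference
labels = [1] * 3 + [0] * 9  # condition A against all other samples
scores = relieff_scores(samples, labels)
print(scores)
```

In this toy setting the gene that changes only in condition A receives the largest weight, while the gene that also responds in conditions B and C is penalised because its nearest misses look like condition A. A comparison against the reference alone could not separate the two.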
Fig. 2.
The ROC curves for inferring TF–gene regulatory relationships. (A) The ROC curve for TF–gene relationships predicted. (B) Part of the ROC curve for the top 20% of the predictions. The prediction of TF–gene regulatory relationships is based on mutual information (MI) between TFs and their target genes. The traditional setting is MI(condition of interest) − MI(a ref condition), shown in green dotted lines, whereas we propose to use a variety of conditions as reference: MI(condition of interest) − MI(multiple conditions as refs), shown in red dotted lines. We apply the same approaches but incorporate the information of potential TRN based on binding motifs and other literature evidence (data from www.yeastract.com), shown in solid lines, green: traditional setting compared with reference condition, red: compared with a variety of conditions as reference. Further, we use the sum of the target gene expression as a feature of TF activity for a given condition and apply ReliefF to identify the TFs and genes that have distinct activity and expression profile for the condition of interest (H2O2). Those TF–gene pairs with significant changes (top 30) in both TF activity and gene expression are elevated to the top of the list of potential TF–gene regulatory relationships based on MI measurement of the multiple condition setting. The result is shown in blue solid lines
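The differential score MI(condition of interest) − MI(multiple conditions as refs) can be sketched as follows. The caption does not say how MI is estimated or how the reference conditions are combined, so this sketch assumes a simple equal-width histogram estimator and pools all reference samples into one set. The function names and toy values are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Histogram estimate of MI (in bits) between two expression vectors."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def differential_mi(tf_cond, gene_cond, tf_ref, gene_ref):
    """MI within the condition of interest minus MI within the pooled references."""
    return (mutual_information(tf_cond, gene_cond)
            - mutual_information(tf_ref, gene_ref))


# Toy data (hypothetical values): the TF and gene track each other in the
# condition of interest but are unrelated across the pooled reference samples.
tf_cond = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_cond = [1.1, 2.1, 3.0, 4.2, 5.1, 6.0]
tf_ref = [1.0, 1.0, 1.0, 3.5, 3.5, 3.5, 6.0, 6.0, 6.0]
gene_ref = [1.0, 3.5, 6.0, 1.0, 3.5, 6.0, 1.0, 3.5, 6.0]

specific = differential_mi(tf_cond, gene_cond, tf_ref, gene_ref)
# A pair that is equally coupled everywhere should score close to zero.
nonspecific = differential_mi(tf_cond, gene_cond, tf_ref, tf_ref)
print(specific, nonspecific)
```

A pair coupled only in the condition of interest keeps a high score, whereas a pair coupled in every condition cancels out. That cancellation is what the subtraction of a reference MI is meant to provide.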
Fig. 3.
Network reconstruction of the GAL pathway. We estimate the activity of the 25 transcription factors that can bind to GAL genes in the TRN and use the top three TFs predicted to reconstruct the essential regulatory networks, with the interactions extracted from the TRN (green lines with arrow or dot at ends) and the PPI (blue lines, PPI information obtained from www.yeastgenome.org). We compare different approaches in estimating the TF activity (Wu and Chan, 2011): (A) TYPE 1: TF activity is determined by its expression level; (B) TYPE 2: TF activity is determined by the differential expression of potential target genes; (C) TYPE 3: TF activity is implicated by the co-expression of the target genes; (D) our approach: use the target gene expression information and integrate a wide range of conditions to determine the change of TF activity. The true network includes regulators GAL4, GAL80 and IMP2 (ranked 1, 2 and 3, respectively, in our approach) shown by the nodes colored in magenta, which are specific TFs regulating the GAL pathway for galactose utilization and glucose repression in the AltCarb condition. Nodes colored in gray are non-specific TFs for the AltCarb condition, including AFT1 (iron utilization and homeostasis), MSN2 (general stress response), GCN4 (amino acid biosynthesis), PIP2 (oleate response) and SFP1 (ribosome biogenesis and cell cycle). The functional annotations are based on SGD (http://www.yeastgenome.org)
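As a rough illustration of using target gene expression to estimate TF activity, the sketch below computes one activity value per sample as the sum of the expression of a TF's potential targets, the feature described in the Fig. 2 caption. The data layout, the function name `tf_activity` and the gene and TF labels are assumptions, and the values are toy numbers rather than measurements.

```python
def tf_activity(expression, trn):
    """Sum target-gene expression per sample as a TF activity feature.

    expression: {gene: [expression value per sample]}
    trn:        {tf: [potential target genes]}
    Targets missing from the expression table are ignored.
    """
    n_samples = len(next(iter(expression.values())))
    activity = {}
    for tf, targets in trn.items():
        present = [g for g in targets if g in expression]
        activity[tf] = [sum(expression[g][s] for g in present)
                        for s in range(n_samples)]
    return activity


# Toy inputs (hypothetical labels and values), three samples per gene.
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [0.5, 0.5, 4.0],
    "g3": [2.0, 2.0, 2.0],
}
trn = {"TF_X": ["g1", "g2"], "TF_Y": ["g3", "g_missing"]}

activity = tf_activity(expression, trn)
print(activity)
```

The resulting per-sample activity vectors could then be scored across a wide range of conditions, for example with a ReliefF-style ranking, to find TFs whose activity changes specifically in the condition of interest.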
Fig. 4.
The regulatory network for TROP2. Seven transcription factors (colored in yellow) are predicted to bind to TROP2 based on motif search. Their interacting proteins are colored in blue. The causal impact (score) is represented by the size of the nodes in the network. Of the transcription factors that could regulate TROP2, only CREB1 shows a causal impact (a positive score), and it has the highest score among all the proteins in the network that are connected to TROP2
Fig. 5.
The TROP2 mRNA expression levels in different cell types. MCF10A and MDA-MB-231 were treated with IKK inhibitor VII and NFκB activation inhibitor IV for 2 h, respectively. The TROP2 mRNA levels were measured by quantitative real-time PCR (n = 3). *P < 0.05, **P < 0.01, ***P < 0.001. P-value was compared with control
Fig. 6.
(A) The mRNA expression level of E2F1 in breast cancer cells MCF7 and MDA-MB-231. Scramble siRNA and siRNA targeting E2F1 were transfected into MCF7 and MDA-MB-231 cells. The mRNA of E2F1 was detected by real-time PCR (n = 3). *P < 0.05, **P < 0.01, ***P < 0.001. A line indicates comparison between the two bars connected by the line. (B) The protein expression level of E2F1 in breast cancer cells MCF7 and MDA-MB-231. Scramble siRNA and siRNA targeting E2F1 were transfected into MCF7 and MDA-MB-231 cells. The protein of E2F1 was detected by western blot; actin was used as a loading control (n = 3). *P < 0.05, **P < 0.01, ***P < 0.001. A line indicates comparison between the two bars connected by the line. (C) KIF2C expression level in breast cancer MCF7 and MDA-MB-231 cells. E2F1 was silenced with specific siRNA in both MCF7 and MDA-MB-231 cells, and scramble siRNA was used as control. The mRNA of KIF2C was detected by real-time PCR (n = 3). *P < 0.05, **P < 0.01, ***P < 0.001. A line indicates comparison between the two bars connected by the line
