PLoS Comput Biol. 2021 Jun 24;17(6):e1009095.
doi: 10.1371/journal.pcbi.1009095. eCollection 2021 Jun.

Identifying the combinatorial control of signal-dependent transcription factors

Ning Wang et al. PLoS Comput Biol. 2021.

Abstract

The effectiveness of immune responses depends on the precision of stimulus-responsive gene expression programs. Cells specify which genes to express by activating stimulus-specific combinations of stimulus-induced transcription factors (TFs). Their activities are decoded by a gene regulatory strategy (GRS) associated with each response gene. Here, we examined whether the GRSs of target genes may be inferred from stimulus-response (input-output) datasets, which remains an unresolved model-identifiability challenge. We developed a mechanistic modeling framework and computational workflow to determine the identifiability of all possible combinations of synergistic (AND) or non-synergistic (OR) GRSs involving three transcription factors. Considering different sets of perturbations for stimulus-response studies, we found that two thirds of GRSs are easily distinguishable but that substantially more quantitative data is required to distinguish the remaining third. To enhance the accuracy of the inference with timecourse experimental data, we developed an advanced error model that avoids error overestimates by distinguishing between value and temporal error. Incorporating this error model into a Bayesian framework, we show that GRS models can be identified for individual genes by considering multiple datasets. Our analysis rationalizes the allocation of experimental resources by identifying the most informative TF stimulation conditions. Applying this computational workflow to experimental data of immune response genes in macrophages, and considering compensation among transcription factors, we found that a much greater fraction of genes is combinatorially controlled than previously reported. Specifically, we revealed that a group of known NFκB target genes may also be regulated by IRF3, which is supported by chromatin immunoprecipitation analysis. Our study provides a computational workflow for designing and interpreting stimulus-response gene expression studies to identify underlying gene regulatory strategies and further a mechanistic understanding.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Studying GRSs with stimulus-induced TFs.
(A) Schematic of the gene regulation model: stimulus-induced TFs bind to target DNA to induce Pol II-mediated mRNA synthesis, which is followed by mRNA processing and release from the chromatin. Nascent mRNA abundance can be described by a single ordinary differential equation (ODE). Promoter activity is described by thermodynamic models involving Hill functions. (B) Line graphs of promoter activity as a function of the activity of a single TF; the response depends on the regulation strength (indicated by 1/Kd on a black-to-blue scale). We define the regulation strengths as Strong (Kd = 0.1), Medium (Kd = 1.0), or Weak (Kd = 10), as marked in the plot. (C) Heatmaps of promoter activity for the AND and OR logic gates as a function of varying TF1 and TF2 activities, both with Strong regulation (Kd1 = Kd2 = 0.1). (D) Enumeration of all 17 possible AND and OR logic combinations of three TFs. These may be represented by 8 triple logics when single and dual TF logics contain null regulation strengths (Kd ≫ 1) for one or two TFs (marked with grey shading). AND and OR gates are denoted with “∙” and “+” in Boolean algebra. (E) Schematic of the analysis workflow, which uses a set of 7 perturbations with high and low TF activities to probe 93 non-redundant, activatable GRSs (see S2 Fig) by simulating their gene expression patterns and examining those patterns by hierarchical clustering. (F) Heatmap of gene expression at 0, 15, 30, 60 min from all 93 GRSs in response to a set of 7 perturbation conditions involving 3 TFs. Here the heatmap is ordered by GRS, that is, by regulatory logic and TF regulation strengths (left columns). (G) The data of panel (F) with GRSs ordered by hierarchical clustering (single linkage approach) of gene expression using the squared Euclidean distance, as depicted by the tree. This analysis shows that distinct combinatorial logics may give similar gene expression patterns, and that triple AND gates with distinct regulation strengths cannot be distinguished with 7 perturbations.
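The model ingredients named in panels (A) to (C) can be illustrated with a minimal sketch. This is not the authors' implementation: the Hill coefficient of 1, the product form for AND, the complement-product form for OR, and the rate constants `k_syn` and `k_deg` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def hill(tf, kd, n=1.0):
    """Hill function: promoter activity driven by one TF (n = 1 is an assumption)."""
    return tf**n / (kd**n + tf**n)


def and_gate(tf1, tf2, kd1, kd2):
    """Synergistic (AND) logic, assumed here to be the product of two Hill terms."""
    return hill(tf1, kd1) * hill(tf2, kd2)


def or_gate(tf1, tf2, kd1, kd2):
    """Non-synergistic (OR) logic, assumed here to be 1 - (1 - h1)(1 - h2)."""
    return 1.0 - (1.0 - hill(tf1, kd1)) * (1.0 - hill(tf2, kd2))


def simulate_mrna(promoter_activity, k_syn=1.0, k_deg=0.05, t_end=60.0, dt=0.01):
    """Forward-Euler integration of dm/dt = k_syn * f - k_deg * m with m(0) = 0.

    The promoter activity f is held constant; k_syn and k_deg are
    illustrative values, not parameters taken from the paper.
    """
    m = 0.0
    for _ in range(int(round(t_end / dt))):
        m += dt * (k_syn * promoter_activity - k_deg * m)
    return m


# Strong regulation for both TFs (Kd1 = Kd2 = 0.1), with only TF1 active.
only_tf1_and = and_gate(1.0, 0.0, 0.1, 0.1)  # AND gate stays off
only_tf1_or = or_gate(1.0, 0.0, 0.1, 0.1)    # OR gate responds to TF1 alone
```

With only one TF active, the AND gate yields no promoter activity while the OR gate responds almost fully, which is the qualitative contrast the heatmaps in panel (C) depict.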
Fig 2. The distinguishability of GRSs depends on the available set of TF perturbations.
(A) Schematic of the analysis workflow using 4 sets of 26 perturbations each, with amplitude, gradient, transient, and delayed TF dynamics that produce combinatorial TF activities, to probe 93 non-redundant, activatable GRSs by simulating their gene expression patterns and examining those patterns by hierarchical clustering. (B) Heatmap of the triple AND gate GRSs (top 7 rows in (C)), probed with one of the 4 sets of 26 perturbations (shown in (B)), containing amplitude-modulated TF activities (amplitude, gradient) and temporally modulated TF activities (delayed, transient). The results show that amplitude-modulated TF activities are best suited to distinguish GRSs. (C) Number of distinguishable GRS clusters as a function of the separation threshold with the indicated sets of perturbation combinations. (D) The same plot as panel (C) but from 1981 inducible GRSs generated by random sampling of 1000 parameter sets for each logic. All 1981 GRSs are identified at the lowest separation threshold (-1) for amplitude and gradient perturbations, whereas 1971, 1975, and 1976 GRS clusters are identified for high/low, transient, and delayed perturbations, respectively.
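The cluster counts in panels (C) and (D) can be understood through a small sketch of single-linkage clustering cut at a distance threshold. It rests on stated assumptions: the threshold is applied directly to the squared Euclidean distance (the scale of the caption's separation threshold is not given here), and the function names and toy profiles are hypothetical.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_clusters(profiles, threshold):
    """Count single-linkage clusters when pairs closer than `threshold` merge.

    Cutting a single-linkage tree at a given height is equivalent to taking
    connected components of the graph that links every pair of profiles
    whose distance is at most that height, so a union-find suffices.
    """
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if sq_euclidean(profiles[i], profiles[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(profiles))})


# Toy profiles (made up): two near-identical GRSs and two distinct ones.
toy_profiles = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [5.0, 5.0]]
```

Lowering the threshold separates more profiles into their own clusters, which is why the number of distinguishable GRS clusters rises as the separation threshold decreases.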
Fig 3. An error model for stimulus-response data that leverages timecourse information to avoid under- and overestimates of measurement uncertainty.
(A) Uncertainty sources in data simulation. We considered both biological variability and technical uncertainty. (B) Simulated gene expression data with uncertainty (see Methods). Replicate datasets for 18 of the 93 GRSs are shown here. (C) Diagram of the time-value error model. The left panel shows how observed uncertainty can be decomposed into value and temporal uncertainties. The middle panel shows how the conventional model will under- and overestimate data uncertainty. The right panel shows that error is decomposed into value uncertainty, temporal uncertainty, and point-specific uncertainty. (D) Uncertainty estimation with the time-value and conventional error models. For the left column, we estimated data uncertainty from each point using Maximum Likelihood Estimation (MLE) (98% of points are shown). For the middle column, we estimated the global trend as a prior. By applying Bayes’ rule, we obtained the posterior estimates shown in the right column. Pearson correlations are calculated between estimated variance and ground truth variance. The top row corresponds to the conventional model, and the bottom row to our error model.
Fig 4. Bayesian framework to parameterize the model with data uncertainty.
(A) Diagram of the computational pipeline composed of data quality assessment and model parameterization. (B) Gene expression from simulated testing data and from data generated by the fitted models. Rows correspond to the 93 GRSs in the same order as in Fig 2C. Columns correspond to the 26 amplitude perturbation conditions at 0, 15, 30, 60 min. (C) Bar plot presenting the number of correctly identified logic gates and boxplot of estimated parameters for the fitted models of the 93 GRSs. The percentages of estimated parameters within 2-fold of the true value for the Time-Value model, the Conventional model, and raw variance, respectively, are: Kd1 95%, 91%, 70%; Kd2 91%, 84%, 70%; Kd3 91%, 90%, 71%. The mean absolute percent deviations for the Time-Value model, the Conventional model, and raw variance, respectively, are: Kd1 24%, 28%, 58%; Kd2 26%, 36%, 48%; Kd3 23%, 28%, 138%.
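The two accuracy summaries quoted in panel (C) can be computed as in the following sketch. The definitions are the conventional ones and are assumed to match the caption's usage; the function names and the toy Kd values are hypothetical and are not data from the paper.

```python
def fraction_within_fold(estimates, truths, fold=2.0):
    """Fraction of estimates lying within `fold`-fold of the true value."""
    hits = sum(1 for e, t in zip(estimates, truths) if t / fold <= e <= t * fold)
    return hits / len(truths)


def mean_abs_percent_deviation(estimates, truths):
    """Mean of |estimate - truth| / truth, expressed in percent."""
    total = sum(abs(e - t) / t for e, t in zip(estimates, truths))
    return 100.0 * total / len(truths)


# Toy Kd estimates against Strong, Medium, and Weak ground truths (made up).
true_kd = [0.1, 1.0, 10.0]
fitted_kd = [0.1, 1.5, 30.0]
within_2fold = fraction_within_fold(fitted_kd, true_kd)
mapd = mean_abs_percent_deviation(fitted_kd, true_kd)
```

The two metrics complement each other: a single large miss, such as the 3-fold error on the Weak Kd in this toy input, barely changes the within-2-fold fraction but dominates the mean absolute percent deviation.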
Fig 5. Principles of designing the experimental perturbation studies.
(A) Schematic of the relationship between the number of replicates and perturbation conditions given a fixed budget to generate, for example, 52 datasets. (B) Effect of combining different perturbation sets on the identifiability (defined as the log2 likelihood ratio between the ground truth and the most similar alternative GRS) of the 93 GRSs. When 2 replicates are tested for each condition, the number of perturbation sets determines GRS identifiability. Mathematically, we defined identifiability of each GRS as the lowest squared Euclidean distance between the gene expression responses of the ground truth GRS and any other GRS. (C) Comparison of the trade-off between the number of replicates and number of perturbation conditions for the identifiability of the 93 GRSs. When 52 datasets can be generated, employing more perturbation datasets is preferable to having more replicates for fewer perturbation sets.
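The distance-based definition of identifiability in panel (B) can be sketched as follows. The function names and the toy response profiles are hypothetical, and the example covers only the squared-Euclidean definition, not the likelihood-ratio form also mentioned in the caption.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two expression response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identifiability(responses, index):
    """Lowest squared distance from GRS `index` to any other GRS."""
    truth = responses[index]
    return min(
        sq_euclidean(truth, other)
        for k, other in enumerate(responses)
        if k != index
    )


# Toy responses of three GRSs to one perturbation set (made-up numbers).
set_a = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
# Responses of the same GRSs to a second, hypothetical perturbation set.
set_b = [[0.2], [0.9], [0.5]]
# Combining perturbation sets concatenates the response vectors per GRS.
combined = [a + b for a, b in zip(set_a, set_b)]
```

In this toy case the first two GRSs are indistinguishable under `set_a` alone (identifiability 0) and become separable once `set_b` is added. Because concatenating responses only adds non-negative terms to each distance, adding perturbation sets can never lower this measure.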
Fig 6. Mapping GRSs involving NFκB, IRFs, MAPK to endotoxin-induced immune genes.
(A) Scheme of fitting models to experimental data. Gene expression data and TF activities for different stimuli were used to fit 8 possible model topologies and determine the best likelihoods; this was followed by mapping to the 17 possible logics (Fig 1D), each of which was then assessed for whether it fits the data. (B) Plot of the fitness (negative log likelihood) of the 8 triple TF logic gates for all lipid A-induced genes identified by [19]. An arbitrary threshold to select fitted logic gates is marked as an orange dashed line. (C) Plot of the fitted GRS logic gates for the lipid A-induced genes identified by [19]. Only 26% of these genes fit a GRS governed by a single TF; most require GRS models that are controlled by at least two TFs. (D) Peak expression in IRF3-/- plotted against WT peak expression for all the lipid A-induced genes. Genes that are identified as potentially synergistically regulated by NFκB and IRFs are marked in red. The 5 previously reported IRF3-NFκB-regulated genes [19] are marked with a black circle. (E) For two example genes, heatmaps of experimentally measured caRNA-seq time-course data and simulation data from the best-fit GRS models, ordered by fit quality from top (best) to bottom. (F) IGV genome browser tracks of IRF3 and RelA ChIP-seq data for the two genes in resting and 60 min lipid A-stimulated macrophages.

References

    1. Beer MA, Tavazoie S. Predicting gene expression from sequence. Cell. 2004 Apr 16;117(2):185–98. doi: 10.1016/s0092-8674(04)00304-6.
    2. Salleh FH, Arif SM, Zainudin S, Firdaus-Raih M. Reconstructing gene regulatory networks from knock-out data using Gaussian Noise Model and Pearson Correlation Coefficient. Computational biology and chemistry. 2015 Dec 1;59:3–14. Epub 2015 Jun 17. doi: 10.1016/j.compbiolchem.2015.04.012.
    3. Zhang X, Zhao XM, He K, Lu L, Cao Y, Liu J, et al. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2012 Jan 1;28(1):98–104. Epub 2011 Nov 15. doi: 10.1093/bioinformatics/btr626.
    4. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC bioinformatics. 2006 Mar (Vol. 7, No. 1, pp. 1–15). BioMed Central. doi: 10.1186/1471-2105-7-S1-S7; PMCID: PMC1810318.
    5. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS biol. 2007 Jan 9;5(1):e8. doi: 10.1371/journal.pbio.0050008; PMCID: PMC1764438.
