PLoS Comput Biol. 2006 Jun 16;2(6):e70.
doi: 10.1371/journal.pcbi.0020070. Epub 2006 Jun 16.

Integrated assessment and prediction of transcription factor binding

Andreas Beyer et al. PLoS Comput Biol. 2006.

Abstract

Systematic chromatin immunoprecipitation (chIP-chip) experiments have become a central technique for mapping transcriptional interactions in model organisms and humans. However, measurement of chromatin binding does not necessarily imply regulation, and binding may be difficult to detect if it is condition or cofactor dependent. To address these challenges, we present an approach for reliably assigning transcription factors (TFs) to target genes that integrates many lines of direct and indirect evidence into a single probabilistic model. Using this approach, we analyze publicly available chIP-chip binding profiles measured for yeast TFs in standard conditions, showing that our model interprets these data with significantly higher accuracy than previous methods. Pooling the high-confidence interactions reveals a large network containing 363 significant sets of factors (TF modules) that cooperate to regulate common target genes. In addition, the method predicts 980 novel binding interactions with high confidence that are likely to occur in so-far untested conditions. Indeed, using new chIP-chip experiments we show that predicted interactions for the factors Rpn4p and Pdr1p are observed only after treatment of cells with methyl-methanesulfonate, a DNA-damaging agent. We outline the first approach for consistently integrating all available evidences for TF-target interactions and we comprehensively identify the resulting TF module hierarchy. Prioritizing experimental conditions for each factor will be especially important as increasing numbers of chIP-chip assays are performed in complex organisms such as humans, for which "standard conditions" are ill defined.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Identifying High-Confidence TF–Target Interactions and TF Modules
Different lines of evidence indicative of TF–target interactions are combined to yield an integrated probabilistic measure of interaction propensity. Using a positive and a negative validation set, the input evidences are independently converted into LLSs. Individual LLSs are integrated into one value per TF–target pair. TF modules are identified as subsets of TFs that regulate common genes.
Figure 2. Regression Lines Used for Scaling the Different Evidence Types Needed for Predicting TF–Target Interactions
TF–target pairs were binned according to the value of the respective evidence type, and the LS for each bin was calculated using the validation sets (Equation 1). Each point is the average of five runs with different negative validation sets (the positive set was always the same). Error bars represent standard deviations over the validation sets. Gray diamonds lie in parameter ranges that were excluded from the LS prediction because the LSs were not significant. Abbreviations used in the x-axis labels are explained in the main text.
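
As a rough illustration of the binning step described in the Figure 2 caption, the sketch below scores each bin by the log of its positive-to-negative odds relative to the overall odds in the validation sets. Equation 1 is not reproduced on this page, so the log-odds form, the pseudocount, and the names `bin_scores`, `values`, `labels`, and `edges` are assumptions made for illustration, not the authors' definition.

```python
import math
from bisect import bisect_right


def bin_scores(values, labels, edges, pseudocount=1.0):
    """Score each evidence bin with an ASSUMED log-odds formula.

    values: evidence value for each TF-target pair
    labels: True if the pair is in the positive validation set, False if negative
    edges:  sorted bin edges; bin i covers [edges[i], edges[i + 1])
    """
    n_bins = len(edges) - 1
    pos = [0] * n_bins
    neg = [0] * n_bins
    for value, is_positive in zip(values, labels):
        if value < edges[0] or value > edges[-1]:
            continue  # value lies outside the binned range
        # A value equal to the top edge is clamped into the last bin.
        index = min(bisect_right(edges, value) - 1, n_bins - 1)
        if is_positive:
            pos[index] += 1
        else:
            neg[index] += 1

    total_pos, total_neg = sum(pos), sum(neg)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both validation sets must contribute at least one pair")

    prior_odds = total_pos / total_neg
    # The pseudocount (an assumption) keeps empty bins from producing log(0).
    return [
        math.log(((p + pseudocount) / (n + pseudocount)) / prior_odds)
        for p, n in zip(pos, neg)
    ]
```

Averaging the scores over several negative validation sets, as the caption describes, would amount to calling this function once per negative set and taking the mean and standard deviation per bin.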
Figure 3. Quality Assessment of the Predicted TF–Target Gene Interactions
(A) ROC curves are the average of two cross-validations (see Materials and Methods). Lines show specificity and sensitivity accounting for binding evidence only and for integrating all evidences based on the Bayesian approach (with and without [“Bayes sum”] additional filtering). Additional filtering requires that at least two evidences have LLS > 0.5 (see Materials and Methods). Single points refer to previous selections [11] based on binding evidence (chIP-chip, p_b < 0.001, p_b < 0.005) and motif presence in zero, two, or three yeast species, respectively. Blue arrows indicate the respective LLS thresholds.
(B) Target gene sets were validated against Gene Ontology categories taken from SGD [31] and clusters of coexpressed genes (see Materials and Methods). In the latter case, all evidences based on expression data were excluded when assigning TFs to targets. The vertical bars indicate the fractions of TFs or TF modules for which the target genes significantly overlap with at least one category or cluster (p < 10^−4, hypergeometric distribution). The filtering criteria for the three sets of predicted interactions were chosen such that all selections have the same specificity (0.995). Yellow indicates using binding p-values as the sole selection criterion; green, selections by Harbison et al. [11] based on binding motifs conserved in at least three species and with binding p-values (p_b) < 0.005; and blue, combining all possible lines of evidence, where at least two predicted LLSs must be > 0.5 and the sum of all evidences must yield an LLS > 5. All modules are significant with p_mod < 10^−4, except for the light and dark blue bars (p_mod < 0.1 and < 10^−6, respectively). The p_mod does not apply to the single TFs.
(C) LLSs were determined based on all evidences, but excluding binding under nonstandard conditions. The average LLS (sliding window) is plotted versus binding p-values under nonstandard conditions. Blue line indicates all TF–target pairs; red line, subset excluding pairs binding under standard conditions (i.e., LLS is exclusively based on evidences other than binding). Horizontal lines indicate global average LLS (solid lines) and average plus one standard deviation (dashed lines).
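
The selection rule and the overlap test mentioned in the Figure 3 caption can be sketched as follows. The thresholds (at least two LLSs > 0.5, summed LLS > 5) come from the caption; the function names, the argument layout, and the use of an upper-tail hypergeometric probability are assumptions for illustration and may differ from the authors' implementation.

```python
from math import comb


def passes_filter(llss, min_single=0.5, min_count=2, min_sum=5.0):
    """Return True if a TF-target pair meets the caption's filtering criteria.

    llss: one LLS per evidence type for a single TF-target pair.
    """
    strong = sum(1 for score in llss if score > min_single)
    return strong >= min_count and sum(llss) > min_sum


def hypergeom_pvalue(population, category, targets, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes considered
    category:   genes annotated with the category or cluster
    targets:    size of the predicted target gene set
    overlap:    genes shared by the target set and the category
    """
    upper = min(category, targets)
    favourable = sum(
        comb(category, k) * comb(population - category, targets - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, targets)
```

Under this sketch, a target gene set would count as significantly overlapping a category when `hypergeom_pvalue(...)` falls below the caption's cutoff of 10^−4.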
Figure 4. Combinatorial Regulation by TF Modules
(A) Bars show average centrality (see Materials and Methods) of target genes overlapping with stress-related clusters (± standard error). Values above bars are numbers of overlapping target genes. Generic TFs such as Yap1p or Swi6p are reused in several modules. Combination with other TFs yields specificity (i.e., a smaller number of target genes and an increased centrality). (B) Hierarchy of TF modules. Arrows represent a subset relationship (i.e., all TFs of the source module are contained in the target module). Downstream TF modules always share their targets with upstream TF modules. Annotations are based on significant (p < 10^−4, hypergeometric distribution) overlaps between the target gene sets and the respective functional category. Values in parentheses are numbers of target genes (black) and numbers of overlapping genes (red, green). (C) Complete hierarchy of the 363 significant TF modules (p_mod < 10^−4). Highlighted regions contain TF modules that are enriched with the respective TFs or TF complexes.
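
The subset relationship behind the arrows in panel (B) can be sketched in a few lines. The module names and TF labels used with this function are placeholders, and restricting the output to direct arrows (dropping those implied by a chain of smaller steps) is an assumption about how such a hierarchy is usually drawn, not something stated in the caption.

```python
def module_hierarchy(modules):
    """Return (all_edges, direct_edges) for a dict of module name -> set of TFs.

    An edge (source, target) means every TF of the source module is also
    in the target module, and the target has at least one additional TF.
    """
    all_edges = sorted(
        (src, dst)
        for src, src_tfs in modules.items()
        for dst, dst_tfs in modules.items()
        if src_tfs < dst_tfs  # proper subset
    )
    edge_set = set(all_edges)
    # Keep only edges that are not implied by a path through a third module.
    direct_edges = [
        (src, dst)
        for src, dst in all_edges
        if not any((src, mid) in edge_set and (mid, dst) in edge_set for mid in modules)
    ]
    return all_edges, direct_edges
```

For example, with placeholder modules `{"M1": {"TF_a"}, "M2": {"TF_a", "TF_b"}, "M3": {"TF_a", "TF_b", "TF_c"}}`, the direct edges are M1 to M2 and M2 to M3, while M1 to M3 appears only in the full edge list.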
Figure 5. Rpn4p and Pdr1p Binding under Normal and Stress Conditions (H2O2 and MMS)
Binding p-values for MMS (this study) and other conditions (taken from [11]) are shown for groups of (A) Rpn4p and (B) Pdr1p targets (LLS > 5) with coherent binding patterns (red, strong binding; black, no binding). Additional transcription factors coregulating a significant (p < 0.001) number of genes either as individual TFs or as members of TF modules are listed below each cluster.

References

    1. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000;11:4241–4257.
    2. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet. 2001;29:153–159.
    3. Garten Y, Kaplan S, Pilpel Y. Extraction of transcription regulatory signals from genome-wide DNA-protein interaction data. Nucleic Acids Res. 2005;33:605–615.
    4. Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, et al. Computational discovery of gene modules and regulatory networks. Nat Biotechnol. 2003;21:1337–1342.
    5. Reményi A, Scholer HR, Wilmanns M. Combinatorial control of gene expression. Nat Struct Mol Biol. 2004;11:812–815.
