Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2009 Oct 21:3:129-40.
doi: 10.4137/bbi.s3445.

Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data

Affiliations

Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data

Haoyu Cheng et al. Bioinform Biol Insights. .

Abstract

How to combine heterogeneous data sources for reliable prediction of transcriptional regulation is a challenge. Here we present an easy but powerful method to integrate Chromatin immunoprecipitation (ChIP)-chip and knock-out data. Since these two types of data provide complementary (physical and functional) information about transcription, the method combining them is expected to achieve high detection rates and very low false positive rates. We try to seek the optimal integration of these two data using hyper-geometric distribution. We evaluate our method on yeast data and compare our predictions with YEASTRACT, high-quality ChIP-chip data, and literature. The results show that even using low-quality ChIP-chip data, our method uncovers more relations than those inferred before from high-quality data. Furthermore our method achieves a low false positive rate. We find experimental and computational evidence in literature for most transcription factor (TF)-gene relations uncovered by our method.

Keywords: ChIP; P-value threshold; cooperativity; hypergeometric distribution; knock-out data; regulatory interaction; transcription factor.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Schematic diagram of the method. The starting point for this method depends on ChIP binding data and TF knockout data (the data sources showed on the left). For each TF, two thresholds are selected for the ChIP binding data and TF deletion data, respectively. When the binding P value of a single gene is less than the binding threshold, this gene is considered to be the binding target. Similarly, if the effectual P value of a single gene in a deletion experiment is less than its assigned threshold, then this gene is defined as the affected target. Both of the two thresholds are set in the range from 0.001 to 0.05 with an increment of 0.001. A value called overlapping significance is calculated based on the binding target set, the affected target set and the intersection of them (the intersecting ovals in the middle). This process is reiterated for all possible combinations of thresholds so that the maximal overlapping significance is obtained (procedures and formulas are showed on the right).
Figure 2.
Figure 2.
Comparison with YEASTARCT. For the 30 TFs, number of the target genes identified with the stringent P-value threshold pair (Pbt = 0.001, Pet = 0.001) (blue), number of the target genes inferred with the optimal threshold pair (Pb*t, Pe*t) by our method (green), and the number of our predictions supported in YEASTARCT are shown (red).
Figure 3.
Figure 3.
Comparison with high-quality ChIP-chip data. Oval nodes are for genes identified with stringent P-value cutoffs (Pbt = 0.001, Pet = 0.001), while rectangular nodes are for additional genes identified using optimal relaxed threshold pair by our method. Nodes with red solid border are for relations supported by YEASTRACT, otherwise with black dash border. Solid nodes are for the genes supported by high-quality ChIP-chip data. A) 126 identified target genes of RAP1. 56 additional target genes are identified (rectangular), while 51 (rectangular with red solid border) are supported by YEASTRACT and 34 (solid rectangular) are supported by high-quality ChIP-chip data. B) We have identified 48 target genes of SWI4 including SWI4 itself. SWI4-SWI4 self-regulation is showed by the arrow pointed back to SWI4 itself in the figure. Among SWI4 and other 37 additional target genes identified using optimal relaxed threshold pair by our method (rectangular), as many as 28 (rectangular with red solid border) and SWI4 are supported by YEASTRACT; 16 (solid rectangular) and SWI4 are supported by high-quality ChIP-chip data.

Similar articles

Cited by

References

    1. Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7:3–4. 601–20. - PubMed
    1. Ideker TE, Thorsson V, Karp RM. Discovery of regulatory interactions through perturbation: inference and experimental design. Pac Symp Biocomput. 2000:305–16. - PubMed
    1. Birnbaum K, Benfey PN, Shasha DE. cis element/transcription factor analysis (cis/TF): a method for discovering transcription factor/cis element relationships. Genome Res. 2001 Sep;11(9):1567–73. - PMC - PubMed
    1. Zhu Z, Pilpel Y, Church GM. Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm. J Mol Biol. 2002 Apr 19;318(1):71–81. - PubMed
    1. Imoto S, Goto T, Miyano S. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac Symp Biocomput. 2002:175–86. - PubMed

LinkOut - more resources