Nucleic Acids Res. 2012 Aug;40(15):e114.
doi: 10.1093/nar/gks543. Epub 2012 Jun 20.

i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules

Carl Herrmann et al. Nucleic Acids Res. 2012 Aug.

Abstract

The field of regulatory genomics today is characterized by the generation of high-throughput datasets that capture genome-wide transcription factor (TF) binding, histone modifications, or DNaseI hypersensitive regions across many cell types and conditions. In this context, a critical question is how to make optimal use of these publicly available datasets when studying transcriptional regulation. Here, we address this question in Drosophila melanogaster, for which a large number of high-throughput regulatory datasets are available. We developed i-cisTarget (where the 'i' stands for integrative), which for the first time enables the discovery of different types of enriched 'regulatory features' in a set of co-regulated sequences in a single analysis; these features can be TF motifs, 'in vivo' chromatin features, or combinations thereof. We have validated our approach on 15 co-expressed gene sets, 21 ChIP datasets, 628 curated gene sets and multiple individual case studies, and show that meaningful regulatory features can be confidently discovered; that bona fide enhancers can be identified, both by in vivo events and by TF motifs; and that combinations of in vivo events and TF motifs further increase the performance of enhancer prediction.

Figures

Figure 1.
Flowchart of i-cisTarget. The 136K regions are scored in batch (i.e. offline) with collections of position weight matrices (PWMs) and in vivo events (iVEs), yielding PWM and iVE rankings, respectively. An input set of genes or genomic loci is mapped to the 136K set to obtain a set of foreground sequences. The enrichment of the foreground sequences is calculated in all rankings using recovery curves and statistics. Top-ranking regions for enriched features represent candidate CRMs.
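The enrichment step in this flowchart can be illustrated with a minimal sketch. It assumes that each feature (a PWM or an iVE) provides a ranked list of region identifiers, that enrichment is summarized as the area under a recovery curve over the top of each ranking, and that the area is standardized across all features to give a normalized score. The function names, the `top_k` cutoff and the z-score normalization are illustrative assumptions; the caption itself only mentions "recovery curves and statistics".

```python
from statistics import mean, pstdev


def recovery_curve(ranking, foreground, top_k):
    """Cumulative number of foreground regions seen in the top_k ranked regions."""
    curve = []
    hits = 0
    for region in ranking[:top_k]:
        if region in foreground:
            hits += 1
        curve.append(hits)
    return curve


def normalized_auc(ranking, foreground, top_k):
    """Area under the recovery curve, scaled to the range [0, 1]."""
    curve = recovery_curve(ranking, foreground, top_k)
    if not curve or not foreground:
        return 0.0
    return sum(curve) / (len(curve) * len(foreground))


def normalized_enrichment_scores(rankings, foreground, top_k):
    """Standardize each feature's AUC against all features (assumed z-score)."""
    aucs = {name: normalized_auc(r, foreground, top_k) for name, r in rankings.items()}
    mu = mean(aucs.values())
    sigma = pstdev(aucs.values())
    if sigma == 0:
        return {name: 0.0 for name in aucs}
    return {name: (a - mu) / sigma for name, a in aucs.items()}


# Toy data: three hypothetical feature rankings over six regions.
toy_rankings = {
    "featureA": ["r1", "r2", "r3", "r4", "r5", "r6"],
    "featureB": ["r3", "r4", "r5", "r6", "r1", "r2"],
    "featureC": ["r3", "r1", "r4", "r2", "r5", "r6"],
}
toy_foreground = {"r1", "r2"}
toy_scores = normalized_enrichment_scores(toy_rankings, toy_foreground, top_k=3)
```

In this toy example the feature that places both foreground regions at the top of its ranking receives the highest score, and the feature that places them last receives the lowest.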
Figure 2.
Motif and iVE discovery in sets of genomic loci. (A) Heatmaps displaying the motifs discovered in various ChIP datasets; red indicates that the motif ranks among the top three motifs, pink that the motif has an enrichment score above the NES threshold (NES ≥ 4), and black that the expected motif is not found. The grey square for da indicates that the DA motif is found with a NES of 3.9, just below our stringent threshold of 4. Note that the absence of the dl motif in the BDTNP DL dataset is likely due to an incorrect dataset (see text). (B, C) Scatterplots of NES scores for top-ranking iVEs in sets of bound versus unbound regions for heat-shock factor (B) and MEF2 (C). iVEs directly related to the condition of the dataset (S2 cells for HSF; embryonic for MEF2) are represented by green triangles.
Figure 3.
Motif and iVE discovery in gene sets. (A) Heatmaps displaying the motifs discovered in various gene sets; red indicates that the motif ranks among the top three motifs, pink that the motif has an enrichment score above the NES threshold (NES ≥ 2.5), and black that the expected motif is not found. (B, C) Scatterplots of NES scores for gene sets related to a mutant condition of a TF, and the corresponding ChIP dataset for this TF; the red dashed line indicates the NES threshold of 2.5 for the gene set.
Figure 4.
Assessment of CRM prediction performance. Scatterplots showing the precision/recall performance of CRM prediction for the zelda gene set (A) and the proneural gene set (C); precision and recall can be summarized into an F1-score, which is shown as histograms for both datasets (B, D). Feature combinations are represented in red, iVEs in green and motifs in blue. Feature sources are abbreviated as ME for modENCODE and B for BDTNP.
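As a small illustration of how precision and recall combine into an F1-score, the sketch below compares a set of predicted CRMs with a set of known CRMs. It assumes that both are given as sets of region identifiers and that a prediction counts as correct only when the identifiers match exactly; the function name and the toy identifiers are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, known):
    """Return (precision, recall, F1) for predicted versus known region sets."""
    predicted = set(predicted)
    known = set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four predicted regions, three known CRMs, two in common.
toy_precision, toy_recall, toy_f1 = precision_recall_f1(
    predicted={"r1", "r2", "r3", "r4"},
    known={"r1", "r2", "r5"},
)
```

Here two of the four predictions are correct (precision 1/2) and two of the three known CRMs are recovered (recall 2/3), giving an F1-score of 4/7.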
Figure 5.
Direct TF-target regulatory interactions derived from FBbt gene sets. GRNs derived from i-cisTarget predictions on different TermLink gene sets. (A) Genes expressed in the mushroom body (MB) and in Kenyon cells yield enriched motifs for three out of the six TFs annotated to be expressed in these cells, namely ey, EcR and Mef2. The network shows target genes of these three TFs in the MB and in Kenyon cells (genes expressed in Kenyon cells are represented as darker nodes). (B–C) Similar analysis for genes expressed in cardioblasts (B) and pericardial cells (C). These networks show clear differences between cardioblasts and pericardial cells in terms of TF-target interactions. Tinman and MEF2 are involved in both networks (these TFs are expressed in both cell types, and the iVEs and/or motifs are found for both sets), while the other TFs are specific for one cell type (hth, Doc2 and Antp are expressed in cardioblasts, where their motifs are found enriched). For all these networks, TFs are represented by diamonds and have a thick edge when their motifs and/or iVEs have been found enriched by i-cisTarget. Arrows represent an interaction that could be positive or negative. Colours represent the type of feature found representative by i-cisTarget: blue for a motif, green for an iVE (ChIP dataset for that TF) and red for both at the same time. The type of edge represents whether it is a new TF–gene interaction prediction (dashed), a known ‘TF–gene’ interaction [from DroID (73)], or a known ‘genetic interaction’ (DroID).

References

    1. Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 2009;10:669–680.
    2. Song L, Zhang Z, Grasfeder LL, Boyle AP, Giresi PG, Lee B-K, Sheffield NC, Gräf S, Huss M, Keefe D, et al. Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res. 2011;21:1757–1767.
    3. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009;10:57–63.
    4. Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010;330:1787–1797.
    5. Hawkins RD, Hon GC, Ren B. Next-generation genomics: an integrative approach. Nat. Rev. Genet. 2010;11:476–486.
