Genome Res. 2024 Dec 23;34(12):2319-2334.
doi: 10.1101/gr.279037.124.

Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships

Michelle Kudron et al. Genome Res. 2024.

Abstract

A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here, we present the culmination of the efforts of the modENCODE (model organism Encyclopedia of DNA Elements) and modERN (model organism Encyclopedia of Regulatory Networks) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These data sets comprise 605 TFs identifying 3.6 M sites in the fly and 356 TFs identifying 0.9 M sites in the worm, and represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks," that larger metapeaks have characteristics of high-occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single-cell RNA-seq data in a machine-learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent-daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing green fluorescent protein (GFP)-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell-type-specific TF-target relationships.
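The idea of a metapeak as a cluster of peaks from many TF experiments can be illustrated with a minimal sketch. The abstract does not describe how peaks are grouped, so the rule used here (merge peaks whose intervals overlap on the same chromosome) is an assumption made only for illustration, as are the `Peak` record, the function names, and the toy coordinates.

```python
from collections import namedtuple

# Hypothetical peak record: half-open interval [start, end) plus the TF assayed.
Peak = namedtuple("Peak", ["chrom", "start", "end", "tf"])


def group_into_metapeaks(peaks):
    """Group peaks into clusters of overlapping intervals.

    ASSUMPTION: overlap-based merging is a stand-in for whatever
    clustering procedure the consortia actually used.
    """
    metapeaks = []
    current = []          # peaks in the cluster being built
    current_end = None    # rightmost end coordinate seen in that cluster
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        same_chrom = bool(current) and p.chrom == current[0].chrom
        if same_chrom and p.start < current_end:
            # Overlaps the running cluster: extend it.
            current.append(p)
            current_end = max(current_end, p.end)
        else:
            # Start a new cluster, saving the previous one if it exists.
            if current:
                metapeaks.append(current)
            current = [p]
            current_end = p.end
    if current:
        metapeaks.append(current)
    return metapeaks


# Toy data, not taken from the paper.
peaks = [
    Peak("chrI", 100, 200, "tfA"),
    Peak("chrI", 150, 260, "tfB"),
    Peak("chrI", 250, 300, "tfC"),
    Peak("chrI", 900, 950, "tfA"),
    Peak("chrII", 120, 180, "tfB"),
]
metapeaks = group_into_metapeaks(peaks)
# Occupancy = number of peaks in an individual metapeak.
occupancies = [len(m) for m in metapeaks]
```

With the toy input, the first three peaks chain together into one cluster and the remaining two stand alone.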

Figures

Figure 1.
Peaks and metapeaks. (A,B) Number of peaks per experiment for the worm (A) and the fly (B). Vertical lines denote the median number of peaks in worm (1130) and fly (4416). (C,D) The total number of peaks in metapeaks declines rapidly with increasing metapeak occupancy (the number of peaks in an individual metapeak) in both worm (C) and fly (D). The vertical bars indicate the thresholds used to define HOT (left) and ultra-HOT (right) sites in each species (84 and 240 peaks, respectively, in the worm and 277 and 602 peaks in the fly). (E,F) In worm (E) and fly (F), the relative rank of the signal strength of peaks within metapeaks increases with metapeak occupancy. (G,H) In both worm (G) and fly (H), targets of high-occupancy metapeaks are predominantly high-entropy genes (more uniform expression), while targets of lower-occupancy metapeaks show lower entropy, indicating more cell-type-specific expression. Gene entropy is shifted higher in the fly than in the worm.
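Two quantities in this caption lend themselves to a short sketch: the occupancy thresholds for HOT and ultra-HOT sites, and the entropy of a gene's expression across cell types. The thresholds are the ones quoted in the caption. Whether they are inclusive, the label `"regular"`, and the use of unnormalized Shannon entropy in bits are assumptions made here for illustration, and the paper's exact definitions may differ.

```python
import math

# Occupancy thresholds (HOT, ultra-HOT) quoted in the Figure 1 caption.
HOT_THRESHOLDS = {"worm": (84, 240), "fly": (277, 602)}


def classify_metapeak(occupancy, species):
    """Label a metapeak by its occupancy (number of peaks it contains).

    ASSUMPTION: thresholds are treated as inclusive lower bounds, and
    "regular" is a hypothetical name for everything below the HOT cutoff.
    """
    hot, ultra_hot = HOT_THRESHOLDS[species]
    if occupancy >= ultra_hot:
        return "ultra-HOT"
    if occupancy >= hot:
        return "HOT"
    return "regular"


def expression_entropy(expression):
    """Shannon entropy (bits) of a gene's expression across cell types.

    Uniform expression gives high entropy; expression confined to one
    cell type gives zero. ASSUMPTION: plain Shannon entropy of the
    normalized expression vector.
    """
    total = sum(expression)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for value in expression:
        if value > 0:
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


# Toy expression vectors over four cell types, not taken from the paper.
uniform_gene = expression_entropy([5, 5, 5, 5])
specific_gene = expression_entropy([10, 0, 0, 0])
```

A gene expressed equally in four cell types reaches the maximum of log2(4) bits, while a gene expressed in a single cell type has zero entropy, which matches the caption's reading of high entropy as "more uniform expression".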
Figure 2.
Conservation of metapeak regions. Regions of the genomes spanned by metapeaks show increased conservation compared to random regions but less conservation than coding exons. The fly exons (A) are slightly less conserved than exon sequences in the worm (B). (C,D) Metapeaks with increasing numbers of peaks show decreasing conservation, particularly in the fly (D).
Figure 3.
Correlation of TF–TF pairs. Peaks of some TF pairs occur in the same metapeaks more frequently than others as measured by Pearson's correlation for both worm L4/YA experiments (A) and fly embryo experiments (B). Two clusters of TFs with correlated peaks are highlighted for each species but others are also evident. Negative associations are evident on the left side of the worm plot.
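As a rough illustration of how co-occurrence in metapeaks can be scored with Pearson's correlation, the sketch below represents each TF as a 0/1 vector over metapeaks and correlates every pair. The binary encoding, the TF names, and the toy data are assumptions; the caption does not specify how the input to the correlation was constructed.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# ASSUMPTION: presence[tf][i] is 1 if the TF has a peak in metapeak i.
# Toy data with hypothetical TF names, not taken from the paper.
presence = {
    "tfA": [1, 1, 0, 0, 1, 0],
    "tfB": [1, 1, 0, 0, 1, 0],
    "tfC": [0, 0, 1, 1, 0, 1],
    "tfD": [1, 0, 1, 0, 1, 0],
}

# Pairwise correlations keyed by an alphabetically ordered TF pair.
correlations = {
    (a, b): pearson(presence[a], presence[b])
    for a, b in combinations(sorted(presence), 2)
}
```

In this toy input, `tfA` and `tfB` always share metapeaks (correlation 1), while `tfA` and `tfC` never do (correlation -1), the kind of negative association the caption points out in the worm plot.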
Figure 4.
Motif analysis in the metapeaks. (A,B) The number of peaks in a metapeak region was highly correlated with the motifs in the region for both fly (A) and worm (B). The Spearman's correlations for each are shown. Using the ChIP-seq data to obtain motifs, we found that a more stringent peak cluster threshold of 53 for fly (C) and 31 for worm (D) gave a better motif inference success rate than using all peaks, the top 20% of peaks, or a larger cluster size. For each subset, we randomly sampled the same number of peaks from all peaks for comparison (shown in red). This sampling was repeated three times and the average value is shown.
Figure 5.
Peak position relative to TSS. (A,B) Peaks in worms (A) and flies (B) predominantly lie close to the TSS of the nearest gene. Worm peaks are slightly farther from the TSS, possibly reflecting the use of splice leaders in worm transcripts, so that the actual start of transcription is farther upstream. The hint of two different distributions may reflect those genes with a splice leader and those without. (C,D) Metapeaks with few peaks, six or fewer in the worm (C) and eight or fewer in the fly (D), are more broadly distributed.
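The distance from a peak to the TSS of the nearest gene can be computed as in the following sketch. The use of a single summit coordinate per peak, the strand-aware sign convention (negative means upstream of the TSS), and all names and coordinates are assumptions for illustration; the caption does not state how positions were measured.

```python
import bisect


def nearest_tss_offset(summit, tss_list):
    """Return (offset, tss_position) for the TSS nearest to a peak summit.

    tss_list: list of (position, strand) tuples sorted by position,
    with strand "+" or "-".
    ASSUMPTION: offset is signed relative to the gene's orientation,
    so a negative value means the summit lies upstream of the TSS.
    """
    if not tss_list:
        raise ValueError("tss_list must not be empty")
    positions = [pos for pos, _ in tss_list]
    i = bisect.bisect_left(positions, summit)
    # Only the TSSs immediately left and right of the summit can be nearest.
    candidates = tss_list[max(i - 1, 0): i + 1]
    pos, strand = min(candidates, key=lambda t: abs(summit - t[0]))
    offset = summit - pos if strand == "+" else pos - summit
    return offset, pos


# Toy annotation, not taken from the paper.
tss_sites = [(1000, "+"), (5000, "-"), (9000, "+")]

upstream_plus = nearest_tss_offset(900, tss_sites)
upstream_minus = nearest_tss_offset(5200, tss_sites)
downstream_plus = nearest_tss_offset(9500, tss_sites)
```

For a minus-strand gene, a summit at a higher coordinate than the TSS is upstream, which is why the summit at 5200 gets a negative offset relative to the TSS at 5000.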
Figure 6.
TF expression correlates with target gene expression. (A–D) Aggregate target expression reflects the TF expression for the worm blmp-1 (A) TF and (B) targets, and the fly GATAe (C) TF and (D) targets. Cell types are arranged along the x-axis by broad cell class and sorted alphabetically within each class. The worm cell types are further divided into time bins (from Supplemental Files S6, S7; Packer et al. 2019).
Figure 7.
The relative importance of TFs in cell type gene expression. (A) A heatmap of the relative importance of worm TFs in predicting gene expression in the embryo terminal cell types. Importance is indicated by color intensity, from yellow (no importance) to dark red (most important). Light blue indicates that the factor was not expressed above the threshold in that cell type. Clusters of TFs and cell types (black boxes) are enlarged on the right, showing the detected relationships between well-studied and novel TFs and the cell types in which they are important. (B) Clusters of fly TFs and cell types in which they are important, selected from the full heatmap in Supplemental Figure S7. Color scale as in A. (C,D) TFs important in the Ca lineage. (C) TFs important in the Cap lineage, which produces exclusively body wall muscle cells, arranged by the onset of their importance. (D) TFs important in the Caa lineage or in patterned expression in both lineages. The Caa lineage produces primarily hypodermal cells but also some neurons and cell deaths. The lineage names are given on the right, along with the specific cell type in the embryo. Body wall muscle cells are labeled mu_bod followed by letters indicating Dorsal/Ventral and L/R, and a number indicating the row of the cell, with 24 most posterior. Color scale as in A. Cell types not detected in the single-cell data are indicated by an X.
