[Preprint]. 2024 Jan 20:2024.01.18.576242.
doi: 10.1101/2024.01.18.576242.

Binding profiles for 954 Drosophila and C. elegans transcription factors reveal tissue specific regulatory relationships


Michelle Kudron et al. bioRxiv.

Abstract

A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the modERN (model organism Encyclopedia of Regulatory Networks) consortium that systematically assayed TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). We describe key features of these datasets, comprising 604 TFs identifying 3.6 M sites in the fly and 350 TFs identifying 0.9 M sites in the worm. Applying a machine learning model to these data identifies sets of TFs with a prominent role in promoting target gene expression in specific cell types. TF binding data are available through the ENCODE Data Coordinating Center and at https://epic.gs.washington.edu/modERNresource, which provides access to processed and summary data, as well as widgets to probe cell type-specific TF-target relationships. These data are a rich resource that should fuel investigations into TF function during development.

Conflict of interest statement

DECLARATION OF INTERESTS: KPW is associated with, and a shareholder in, Tempus Labs and Provaxus, Inc. All other authors do not have external interests.

Figures

Figure 1. Peaks and Metapeaks.
A, B. Number of peaks per experiment for the worm (A) and the fly (B). Vertical lines denote the median number of peaks in worm (1,130) and fly (4,416). C, D. The total number of peaks in metapeaks declines rapidly with increasing metapeak occupancy (the number of peaks in an individual metapeak) in both worm (C) and fly (D). The vertical bars indicate the thresholds used to define HOT (left) and ultra-HOT (right) sites in each species (84 and 240 peaks, respectively, in the worm and 277 and 602 peaks in the fly). E, F. The relative rank of the signal strength of peaks within metapeaks increases with metapeak occupancy for worm (E) and fly (F). G, H. In both worm (G) and fly (H), targets of high occupancy metapeaks show a predominance of high entropy genes (more uniform expression), while targets of lower occupancy metapeaks show lower entropy, indicating more cell type specific expression. Gene entropy is shifted higher in the fly than in the worm.
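The occupancy thresholds and the entropy measure in this caption can be illustrated with a minimal Python sketch. The threshold values are taken from the caption; everything else is an assumption made for illustration: the function names, the label "regular" for metapeaks below the HOT threshold, the use of `>=` at the boundaries, and the choice of Shannon entropy in bits over a gene's expression across cell types. The caption does not state how the authors computed entropy.

```python
import math

# Peaks-per-metapeak thresholds quoted in the Figure 1 caption: (HOT, ultra-HOT).
THRESHOLDS = {"worm": (84, 240), "fly": (277, 602)}


def classify_metapeak(occupancy, species):
    """Label a metapeak by its occupancy (number of peaks it contains).

    Treating the threshold itself as inclusive (>=) is an assumption.
    """
    hot, ultra_hot = THRESHOLDS[species]
    if occupancy >= ultra_hot:
        return "ultra-HOT"
    if occupancy >= hot:
        return "HOT"
    return "regular"  # hypothetical label for everything below the HOT threshold


def expression_entropy(expression):
    """Shannon entropy (bits) of one gene's expression across cell types.

    Uniform expression gives the maximum, log2(number of cell types);
    expression confined to a single cell type gives 0.
    """
    total = sum(expression)
    if total <= 0:
        return 0.0
    probs = [x / total for x in expression if x > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Under these assumptions, a metapeak with 100 peaks would be labeled HOT in the worm but regular in the fly, and a gene expressed equally in four cell types has an entropy of 2 bits.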
Figure 2. Conservation of metapeak regions.
A, B. Regions of the genomes spanned by metapeaks show increased conservation compared to random regions but less conservation than coding exons. The fly exons are slightly less conserved than exon sequences in the worm. C, D. Metapeaks with increasing numbers of peaks show decreasing conservation, particularly in the fly.
Figure 3. Correlation of TF-TF pairs.
Peaks of some TF pairs occur in the same metapeaks more frequently than others as measured by Pearson correlation for both worm L4/young adult experiments (A) and fly experiments (B). Two clusters of TFs with correlated peaks are highlighted for each species but others are also evident. Negative associations are especially evident on the left side of the worm plot.
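As a rough illustration of the co-occurrence measure described in this caption, the sketch below computes a Pearson correlation between two TFs from their presence or absence in each metapeak. Encoding each TF as a binary vector over metapeaks is an assumption; the caption does not say whether the authors used binary presence, peak counts, or signal values. The function names and the toy metapeak identifiers are hypothetical.

```python
import math


def presence_vector(tf_metapeaks, all_metapeaks):
    """1 if the TF has a peak in the metapeak, else 0 (binary encoding is an assumption)."""
    return [1 if m in tf_metapeaks else 0 for m in all_metapeaks]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a TF is in all or none of the metapeaks
    return cov / math.sqrt(var_x * var_y)


# Toy data: four metapeaks and three hypothetical TFs.
metapeaks = ["m1", "m2", "m3", "m4"]
tf_a = presence_vector({"m1", "m2"}, metapeaks)
tf_b = presence_vector({"m1", "m2"}, metapeaks)
tf_c = presence_vector({"m3", "m4"}, metapeaks)
```

In this toy example `pearson(tf_a, tf_b)` is 1 (the two TFs always share metapeaks) and `pearson(tf_a, tf_c)` is -1 (they never do), which corresponds to the positive clusters and negative associations visible in the heatmaps.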
Figure 4. Motif analysis in the metapeaks.
A, B. The number of peaks in a metapeak region was highly correlated with the motifs in the region for both fly (A) and worm (B). The Spearman correlations for each are shown. When using the ChIP-seq data to obtain motifs, we found that a more stringent peak cluster threshold (53 for fly and 31 for worm) resulted in a better motif inference success rate than using all peaks, the top 20% of peaks, or a larger cluster size. For each subset, we randomly sampled the same number of peaks from all peaks for comparison (shown in red). This sampling process was repeated three times and the average value is shown.
Figure 5. Peak position relative to TSS.
A, B. Peaks in worms (A) and flies (B) predominantly lie close to the TSS of the nearest gene. Worm peaks are slightly farther from the TSS, possibly reflecting the use of splice leaders in worm transcripts, so that the actual start of transcription is further upstream. The hint of two different distributions may reflect genes with a splice leader and genes without. C, D. Metapeaks with few peaks (less than or equal to 6 in the worm (C) and less than or equal to 8 in the fly (D)) are more broadly distributed.
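The distance underlying these distributions can be sketched as a nearest-neighbor lookup. This is a minimal illustration, assuming each peak is reduced to a single coordinate (for example its summit) and that the annotated TSS positions for one chromosome are available as a sorted list. The function name is hypothetical, and the sketch returns an unsigned distance, ignoring strand; the caption does not specify how the authors handled either point.

```python
from bisect import bisect_left


def nearest_tss_distance(peak_pos, tss_positions):
    """Absolute distance from a peak coordinate to the closest TSS.

    tss_positions must be a non-empty sorted list of TSS coordinates on the
    same chromosome as the peak. Strand is ignored (an assumption).
    """
    if not tss_positions:
        raise ValueError("tss_positions must not be empty")
    i = bisect_left(tss_positions, peak_pos)
    # Only the TSS just before and just at/after the peak can be the nearest.
    candidates = tss_positions[max(i - 1, 0):i + 1]
    return min(abs(peak_pos - t) for t in candidates)


# Toy annotation with three TSS coordinates on one chromosome.
tss = [100, 500, 1000]
distances = [nearest_tss_distance(p, tss) for p in (50, 480, 500, 2000)]
```

For the toy annotation, `distances` is `[50, 20, 0, 1000]`. For worm genes that are trans-spliced, the annotated transcript start may lie downstream of the true transcription start, which is the effect the caption suggests as an explanation for the slightly larger worm distances.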
Figure 6. TF expression correlates with target gene expression.
Aggregate target expression reflects TF expression for the worm TF blmp-1 (A, TF; B, targets) and the fly TF GATAe (C, TF; D, targets). Cell types are arranged along the x axis by broad cell class and sorted alphabetically within each class. The worm cell types are further divided into time bins (from Packer et al., 2019, Table_S7).
Figure 7. The relative importance of TFs in cell type gene expression.
A. A heatmap of the relative importance of worm TFs in predicting gene expression in the embryo terminal cell types. The importance is indicated by the intensity of the color, from yellow (no importance) to dark red (most important). Light blue indicates that the factor was not expressed above threshold in that cell type. Clusters of TFs and cell types (black boxes) are enlarged on the right, showing the detected relationships between well-studied and novel TFs and the cell types in which they are important. B. Clusters of fly TFs and cell types in which they are important, selected from the full heatmap in Supplemental Figure 5. Color scale as in A. C, D. TFs important in the Ca lineage. C. TFs important in the Cap lineage, which produces exclusively body wall muscle cells, arranged by onset of their importance. D. TFs important in the Caa lineage or in patterned expression in both lineages. The Caa lineage produces primarily hypodermal cells but also some neurons and cell deaths. The lineage names are given on the right, along with the specific cell type in the embryo. Body wall muscle cells are labeled mu_bod followed by letters indicating Dorsal/Ventral and Left/Right, and a number indicating the row of the cell, with 24 most posterior. Color scale as in A. Cell types not detected in the single cell data are indicated by an X.

