Comparative Study
G3 (Bethesda). 2018 Jan 4;8(1):219-229.
doi: 10.1534/g3.117.300296.

Comparison of ChIP-Seq Data and a Reference Motif Set for Human KRAB C2H2 Zinc Finger Proteins

Marjan Barazandeh et al. G3 (Bethesda).

Abstract

KRAB C2H2 zinc finger proteins (KZNFs) are the largest and most diverse family of human transcription factors, likely due to diversifying selection driven by novel endogenous retroelements (EREs), but the vast majority lack binding motifs or functional data. Two recent studies analyzed a majority of the human KZNFs using either ChIP-seq (60 proteins) or ChIP-exo (221 proteins) in the same cell type (HEK293). The ChIP-exo paper did not describe binding motifs, however. Thirty-nine proteins are represented in both studies, enabling the systematic comparison of the data sets presented here. Typically, only a minority of peaks overlap, but the two studies nonetheless display significant similarity in ERE binding for 32/39, and yield highly similar DNA binding motifs for 23 and related motifs for 34 (MoSBAT similarity score >0.5 and >0.2, respectively). Thus, there is overall (albeit imperfect) agreement between the two studies. For the 242 proteins represented in at least one study, we selected a highest-confidence motif for each protein, utilizing several motif-derivation approaches, and evaluating motifs within and across data sets. Peaks for the majority (158) are enriched (96% with AUC >0.6 predicting peak vs. nonpeak) for a motif that is supported by the C2H2 "recognition code," consistent with intrinsic sequence specificity driving DNA binding in cells. An additional 63 yield motifs enriched in peaks, but not supported by the recognition code, which could reflect indirect binding. Altogether, these analyses validate both data sets, and provide a reference motif set with associated quality metrics.

Keywords: C2H2 recognition code; ChIP-seq; DNA-binding motif; KRAB C2H2 zinc finger proteins; endogenous retroelements.
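The abstract's "AUC >0.6 predicting peak vs. nonpeak" criterion can be illustrated with a small sketch. It assumes each sequence has already been assigned a single motif score (the scoring step, the function name `auroc`, and the toy numbers are all hypothetical and not taken from the paper). The AUROC is computed here as the probability that a randomly chosen peak sequence outscores a randomly chosen non-peak sequence, with ties counted as half.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the Mann-Whitney probability that a positive outscores a negative.

    pos_scores: motif scores for peak sequences (assumed precomputed).
    neg_scores: motif scores for non-peak sequences (assumed precomputed).
    Ties contribute 0.5, the usual convention.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one peak and one non-peak score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores, invented purely for illustration.
peak_scores = [8.1, 6.4, 7.9, 3.2, 9.0]
nonpeak_scores = [2.5, 4.1, 3.2, 1.0, 6.4]

value = auroc(peak_scores, nonpeak_scores)
passes_threshold = value > 0.6  # the 0.6 cutoff is the one quoted in the abstract
print(f"AUROC = {value:.2f}, enriched: {passes_threshold}")
```

An AUROC of 0.5 corresponds to a motif that does not separate peaks from non-peaks at all, which is why a cutoff somewhat above 0.5 is a natural choice for calling a motif enriched. The pairwise loop is quadratic and only meant for small examples; a rank-based computation would be preferable for real peak sets.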

Figures

Figure 1
Overview of the data analysis steps and methods utilized in this study.
Figure 2
Peak overlaps for the 39 shared KZNFs between the Trono original and Hughes data (dark blue bars), Hughes data and Trono original data (light blue bars), Trono reprocessed and Hughes data (dark red bars), and Hughes and Trono reprocessed data (light red bars).
Figure 3
Overview of the ERE enrichment in Hughes and Trono ChIP data. (A) Pearson correlation between the 39 Hughes and Trono reprocessed overlapping KZNFs (matched pairs; red bars) and nonoverlapping KZNFs (unmatched pairs: 2964 comparisons; blue bars) and the frequency of the KZNF pairs at each given correlation. The arrow indicates the correlation beyond which 82% of the matched pairs and 8% of the unmatched pairs lie. The percentages of peak overlap between the Hughes and Trono reprocessed (yellow dots) and Trono reprocessed and Hughes (green dots) at corresponding correlations are also presented. (B) Fraction of the top 500 overlapping KZNFs enriched in TEs (ERE instances and transposons). In total, 51 single TE instances were enriched with a fraction of >0.1. H, Hughes; O, Trono Original; R, Trono Reprocessed.
Figure 4
Similarity between ChIP-derived motifs. (A) Similarity between the Hughes and Trono motifs for ZIM3. The heat map on the left indicates the MoSBAT similarity e-scores between each pair compared. The motifs and the motif-finding methods are represented on the right. H, Hughes; TO, Trono Original; TR, Trono Reprocessed. (B) The MoSBAT e-scores between Hughes motifs and Trono original and Trono reprocessed motifs and the corresponding aligned motifs for the 39 overlapping KZNFs. H, Hughes; TO, Trono Original; TR, Trono Reprocessed; R, RCADE; M, MEME; a, all peaks; nE, nonERE peaks. (C) MoSBAT similarity e-scores for the 39 overlapping KZNFs between the Hughes data and Trono original (blue) and Trono reprocessed (red). The dots indicate the percentage of the overlap between the Hughes and Trono original peaks (blue) and Hughes and Trono reprocessed peaks (red).
Figure 5
AUROC of the Hughes and Trono motifs and external motifs overlapping either of the two data sets. The heat map represents the AUROC value of each motif tested on Hughes, Trono original, or Trono reprocessed peaks. The first row at the top indicates the source of the motif, and the second row indicates the test data set. TO, Trono Original; TR, Trono Reprocessed; H, Hughes. White indicates that no data are available. A full version of the figure that includes the KZNF IDs is available at the web portal of the paper (http://kznfmotifs.ccbr.utoronto.ca/figures.html).
Figure 6
The reference motif set for the 242 KZNFs. (A) Percentage and number (in parentheses) of motifs that fall into classes A–F, and the median AUROC value of each class. (B) The reference motif for each of the 242 KZNFs. Source refers to motif origin: TO, Trono Original; TR, Trono Reprocessed; H, Hughes; Naj, ChIP-seq (Najafabadi et al. 2015b); SM, SMiLE-seq (Isakova et al. 2017); SelY, HT-SELEX (Yin et al. 2017); MSelY, Methyl-HT-SELEX (Yin et al. 2017); SelJ, HT-SELEX (Jolma et al. 2013); EN, ENCODE; Trans, TRANSFAC; HM, HocoMoco. The class is the selection class that each motif falls into. For ZIM2, ZNF445 and ZNF785, both motifs from class F are represented.
Figure 7
Web portal of ZNF549 containing all the analyses described (http://kznfmotifs.ccbr.utoronto.ca/report.php?name=ZNF549). (A) Motifs for the same KZNF derived from different sources. (B) MoSBAT similarity heat maps between all motifs. (C) Overlap between Hughes peaks and Trono reprocessed (left) and Trono original (right) peaks for all peaks and top 500 peaks. (D) ERE enrichment for the Hughes and Trono ChIP peaks.
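
The Figure 2 and Figure 7C captions report peak overlap in both directions (for example, Hughes versus Trono and Trono versus Hughes). The sketch below shows why the two directions can differ. It is an illustration under stated assumptions, not the paper's procedure: peaks are taken to be half-open `(chrom, start, end)` intervals, any shared base counts as an overlap, and the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap at least one reference peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Toy peak sets, invented purely for illustration.
set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr3", 10, 60)]
set_b = [("chr1", 150, 250), ("chr1", 180, 220), ("chr2", 400, 500)]

a_in_b = fraction_overlapping(set_a, set_b)  # share of set_a peaks found in set_b
b_in_a = fraction_overlapping(set_b, set_a)  # share of set_b peaks found in set_a
print(f"A in B: {a_in_b:.2f}, B in A: {b_in_a:.2f}")
```

The two fractions differ because the denominators differ and because several peaks in one set can fall on a single peak in the other, so reporting both directions gives a fuller picture than a single overlap figure.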
