Sci Rep. 2024 Dec 30;14(1):31744.
doi: 10.1038/s41598-024-82238-5.

Predicting CTCF cell type active binding sites in human genome

Lu Chai et al. Sci Rep.

Abstract

The CCCTC-binding factor (CTCF) is pivotal in orchestrating diverse biological functions across the human genome, yet the mechanisms driving its cell type-active DNA binding affinity remain underexplored. Here, we collected ChIP-seq data from 67 cell lines in ENCODE, constructed a unique dataset of cell type-active CTCF binding sites (CBSs), and trained convolutional neural networks (CNNs) to dissect the patterns of CTCF binding activity. Our analysis reveals that the transcription factors RAD21/SMC3 and chromatin accessibility are more predictive than sequence motifs and histone modifications. Integrating these features achieved AUPRC values consistently above 0.868, highlighting their utility in deciphering CTCF binding dynamics. This study provides a deeper understanding of the regulatory functions of CTCF via a machine learning framework.

Keywords: CTCF binding site; Chromatin accessibility; Convolutional neural networks; RAD21; SMC3.
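
The abstract reports performance as AUPRC (area under the precision-recall curve). As a rough illustration of what that number measures, the sketch below computes average precision, one common estimator of AUPRC, from binary labels and predicted scores. The function name and the toy data are hypothetical, and the abstract does not state how the authors computed their AUPRC values, so this should not be read as their procedure.

```python
def average_precision(labels, scores):
    """Estimate AUPRC as average precision (hypothetical helper, not from the paper).

    labels: list of 0/1 ground-truth values (1 = active binding site).
    scores: list of predicted scores, higher meaning more likely positive.
    Tied scores are ordered arbitrarily by the sort; this sketch does not
    group them.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_pos += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_pos / rank

    # Average the precision over all positives (each one is a recall step).
    return precision_sum / total_pos


# Toy data (made up for illustration): two positives, two negatives.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(toy_labels, toy_scores))
```

Unlike ROC-based metrics, this quantity depends on the ratio of positives to negatives, so values are only comparable between datasets with a similar class balance.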

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
The overview of the cell type-active CBS prediction model. (A) The construction of the positive and negative sets of CTCF cell type-active binding sites. (B) The feature extraction process, detailing both single and combined features used for model training. (C) The classifier is constructed from motif and epigenetic signals using a CNN.
Fig. 2
Chromatin accessibility signals are predictive of CBSs. (A) Chromatin accessibility profiles (DNase-seq) in 21 cell lines, showing the distribution of DNase I hypersensitivity signals around CTCF binding sites (CBSs). (B) AUPRC values for predicting CBSs using DNase I signal features across 59 cell lines. The bar graph displays the performance of the predictive model, with the black line indicating the proportion of CTCF binding sites located within open chromatin regions.
Fig. 3
Analysis of histone modification signals in CBS prediction. (A) Distribution of 12 histone modification signals in the K562 cell line. The profiles show the enrichment of histone modifications around CBSs. (B) Distribution of H2AFZ signals in 12 different cell lines. (C) AUPRC values for predicting CBSs using 12 histone modifications across 13 cell lines. The bar graph depicts the performance of the predictive model for each histone modification, demonstrating the effectiveness of different histone marks in identifying CBSs.
Fig. 4
Analysis of RAD21 and SMC3 in CBS prediction. (A) The enrichment of RAD21 binding signals around CBSs in 9 cell lines. (B) Distribution of SMC3 binding signals in 4 cell lines. (C) The AUPRC values of prediction with the binding signal of RAD21 and SMC3.
Fig. 5
Analysis of CTCF motif in CBS prediction. (A) The motifs sourced from JASPAR and HOCOMOCO. (B) The common motif derived from peaks shared across 67 cell lines (abbreviated as Com_motif). (C) The motifs derived from peaks unique to one cell line (abbreviated as Uni_motif). (D) The AUPRC values of prediction with the motif scores in 33 cell lines.
Fig. 6
Accurately predicted peaks (N1, green), mis-predicted peaks (N2, blue), and inherently unpredictable peaks (N0, yellow) in 8 cell lines.
