An integrative approach for fine-mapping chromatin interactions

Artur Jaroszewicz et al. Bioinformatics. 2020 Mar 1;36(6):1704-1711. doi: 10.1093/bioinformatics/btz843.
Abstract

Motivation: Chromatin interactions play an important role in genome architecture and gene regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolutions (e.g. 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions.

Results: To predict the sources of Hi-C-identified interactions at high resolution (e.g. 100 bp), we developed a computational method that integrates data from DNase-seq and ChIP-seq of TFs and histone marks. Our method, χ-CNN, uses these data to first train a convolutional neural network (CNN) to discriminate between called Hi-C interactions and non-interactions. χ-CNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show that these predictions recover the original Hi-C peaks after the peaks have been extended to be coarser. We also show that χ-CNN predictions are enriched for evolutionarily conserved bases, eQTLs and CTCF motifs, supporting their biological significance. χ-CNN provides an approach for analyzing important aspects of genome architecture and gene regulation at a higher resolution than previously possible.
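To make the integration step concrete, below is a minimal sketch of how a per-locus input matrix at 100 bp resolution might be assembled from bigWig signal tracks using the pyBigWig library. The track list, file paths and helper function are illustrative assumptions, not the authors' pipeline.

    # A minimal sketch (not the authors' code) of assembling a per-locus signal
    # matrix at 100 bp resolution from bigWig tracks; paths and helper name are
    # hypothetical.
    import numpy as np
    import pyBigWig

    TRACKS = ["DNase.bw", "CTCF.bw", "H3K4me3.bw", "H3K27ac.bw"]  # assumed paths
    BIN_BP = 100  # fine-mapping resolution

    def locus_matrix(chrom, start, end):
        """Return an (n_tracks x n_bins) matrix of mean signal per 100 bp bin."""
        n_bins = (end - start) // BIN_BP
        rows = []
        for path in TRACKS:
            bw = pyBigWig.open(path)
            vals = bw.stats(chrom, start, end, type="mean", nBins=n_bins)
            rows.append([v if v is not None else 0.0 for v in vals])
            bw.close()
        return np.asarray(rows, dtype=np.float32)

    # Example: a 5 kb Hi-C anchor becomes a (4 x 50) matrix.
    # mat = locus_matrix("chr1", 1_000_000, 1_005_000)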

Availability and implementation: χ-CNN software is available on GitHub (https://github.com/ernstlab/X-CNN).

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
The structure of the CNN in χ-CNN. A data matrix from an interacting locus is passed through an encoding layer, a convolutional layer, a global max-pooling layer, a dense layer and, finally, a logistic regression layer. The encoder, convolutional and dense layers use a ReLU activation function
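For orientation, a minimal Keras sketch of a network with this layer order follows. The layer widths, the kernel_size=1 interpretation of the encoding layer and the input dimensions are illustrative assumptions, not the published χ-CNN hyperparameters.

    # A minimal Keras sketch of the layer order in Fig. 1:
    # encoder -> convolution -> global max-pooling -> dense -> logistic output.
    # Sizes below are assumptions for illustration.
    import tensorflow as tf
    from tensorflow.keras import layers

    N_BINS, N_TRACKS = 50, 12  # e.g. a 5 kb window in 100 bp bins, 12 signal tracks

    model = tf.keras.Sequential([
        layers.Input(shape=(N_BINS, N_TRACKS)),
        # Per-position encoding of the input tracks (ReLU activation)
        layers.Conv1D(32, kernel_size=1, activation="relu"),
        # Convolution over neighboring bins (ReLU activation)
        layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
        # Global max-pooling over positions
        layers.GlobalMaxPooling1D(),
        # Dense layer (ReLU), then a logistic-regression output unit
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])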
Fig. 2.
An example of a fine-mapped peak. The left and right sides correspond to the two sides of an interaction. The top images show tracks for H3K4me3, H3K27ac, H3K36me3, H3K9me3 and CTCF. The bottom images show χ-CNN’s fine-mapping score for each position in the region (in kilobases). There is a sharp peak on the left corresponding to a CTCF peak, and in the right region, χ-CNN assigns the highest importance score to one of three CTCF peaks
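The per-position fine-mapping score shown here comes from a feature attribution method. As a rough illustration only (the excerpt does not specify which attribution method χ-CNN uses), a gradient-times-input saliency over the trained classifier could be computed as follows; the function name and the aggregation over tracks are assumptions.

    # Illustrative gradient-x-input saliency for per-bin importance scores.
    # This is a stand-in attribution method, not necessarily the one used by X-CNN.
    import numpy as np
    import tensorflow as tf

    def fine_mapping_scores(model, locus):
        """locus: (n_bins, n_tracks) array -> one importance score per 100 bp bin."""
        x = tf.convert_to_tensor(locus[np.newaxis, ...], dtype=tf.float32)
        with tf.GradientTape() as tape:
            tape.watch(x)
            score = model(x)[:, 0]            # predicted interaction probability
        grads = tape.gradient(score, x)       # d(probability) / d(input)
        saliency = (grads * x).numpy()[0]     # gradient x input
        return np.abs(saliency).sum(axis=1)   # aggregate over tracks per bin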
Fig. 3.
Distribution of fine-mapping predictions for different-size HiCCUPS peaks. Kernel density estimation (KDE) plots showing the distribution of χ-CNN’s fine-mapping predictions within K562 peaks after extending the original peak equally in both directions to form a 25 kb peak. To generate the plots, we used the ‘jointplot’ function with the KDE option in Python’s Seaborn package. (a) For 5 kb interaction peaks extended to 25 kb, fine-mapped positions are strongly concentrated around the original 5 kb peak (center blue box). Enrichment in the center 5 kb bin is 8.3-fold compared to random guessing. (b) For 10 kb peaks extended to 25 kb, fine-mapped positions are concentrated in the original 10 kb peak (center blue box). Enrichment in the center 5 kb bin is 4.2-fold. (c) For interactions called at 25 kb, fine-mapped positions are not concentrated in any specific region. Enrichment in the center 5 kb bin is 1.6-fold. The positive direction on the axes points toward the exterior of the interactions. The mode of the 5 kb peak plot is shifted in the positive direction, meaning that fine-mapped peaks are most likely to lie approximately 1 kb farther out than the center of the originally called peak. Similar plots for GM12878 can be found in Supplementary Figure S3. (Color version of this figure is available at Bioinformatics online.)
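Since the caption names Seaborn's jointplot with the KDE option, a minimal sketch of that plotting call is given below. The offsets are random placeholder data, not the paper's predictions, and the axis labels are assumptions.

    # A minimal sketch of a KDE joint plot of fine-mapped offsets on the two
    # sides of an interaction, using seaborn.jointplot with kind="kde".
    # The data here are random placeholders, not results from the paper.
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    left_offsets = rng.normal(loc=1.0, scale=2.5, size=1000)   # kb from peak center
    right_offsets = rng.normal(loc=1.0, scale=2.5, size=1000)  # kb from peak center

    g = sns.jointplot(x=left_offsets, y=right_offsets, kind="kde", fill=True)
    g.set_axis_labels("Left anchor offset (kb)", "Right anchor offset (kb)")
    plt.show()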
Fig. 4.
The 5 kb peak fine-mapping performance for χ-CNN and baseline methods. Fine-mapping performance using individual features is marked with points, and methods integrating multiple features are emphasized with horizontal bars. Light blue points and the dark blue bar correspond to ‘primary’ histone marks and χ-CNN trained on these marks, respectively. Similarly, light green points and the dark green bar correspond to ‘secondary’ marks. CTCF, in lavender, performs well, but χ-CNN trained on ‘secondary’ marks and CTCF performs better. Cohesin subunits, in pink, are the best-performing single marks; however, χ-CNN trained on all features, in red, shows greater enrichment than any individual mark. All other TFs, in orange, perform similarly to histone marks. Finally, a baseline method of averaging all features is marked with a brown bar. (Color version of this figure is available at Bioinformatics online.)
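The averaging baseline mentioned last can be sketched in a few lines: score each bin by the mean of all feature tracks and fine-map to the highest-scoring bin. The per-track z-scoring below is an assumption for illustration; the authors' exact normalization is not described in this excerpt.

    # Sketch of a signal-averaging baseline: fine-map to the 100 bp bin with
    # the highest mean (z-scored) signal across all feature tracks.
    import numpy as np

    def average_signal_baseline(locus):
        """locus: (n_bins, n_tracks) signal matrix -> index of the predicted bin."""
        z = (locus - locus.mean(axis=0)) / (locus.std(axis=0) + 1e-8)  # per-track z-score
        return int(np.argmax(z.mean(axis=1)))                          # best average bin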

