Genes (Basel). 2022 Mar 30;13(4):621.
doi: 10.3390/genes13040621.

InsuLock: A Weakly Supervised Learning Approach for Accurate Insulator Prediction, and Variant Impact Quantification

Shushrruth Sai Srinivasan et al. Genes (Basel).

Abstract

Mapping chromatin insulator loops is crucial to investigating genome evolution, elucidating critical biological functions, and ultimately quantifying variant impact in diseases. However, chromatin conformation profiling assays are usually expensive and time-consuming, and may report fuzzy, low-resolution insulator annotations. We therefore propose a weakly supervised deep learning method, InsuLock, to address these challenges. InsuLock first uses a Siamese neural network to predict whether an insulator exists within a given region (up to 2000 bp). It then uses an object detection module to localize the insulator boundaries precisely via gradient-weighted class activation mapping (~40 bp resolution). Finally, it quantifies variant impact by comparing the insulator scores of the wild-type and mutant alleles. We applied InsuLock to various bulk and single-cell datasets for performance testing and benchmarking. It outperformed existing methods with an AUROC of ~0.96 and condensed insulator annotations to ~2.5% of their original size, while the refined annotations still showed higher conservation scores and better motif enrichment. Finally, we used InsuLock to predict cell-type-specific variant impacts from brain scATAC-seq data and identified a schizophrenia GWAS variant that disrupts an insulator loop proximal to a known risk gene, indicating a possible new mechanism of action for the disease.

Keywords: 3D chromatin structure; CTCF mediated insulator loops; brain disorders; deep learning; gene regulation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overall schematic workflow of the InsuLock method: First, the positive dataset is generated by intersecting Rad21 ChIA-PET peaks with CTCF ChIP-seq peaks to define true anchor regions. InsuLock takes one-hot encoded DNA matrices as input. Second, the binary-classification module learns the sequence patterns around the anchors of insulator loops and makes a binary prediction. Third, the weakly supervised object detection module uses the activation maps (Ak), weighted by an importance score (ac), to refine and localize the core anchor region and to visualize the sequence features that the model used to make the prediction. Fourth, the variant impact quantification module identifies insulator-disrupting variants by computing a delta score.
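The third step of the workflow can be illustrated with a minimal one-dimensional Grad-CAM sketch. This is not the authors' implementation: the function names `grad_cam_1d` and `localize`, the plain-list data layout, and the half-of-peak cutoff used to pick the core region are all assumptions made for illustration. Each activation map is weighted by the position-averaged gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped.

```python
def grad_cam_1d(activations, gradients):
    """Combine K feature maps into one class activation map.

    activations, gradients: K lists of L floats each, where gradients[k][i]
    is d(class score) / d(activations[k][i]). Both are assumed to come from
    the last convolutional layer of some trained model.
    """
    length = len(activations[0])
    cam = [0.0] * length
    for a_k, g_k in zip(activations, gradients):
        # Importance weight: gradient averaged over all positions.
        alpha_k = sum(g_k) / len(g_k)
        for i, a in enumerate(a_k):
            cam[i] += alpha_k * a
    # ReLU keeps only positions that support the predicted class.
    return [max(0.0, v) for v in cam]


def localize(cam, frac=0.5):
    """Return the half-open interval around the peak where cam >= frac * peak.

    The cutoff rule is a hypothetical choice, not taken from the paper.
    """
    peak = max(range(len(cam)), key=cam.__getitem__)
    cutoff = frac * cam[peak]
    start = end = peak
    while start > 0 and cam[start - 1] >= cutoff:
        start -= 1
    while end < len(cam) - 1 and cam[end + 1] >= cutoff:
        end += 1
    return start, end + 1


# Toy example: two feature maps over five positions.
acts = [[0.0, 1.0, 4.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
grads = [[1.0] * 5, [-0.5] * 5]
cam = grad_cam_1d(acts, grads)
region = localize(cam)
```

Because pooling shrinks the feature maps, positions in the activation map would need to be scaled back to sequence coordinates before being reported as a refined anchor.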
Figure 2
InsuLock’s Siamese neural network architecture: The one-hot encoded matrices of the forward and reverse-complement sequences are both fed to the binary classification model, and the sigmoid predictions for the two strands are averaged to give the final anchor prediction. The network is composed of 3 convolution blocks, each containing a convolution layer and a pooling layer, followed by 3 dense layers.
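The strand-averaging idea in this caption can be sketched as follows. The sketch assumes a generic `predict` callable that maps a one-hot matrix to a probability; it stands in for the trained network and is not part of InsuLock's published code. The A/C/G/T channel order and the helper names are likewise illustrative choices.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
CHANNELS = "ACGT"  # assumed channel order; N encodes as all zeros


def reverse_complement(seq):
    """Return the reverse-complement strand of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def one_hot(seq):
    """Encode a DNA sequence as an L x 4 matrix of 0.0 / 1.0 values."""
    return [[1.0 if base == c else 0.0 for c in CHANNELS] for base in seq.upper()]


def strand_averaged_score(seq, predict):
    """Average the model output for the forward and reverse-complement strands.

    predict: hypothetical callable taking a one-hot matrix and returning a
    probability in [0, 1].
    """
    forward = predict(one_hot(seq))
    reverse = predict(one_hot(reverse_complement(seq)))
    return 0.5 * (forward + reverse)


# Toy stand-in for a trained model: the fraction of G bases in the input.
def toy_predict(matrix):
    return sum(row[2] for row in matrix) / len(matrix)


score = strand_averaged_score("GGGA", toy_predict)
```

Averaging the two outputs makes the final score identical whichever strand is supplied, which is the property the shared-weight design is meant to guarantee.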
Figure 3
InsuLock’s performance on the test data against different types of non-anchors, shown as (A) ROC curves and (B) PR curves. Cross-cell-line validation performance across all 4 cell lines, measured by (C) AUROC and (D) AUPR.
Figure 4
Benchmarking of the strategies that deep learning models in genomics use to handle reverse-complement sequences, on (A) different types of non-anchors and (B) different cell lines, measured by AUPR. Conjoined post-hoc models outperform the other methods.
Figure 5
(A) Total nucleotide coverage of the original insulator annotations compared with InsuLock’s refined insulator annotations. (B) Conservation analyses using PhastCons scores indicate that the refined regions are significantly more conserved than the original regions.
Figure 6
InsuLock’s variant impact quantification module, which predicts the effect of SNVs on insulator sites. (A) Brain scATAC-seq, GWAS, HiChIP, NCBI RefSeq, 100 vertebrates PhyloP conservation, CTCF ChIP-seq peak in astrocytes, and bipolar neuron track on the UCSC genome browser. (B) Predicted anchor probability of the wild-type and mutant sequences. (C) Delta scores of GWAS SNPs associated with schizophrenia.
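The delta score in panels (B) and (C) can be sketched as the difference between the model's score for the mutant and wild-type sequences. The sign convention (mutant minus wild-type, so a negative value suggests disruption), the 0-based coordinate, and the names `apply_snv`, `delta_score`, and `score_fn` are assumptions for illustration; the toy scorer below is a placeholder, not the trained network.

```python
def apply_snv(seq, pos, ref, alt):
    """Return seq with the base at 0-based position pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: found {seq[pos]!r}, expected {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def delta_score(seq, pos, ref, alt, score_fn):
    """Score difference between the mutant and wild-type alleles.

    score_fn: hypothetical callable mapping a DNA string to an insulator
    probability. Assumed convention: delta = mutant - wild-type.
    """
    wild_type = score_fn(seq)
    mutant = score_fn(apply_snv(seq, pos, ref, alt))
    return mutant - wild_type


# Toy scorer: 1.0 if an arbitrary example motif is present, else 0.0.
def toy_score(seq):
    return 1.0 if "CCCTC" in seq else 0.0


delta = delta_score("AACCCTCAA", 4, "C", "A", toy_score)
```

Checking the reference allele before substituting guards against coordinate or strand mix-ups, which would otherwise yield a meaningless score difference.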
