Genome Res. 2024 Dec 23;34(12):2304-2318.
doi: 10.1101/gr.279419.124.

An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes

Chong Li et al. Genome Res. 2024.

Abstract

The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, which separate adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes by disrupting TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements. However, how SVs affect 3D genome structure, and how they are associated with different aspects of gene regulation and candidate cis-regulatory elements (cCREs), has rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs), addressing the current scarcity of such resources. Our catalog contains 18,865 TADs, including 4596 sub-TADs, along with 185 SVs (TAD-SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD-SVs intersect with cCREs and observe significant enrichment of TAD-SVs within cCREs. This study provides a database of TADs and TAD-SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease.

Figures

Figure 1.
The step-by-step workflow to process raw Hi-C data into TADs/TAD boundaries and TAD–SVs in our Hi-C analysis pipeline. (A) The raw read files from 44 samples were used as input in Juicer for preprocessing and generating Hi-C maps, which were subsequently binned at multiple resolutions. The Insulation Score (IS) algorithm was applied to call initial TAD boundaries for each sample. All 44 Hi-C libraries were merged to create a “mega” map, which was used as input to the Arrowhead (Rao et al. 2014) and IS (Crane et al. 2015) algorithms to call TADs and TAD boundaries for the LCL merged call set. Finalized TAD boundary results for each individual were defined as those sample boundaries located within the merged boundary plus 10 kb flanking regions (the size of the exact TAD boundary called by IS for each individual) on the left side of the boundary start site and the right side of the boundary end site (Yu et al. 2017). The two panels in the bottom left corner compare the merged call set with a single subject, showing the Hi-C contact maps, the insulation scores, and the boundary strengths for the merged call set (5 kb) and the GM19036 (10 kb) sample over the region Chr 14: 35–35.8 Mb. (B) We examined the impact of SVs on chromatin structure by measuring the boundary score for each TAD boundary in the presence or absence of SVs. The Wilcoxon rank-sum test was employed to identify SVs significantly affecting TAD boundary strength, resulting in a set of TAD–SVs.
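The boundary-finalization rule in panel A can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline code: the interval representation, the function names `within_flanked` and `finalize_boundaries`, and the coordinates in the example are all hypothetical, and "located within" is read here as full containment inside the merged boundary extended by 10 kb on each side.

```python
from typing import List, Tuple

# (chromosome, start, end) in base pairs; this layout is an assumption.
Interval = Tuple[str, int, int]

FLANK_BP = 10_000  # 10 kb flank on each side, as stated in the caption


def within_flanked(sample: Interval, merged: Interval, flank: int = FLANK_BP) -> bool:
    """True if the sample boundary lies inside the merged boundary
    extended by `flank` bp to the left of its start and right of its end."""
    chrom_s, start_s, end_s = sample
    chrom_m, start_m, end_m = merged
    return (
        chrom_s == chrom_m
        and start_s >= start_m - flank
        and end_s <= end_m + flank
    )


def finalize_boundaries(
    sample_boundaries: List[Interval],
    merged_boundaries: List[Interval],
    flank: int = FLANK_BP,
) -> List[Interval]:
    """Keep only sample boundaries supported by some flanked merged boundary."""
    return [
        s
        for s in sample_boundaries
        if any(within_flanked(s, m, flank) for m in merged_boundaries)
    ]


# Made-up coordinates for illustration only.
merged = [("chr14", 35_200_000, 35_205_000)]
sample = [
    ("chr14", 35_195_000, 35_205_000),  # inside the flanked window: kept
    ("chr14", 35_400_000, 35_410_000),  # far from any merged boundary: dropped
    ("chr8", 35_195_000, 35_205_000),   # different chromosome: dropped
]
kept = finalize_boundaries(sample, merged)
```

The quadratic scan is fine for a sketch; for genome-wide boundary sets, sorting the intervals or using an interval tree would be the more usual choice.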
Figure 2.
Visualization of a region containing TADs identified in our Integrative Catalog but missing from the GM12878 data released by ENCODE. From top to bottom, these plots show the Hi-C contact maps, the insulation scores, and the corresponding TAD boundaries with the boundary scores over this region. Green regions represent TADs identified by both GM12878 (ENCODE) and our Integrative Catalog, while orange regions highlight TADs identified by our pipeline but not in GM12878 (ENCODE).
Figure 3.
Distribution of TAD boundaries in 27 samples of HGSVC2. The x-axis represents sample IDs, with the superpopulation ordered and represented by a different color, as displayed in the color key in the legend. The y-axis shows the number of TAD boundaries detected using our pipeline in the 27 samples. All of these TAD boundaries were called under a 10 kb resolution for each individual utilizing the IS method, owing to the relatively low sequencing depth and map resolution.
Figure 4.
Visualization of two SVs that disrupt TAD boundaries with significant changes in boundary strength. (A,C) A deletion (Chr 8-644401-DEL-5014) that disrupts the TAD boundary and shows differences in Hi-C contact maps, BSs, and insulation scores for individuals with (genotype 1/1) and without (genotype 0/0) the deletion. The orange rectangle shows the TAD location for each sample within the plot region. The dark red rectangle represents the location of this deletion, and the yellow rectangle highlights the TAD boundary location and corresponding boundary strength. The top left figure is the GM19650 sample, whose genotype is 1/1, i.e., it carries this deletion, compared to the sample below, GM19036, whose genotype is 0/0, i.e., it does not have this deletion. The BS panel shows that the GM19650 sample lacks the TAD boundary where it carries that genomic deletion. (B,D) Similar comparisons for an example of an insertion (Chr 19-37789443-INS-1092) between the individual HG01114 (genotype 0/1) and HG03065 (genotype 0/0). The BS panel shows that the HG01114 sample has the TAD boundary where it carries that genomic insertion. (E,F) Boxplots demonstrating the significant differences in BSs for two genotype categories, 0 (genotype 0/0) and 1 (genotypes 0/1, 1/0, or 1/1).
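The comparison behind panels E and F (boundary scores grouped into genotype category 0 versus 1, then a Wilcoxon rank-sum test) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names, the normal approximation with tie correction, and the boundary-score values are assumptions made for the example. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def collapse_genotype(gt: str) -> int:
    """Map 0/0 to category 0 and 0/1, 1/0, or 1/1 to category 1."""
    alleles = gt.replace("|", "/").split("/")
    return int(any(a == "1" for a in alleles))


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation
    with midranks and a tie-corrected variance. Returns (z, p)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a shift
        return 0.0, 1.0
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up boundary scores keyed by (hypothetical sample label, genotype).
observations = [
    ("s1", "1/1", 0.10), ("s2", "0/1", 0.20), ("s3", "1/0", 0.30), ("s4", "1/1", 0.40),
    ("s5", "0/0", 0.80), ("s6", "0/0", 0.90), ("s7", "0/0", 1.00), ("s8", "0/0", 1.10),
]
carriers = [bs for _, gt, bs in observations if collapse_genotype(gt) == 1]
non_carriers = [bs for _, gt, bs in observations if collapse_genotype(gt) == 0]
z, p = rank_sum_test(carriers, non_carriers)
```

The normal approximation is rough for groups this small; an exact test is preferable when sample sizes are in the single digits.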
Figure 5.
Visualization of two SVs that disrupt TAD boundaries with significant changes in gene expression and splicing levels. (A) Boxplot demonstrates the significant difference in gene expression for two different genotype categories, 0 (genotype 0/0) and 1 (genotypes 0/1, 1/0, or 1/1). The boxplot on the left shows the significant changes between the associated ERICH1 gene expression values (after log transformation) and different genotypes of this deletion Chr 8-644401-DEL-5014. (C) Boxplot shows the significant difference between gene splice ratios (after log transformation and quantile normalization) for the splice junction clusters and genotypes of the same deletion, which is also associated with the ERICH1 gene. (B,D) The same comparisons for an example of insertion (Chr 1-45497763-INS-354) associated with the CCDC163 and AKR1A1 genes between genotype categories 0 and 1.

References

    1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature 526: 68–74. doi: 10.1038/nature15393
    Abdennur N, Mirny LA. 2020. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36: 311–316. doi: 10.1093/bioinformatics/btz540
    Akdemir KC, Le VT, Chandran S, Li Y, Verhaak RG, Beroukhim R, Campbell PJ, Chin L, Dixon JR, Futreal PA, et al. 2020. Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer. Nat Genet 52: 294–305. doi: 10.1038/s41588-019-0564-y
    Bailey TL, Johnson J, Grant CE, Noble WS. 2015. The MEME suite. Nucleic Acids Res 43: W39–W49. doi: 10.1093/nar/gkv416
    Baran Y, Subramaniam M, Biton A, Tukiainen T, Tsang EK, Rivas MA, Pirinen M, Gutierrez-Arcelus M, Smith KS, Kukurba KR, et al. 2015. The landscape of genomic imprinting across diverse adult human tissues. Genome Res 25: 927–936. doi: 10.1101/gr.192278.115