Genome Biol. 2019 Jul 30;20(1):148.
doi: 10.1186/s13059-019-1728-x.

Spatial chromatin architecture alteration by structural variations in human genomes at the population scale

Michal Sadowski et al. Genome Biol.

Abstract

Background: The number of reported examples of chromatin architecture alterations involved in the regulation of gene transcription and in disease is increasing. However, no genome-wide testing has been performed to assess the abundance of these events and their importance relative to other factors affecting genome regulation. This is particularly interesting given that the vast majority of genetic variations identified in association studies are located outside coding sequences. This study attempts to address this gap by analyzing the impact on chromatin spatial organization of genetic variants identified in individuals from 26 human populations and in genome-wide association studies.

Results: We assess the tendency of structural variants to accumulate in spatially interacting genomic segments and design an algorithm to model chromatin conformational changes caused by structural variations. We show that differential gene transcription is closely linked to the variation in chromatin interaction networks mediated by RNA polymerase II. We also demonstrate that CTCF-mediated interactions are well conserved across populations, but enriched with disease-associated SNPs. Moreover, we find that boundaries of topological domains are relatively frequent targets of duplications, which suggests that these duplications can be an important evolutionary mechanism of genome spatial organization.

Conclusions: This study assesses the critical impact of genetic variants on the higher-order organization of chromatin folding and provides insight into the mechanisms regulating gene transcription at the population scale, of which local arrangement of chromatin loops seems to be the most significant. It provides the first insight into the variability of the human 3D genome at the population scale.

Keywords: Biophysical modeling; CCCTC-binding factor; Chromatin architecture; Chromatin loops; Gene transcription; Genome regulation; Genomics; Human; RNA polymerase II; Topologically associating domains.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
3D human genome. Data aggregation and comparison. a Cumulative density plot showing the genomic span distribution of genomic structural elements identified in ChIA-PET and Hi-C. b Comparison of Hi-C (lower left) and ChIA-PET (upper right) heatmaps of an 8.5-Mb genomic region at 10-kb resolution. Annotation for chromatin loops (cyan) and domains (purple) from ChIA-PET is presented on the heatmaps. The same ChIA-PET data is shown in a browser view together with ChIP-seq tracks for CTCF and cohesin subunits. The height of an arc indicates the strength of an interaction, which is measured by the number of clustered individual inter-ligation paired-end-tag products. Annotations of genes (GENCODE v12) and enhancers (ChromHMM) are presented. Arrows at genes mark the direction of their transcription. c Similar to b, but a 0.8-Mb genomic region is presented in a heatmap at 1-kb resolution and in a browser view
Fig. 2
Predicting the impact of SVs on the chromatin topology. a Browser view of a 0.5-Mb genomic segment with asthma-associated SNP rs12936231 identified in a part of the human population. SNP rs12936231 alters the sequence of a CTCF motif involved in interactions. Haplotype-specific CTCF signals from 10 lymphoblastoid cells are presented along with haplotype-specific CTCF ChIA-PET interactions from GM12878 (only a subset of all interactions can be identified as specifically paternal/maternal as it is done based on allele-specific SNPs emerging at the interaction anchors). For each track, ChIP-seq signal values (originally in RPMs) were divided by the maximal value of the signal in the visualized region. Sum of the signal values over the genomic region occupied by the SNP-affected interaction anchor together with the genotype is marked in each signal track. b Comparison of sequences and scores of CTCF binding motifs carrying the reference C and the alternative G alleles of rs12936231. c Differences in gene transcription rates between genotypes set for rs12936231. Genes exhibiting differences in transcription which pass Mann-Whitney test with p value < 0.05 were reported. Center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range (IQR) from the 25th and 75th percentiles; outliers are represented by rings; far outliers (points beyond 3 times the IQR) are not represented by any element of box plots. n = 101, 227, 117 sample points. d CTCF anchors from GM12878 not intersected with CTCF ChIP-seq peaks identified in different lymphoblastoid cells. The anchors were filtered by consensus CTCF binding sites (see the “Methods” section). e Number of SVs, divided by type, intersecting (in case of interaction anchors), covering (in case of CCD boundaries), or contained in (in case of CCDs and CCD gaps) different genomic structural elements
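The signal scaling described for panel a (values divided by the maximal value in the visualized region, then summed over the anchor) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the half-open interval convention, and the toy per-base signal are all assumptions.

```python
def normalize_track(signal):
    """Divide every value by the maximum of the track in the displayed region."""
    peak = max(signal)
    if peak == 0:
        # An all-zero track stays all-zero instead of dividing by zero.
        return [0.0 for _ in signal]
    return [value / peak for value in signal]


def anchor_sum(normalized, region_start, anchor_start, anchor_end):
    """Sum normalized values over the half-open interval [anchor_start, anchor_end)."""
    lo = max(anchor_start - region_start, 0)
    hi = min(anchor_end - region_start, len(normalized))
    return sum(normalized[lo:hi])


# Hypothetical per-base signal (RPM) for a 6-bp region starting at position 100.
track_rpm = [0.0, 2.0, 8.0, 4.0, 1.0, 0.0]
scaled = normalize_track(track_rpm)
print(scaled)                             # [0.0, 0.25, 1.0, 0.5, 0.125, 0.0]
print(anchor_sum(scaled, 100, 101, 104))  # 1.75
```

Scaling each track by its own maximum makes peak shapes comparable across samples, but it removes absolute signal differences between tracks.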
Fig. 3
Computational algorithm for modeling topological alterations caused by SVs. a Predicted impact of particular SV types on looping structure of the genome. Simplified chromatin looping patterns and 3D models are presented for the reference and its SV-altered versions. b Scheme presenting the chromatin modeling method at the level of loops. The method uses PET clusters, singletons, and orientations of CTCF binding motifs to accurately model the genome looping structures. c Browser view of a topological domain containing TAL1 gene and a deletion causing its activation. The deletion removes CTCF insulating the TAL1 promoter from enhancer E. CTCF and RNAPII ChIA-PET interactions are shown along with ChIP-seq tracks for CTCF, cohesin subunits (SMC3 and RAD21), and H3K27ac which marks the enhancer E. d Models presenting 3D structure of the TAL1 locus without the deletion (left column) and with the deletion (right column). Schematic drawings of loops shown in c (first row); 3D models with loops colored as on schematic drawings (second row); 3D models with TAL1 and enhancer E marked (third row). e Distance in 3D Euclidean space between the TAL1 promoter and enhancer E and mean distance between the promoter and enhancers located in the same CCD. In green, distribution of distances calculated in 3D models of the reference structure (REF); in purple, in models with the deletion introduced (DEL). For each case, 100 models were generated. The differences between REF and DEL groups are statistically significant (p values much less than 0.001), see Fig. 2c for box plot description. f 3D model of the TAL1 locus including RNAPII-mediated chromatin interactions
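The distance comparison in panel e can be illustrated with a short sketch that measures the promoter-to-enhancer distance in each model of an ensemble. The model representation (a dictionary of named 3D coordinates) and the toy coordinates are assumptions made for illustration; the paper's modeling output format is not described here.

```python
import math
from statistics import mean


def promoter_enhancer_distances(models, promoter="promoter", enhancer="enhancer"):
    """Return the 3D Euclidean promoter-enhancer distance for every model."""
    return [math.dist(model[promoter], model[enhancer]) for model in models]


# Hypothetical ensembles; the caption reports 100 models per condition.
ref_models = [
    {"promoter": (0.0, 0.0, 0.0), "enhancer": (3.0, 4.0, 0.0)},
    {"promoter": (0.0, 0.0, 0.0), "enhancer": (6.0, 8.0, 0.0)},
]
del_models = [
    {"promoter": (0.0, 0.0, 0.0), "enhancer": (1.0, 2.0, 2.0)},
    {"promoter": (0.0, 0.0, 0.0), "enhancer": (2.0, 3.0, 6.0)},
]

ref_distances = promoter_enhancer_distances(ref_models)
del_distances = promoter_enhancer_distances(del_models)
print(mean(ref_distances))  # 7.5
print(mean(del_distances))  # 5.0
```

The two distance lists are what a box plot or a rank-based test would then compare between the REF and DEL groups.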
Fig. 4
Impact of SVs on genome organization at the population scale. a Browser view of a 1-Mb genomic segment with a deletion identified in a part of the human population. The deletion removes a CTCF anchor with enhancer located in an intron of KIAA0391. CTCF ChIP-seq signals from 10 lymphoblastoid cells of different genotypes are presented for comparison. For each track, ChIP-seq signal values (originally in RPMs) were divided by the maximal value of the signal in the visualized region. The highest signal peak in the genomic region covered by the deletion is marked in each signal track. b Close-up on ChIA-PET interactions at the deletion site displayed above the ChIP-seq profiles of histone modifications for GM12878 (no deletion) and GM18526 (homozygous deletion). H3K4me1 is primarily associated with active enhancers, H3K27ac with active promoters and enhancers, and H3K4me3 with promoters. Compare with Additional file 2: Figure S7. c Differences in gene transcription rates between genotypes defined by the deletion. Genes exhibiting differences in transcription which pass Mann-Whitney test with p value < 0.1 were reported, see Fig. 2c for box plot description. n = 346, 85, 14 sample points. d 3D models of the domain shown in a without the deletion (left column) and with the deletion (right column). Schematic drawings of loops shown in b (first row); 3D models with loops colored as on schematic drawings (second row); 3D models with NFKBIA and PPP2R3C genes (arrows are pointing toward the TSSs) and enhancers marked (third row). Each picture has a duplicate zooming in on the deletion site. e Distance in 3D Euclidean space between the NFKBIA promoter and enhancer E1 and between the PPP2R3C promoter and enhancer E2. In green, distribution of distances calculated in 3D models of the reference structure (REF); in purple, in models with the deletion introduced (DEL). For each case, 100 models were generated. The differences between REF and DEL groups are statistically significant (p values much less than 0.001), see Fig. 2c for box plot description. f Enrichment/depletion of genomic structural elements with SVs of different types and of different VAF (VAF < 0.001 and VAF ≥ 0.001). In case of CCD borders, only those fully embedded in SV intervals are counted as affected, whereas for other structural elements ≥ 1 bp overlaps are counted. Error bars represent SD. g Enrichment/depletion of genomic structural elements with the 1000 Genomes Project SNPs (ALL 1kGP), all GWAS SNPs (ALL GWAS), GWAS SNPs associated with hematological parameters (HP), and with autoimmune diseases (AI). Error bars represent SD
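The two counting rules stated for panel f (CCD borders must lie fully inside the SV interval, while any overlap of at least 1 bp counts for other elements) can be expressed as interval tests. The sketch below assumes half-open (start, end) coordinates and uses hypothetical function names and element labels.

```python
def overlaps(sv, element):
    """True if the two half-open intervals share at least 1 bp."""
    return sv[0] < element[1] and element[0] < sv[1]


def fully_embedded(sv, element):
    """True if the element lies entirely inside the SV interval."""
    return sv[0] <= element[0] and element[1] <= sv[1]


def is_affected(sv, element, kind):
    """Apply the stricter containment rule to CCD borders only."""
    if kind == "ccd_border":
        return fully_embedded(sv, element)
    return overlaps(sv, element)


# Hypothetical coordinates (bp) for one SV and one structural element.
sv = (100, 200)
element = (150, 250)
print(is_affected(sv, element, "anchor"))      # True: 50 bp are shared
print(is_affected(sv, element, "ccd_border"))  # False: border extends past the SV
```

Under these rules the same SV can count as a hit for an interaction anchor while leaving an equally placed CCD border uncounted.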
Fig. 5
Impact of population-specific structural variants on genome organization. a Number of CTCF anchors intersected by SVs of a given type identified in individuals from 5 continental groups, see Fig. 2c for box plot description. b Number of domain borders fully overlapped by SVs of a given type identified in individuals from 5 continental groups. c Enrichment/depletion of CTCF anchors and CCD boundaries with SVs divided by continental groups. CTCF motifs at CCD borders and outside CCD borders are shown for comparison. Only SVs fully covering motifs are counted as hits. d Number of gene promoters in domains covering regions in which SVs are identified. e CCD topology variability patterns by continental groups. f Number of CTCF anchors intersected by SVs of a given type identified in individuals from South Asian continental group. g Number of domain borders fully overlapped by SVs of a given type identified in individuals from South Asian continental group. h Number of homozygous SVs in individual human genomes by population. CNVs are treated as homozygous when the number of copies on both homologous chromosomes is different than in the reference (hom., homozygous). i Number of CTCF anchors intersected by homozygous SVs in individual genomes by population. j Number of CCDs containing human knockouts with CTCF (purple) or RNAPII (cyan) anchors intersected by homozygous population-specific SVs. k Homozygous SVs identified in a single human population
Fig. 6
Role of chromatin rearrangements in the regulation of gene transcription. a Table summarizing identified eQTLs and their intersections with interaction anchors. b Density plot showing genomic span distribution of PET clusters. d is the value (17,800 bp) by which eQTLs were split into proximal and distal. c Venn diagram showing the number of proximal (Prox) and distal eQTLs. d Enrichment/depletion of genomic elements with eQTLs. Error bars represent SD. e Enrichment/depletion of genomic elements with eQTLs of housekeeping genes. Error bars represent SD. f Abundance of gene promoters in CCDs, in which eQTLs were identified, see Fig. 2c for box plot description. n = 16 (CCDs with eQTLs in CTCF loops), 32 (CCDs with eQTLs in RNAPII loops), and 106 (CCDs with eQTLs outside loops) sample points. g Distributions of chromatin loop density in CCDs in which eQTLs were identified and in other CCDs. The density is measured for a particular CCD as an average number of CTCF-/RNAPII-mediated chromatin loops covering a 1-Mb fragment of this CCD. Differences between the groups are significant (p values < 0.001), see Fig. 2c for box plot description. n = 2125 (CCDs without eQTLs) and 142 (CCDs with eQTLs) sample points. h Linkage disequilibrium (measured as r² value in the CEU population) between deletions shown in i and j. Colors are assigned to the deletions as in i-k. i Browser view of a 0.4-Mb genomic segment with 5 deletions identified in a part of the human population, which disrupt RNAPII anchors and are eQTLs for 6 neighboring genes (marked in red font). Each deletion has its color. RNAPII ChIP-seq signals from 6 lymphoblastoid cells of different genotypes are presented for comparison. For each track, normalized ChIP-seq signal values were divided by the maximal value of the signal in the visualized region. Sum of the signal values over the genomic regions occupied by the deletions is marked in each signal track. H3K27ac, H3K4me3, H3K4me1, and DNase-seq signal tracks from GM12878 are shown. j Close-up on the RNAPII-mediated interactions affected by 4 of the 5 deletions. Only the loops affected by the deletions are shown for clarity. k Genes whose transcription is correlated with one or more of the deletions shown in i and j (p value < 0.001). Boxes with transcription rates associated with a particular deletion are marked with the color assigned to the deletion, as in i and j, see Fig. 2c for box plot description. n = 132, 167, 146 (DEL 1); 91, 208, 146 (DEL 2); 91, 208, 146 (DEL 3); 257, 158, 30 (DEL 4); 265, 154, 26 (DEL 5) sample points. l Signal strength of histone marks and DNase hypersensitivity sites in interaction anchors intersected with proximal eQTLs, distal eQTLs, and not intersected by eQTLs. For each mark, two plots are presented. A signal track around anchor center (± 2 kb) showing values for each genomic position averaged over all anchors from a given group (top). A box plot showing mean signal values in the same regions (bottom). Original signal values represent fold change over control. CTCF and RNAPII anchors were analyzed jointly, see Fig. 2c for box plot description. n = 1000 (anchors no eQTLs), 523 (anchors distal eQTLs), and 242 (anchors prox eQTLs) sample points
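Two quantities from the caption above, the proximal/distal split at d = 17,800 bp (panel b) and the loop density per CCD (panel g), can be sketched as follows. Both functions rest on assumptions the caption does not confirm: the split is applied here to the eQTL-to-gene distance with an inclusive cutoff, and density is read as the loop count divided by CCD length in Mb. All names and the example numbers are hypothetical.

```python
PROXIMAL_CUTOFF_BP = 17_800  # value d reported in the Fig. 6b caption


def classify_eqtl(distance_bp, cutoff_bp=PROXIMAL_CUTOFF_BP):
    """Label an eQTL-gene pair by distance (assumed: inclusive cutoff)."""
    return "proximal" if abs(distance_bp) <= cutoff_bp else "distal"


def loop_density(n_loops, ccd_start, ccd_end):
    """Loops per 1 Mb of CCD length (assumed reading of the caption)."""
    length_mb = (ccd_end - ccd_start) / 1_000_000
    if length_mb <= 0:
        raise ValueError("CCD end must be greater than CCD start")
    return n_loops / length_mb


print(classify_eqtl(5_000))                    # proximal
print(classify_eqtl(-250_000))                 # distal
print(loop_density(30, 1_000_000, 3_000_000))  # 15.0
```

Classifying per eQTL-gene pair rather than per eQTL allows one variant to be proximal for one gene and distal for another, which is consistent with a Venn diagram being used in panel c.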
