Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jun;642(8069):1097-1105.
doi: 10.1038/s41586-025-08960-w. Epub 2025 May 7.

Chromatin loops are an ancestral hallmark of the animal regulatory genome

Affiliations

Chromatin loops are an ancestral hallmark of the animal regulatory genome

Iana V Kim et al. Nature. 2025 Jun.

Abstract

In bilaterian animals, gene regulation is shaped by a combination of linear and spatial regulatory information. Regulatory elements along the genome are integrated into gene regulatory landscapes through chromatin compartmentalization1,2, insulation of neighbouring genomic regions3,4 and chromatin looping that brings together distal cis-regulatory sequences5. However, the evolution of these regulatory features is unknown because the three-dimensional genome architecture of most animal lineages remains unexplored6,7. To trace the evolutionary origins of animal genome regulation, here we characterized the physical organization of the genome in non-bilaterian animals (sponges, ctenophores, placozoans and cnidarians)8,9 and their closest unicellular relatives (ichthyosporeans, filastereans and choanoflagellates)10 by combining high-resolution chromosome conformation capture11,12 with epigenomic marks and gene expression data. Our comparative analysis showed that chromatin looping is a conserved feature of genome architecture in ctenophores, placozoans and cnidarians. These sequence-determined distal contacts involve both promoter-enhancer and promoter-promoter interactions. By contrast, chromatin loops are absent in the unicellular relatives of animals. Our findings indicate that spatial genome regulation emerged early in animal evolution. This evolutionary innovation introduced regulatory complexity, ultimately facilitating the diversification of animal developmental programmes and cell type repertoires.

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Chromatin architecture in early animal evolution.
a, Comparison of genomic features across metazoans and unicellular holozoans. For H. sapiens, we used previously published mCG methylation percentage data from H1 ESCs cells. Of note, although distal cis-regulatory elements (dCRE) were identified in Amphimedon queenslandica, their presence in E. muelleri had not been reported previously. mCG, CG methylation; TEs, transposable elements. b, Top left, phylogenetic tree showing the taxon sampling in this study, along with the number of profiled species per clade. Top right and below, Micro-C interaction maps of specific genomic regions (S. arctica, chr. 2: 3400000–3700000, bin 1 kb; C. owczarzaki, chr. 01: 3660000–3800000, bin 400 bp; S. rosetta, chr. 21: 800000–1100000, bin 800 bp; M. leidyi, chr. 8: 15500000–15700000, bin 400 bp; E. muelleri, Emue22: 2200000–2400000, bin 800 bp; T. adhaerens, TadhH1_4: 3880000–4180000, bin 800 bp; N. vectensis, NC_064040.1: 11650000–12000000, bin 1 kb; D. melanogaster, chr. 3L: 20480000–20820000, bin 800 bp; and H. sapiens, chr. 12: 69000000–71000000, bin 5 kb), showing examples of insulation boundaries or chromatin loops. All interaction maps were balanced using ICE normalization.
Fig. 2
Fig. 2. Chromatin compartments and insulation boundaries across species.
a, Saddle plots showing contact interactions between A and B compartments in each species, organized by eigenvector ranking. To obtain the distance-normalized matrix, the ratio of observed-over expected interactions is calculated, followed by eigenvector decomposition. The eigenvectors are oriented and sorted from the lowest (B compartment) to the highest (A compartment) values. The bins of the interaction matrix then reordered according to the rank of the eigenvector. The observed (O) and expected (E) values are averaged to create a saddle plot. The top 20% of the interaction values were used to calculate the compartment strength values shown on the saddle plots. Cowc, C. owczarzaki; Dmel, D. melanogaster; Emue, E. muelleri; Hsap, H. sapiens; Mlei, M. leidyi; Nvec, N. vectensis; Sarc, S. arctica; Sros, S. rosetta; Tadh, T. adhaerens. b, Compartment strength quantification at different relative resolutions. The barplot below shows the contribution of homotypical chromatin interactions within active (AA) and inactive (BB) chromatin states. c, Aggregate plots showing contact enrichment within a rescaled region between two insulation boundaries. The boundaries are identified using the sliding diamond window to detect the changes in contact frequencies in each genomic bin. To plot pile-ups, regions between insulation boundaries are rescaled and their normalized observed and expected contact frequencies are averaged. d, Insulation score distributions illustrating the degree of isolation between linear genomic neighbourhoods. Number of annotated strong boundaries is indicated in blue, with a vertical line representing the median value of each distribution. e, Classification of insulation boundaries using hierarchical assignment of structural and genomic features. f, Size distribution of annotated chromatin loops in each species. The boxplots show the median (centre line), 25th and 75th percentiles (box limits) and the whiskers show the range of variability, excluding outliers, which are shown as individual points. g, Annotation of chromatin loop anchors with promoter (P) and enhancer (E) signatures based on normalized H3K4me3 and H3K4me2 or H3K4me1 ChIP–seq coverage. Chromatin loop anchors with undefined (U) epigenetic signature are shown in grey.
Fig. 3
Fig. 3. Promoter hubs in placozoans.
a, Example of syntenic genomic regions in placozoans T. adhaerens (TadhH1_4: 3860000–4060000, bin 800 bp) and C. collaboinventa (chr. 4: 8983000–9183000, bin 800 bp). b, Gaudí plots projecting ATAC-seq, H3K4me3 and exon annotation signals onto a two-dimensional Kamada–Kawai graph layout (top left) represented by the top 20% of contact pairs with solid colours highlighting statistically significant regions (P < 0.05) identified using a one-sided permutation test. The high–high (HH) signal marks genomic bins enriched in signal and that are in spatial proximity with other bins enriched in signal; low–low (LL) bins are depleted in signal as well as neighbourhoods are depleted in signal; high–low (HL) and low–high (LH) are bins that are enriched in signal, but not their neighbourhood, and in reverse. c, Classification of T. adhaerens genes into three categories (GP1, GP2 and GP3) on the basis of structural and epigenetic features. Top, example regions containing genes classified into GP1, GP2 and GP3 groups. The resolution of Micro-C maps is 800 bp, maximum intensity value of ICE normalized Micro-C maps is as in a. Bottom, average loop strength between promoter regions of the genes from each groups is measured with APA. The colour bar of pile-up plots shows enrichment of observed over expected values. d, Sequence motif found in loop regions, which are also overlapping GP1 promoter regions (left panel), is present in promoter regions of orthologous GP1 genes in other placozoan species (right panel). The total number of shared orthologues is indicated. TrH2, Trichoplax sp. H2; Hhon, Hoilungia hongkongensis; HoiH23, C. collaboinventa. e, Heatmaps showing CPM normalized ATAC-seq and ChIP–seq coverage, motif scores and Mutator transposable element density ±5 kb around the TSS of GP1, GP2 and GP3 genes. Each heatmap scale starts at zero.
Fig. 4
Fig. 4. Chromatin loops in the ctenophore M. leidyi.
a, Example genomic region showing chromatin loops between promoters and enhancers at 400 bp resolution. b, Left, histogram of enhancer contacts per promoter. Right, genomic location of enhancers. c, Sequence motif enriched in loop anchors. d, DNA methylation profiles centred around motifs located at promoter and enhancer loop regions, or outside loops. e, Chromatin-bound proteome of M. leidyi, showing identified proteins sorted by abundance with architectural proteins CTEP1 and CTEP2 as the most abundant zf-C2H2s. f, DAP-seq signals around GC-motif sites with high (left) versus low (right) methylation levels, and sites located within (top) or outside (bottom) of loop anchors. CTEP1 showed higher affinity for unmethylated GC-rich motifs in DAP-seq assays with native or PCR amplified gDNA (lacking methylation). g, Boxplots showing PhastCons conservation scores across three ctenophore species (B. microptera, P. bachei and H. californensis). The boxplot limits indicate the interquartile range (IQR), with the median as the middle line and whiskers extending to 1.5× IQR. Two-sided Wilcoxon rank sum test showed significant conservation differences between intergenic enhancers (n = 969) and promoters in loops (n = 778) (***P = 1.3 × 10−15) and between promoters in loops and promoters outside loops (n = 14,996) (***P < 2.22 × 10−16), whereas intergenic enhancers and promoters outside loops showed no significant difference (not significant (NS), P = 0.88). h, Syntenic conservation within M. leidyi chromatin loops compared to H. californensis. Left plot, barplot showing the fraction of conserved orthologues (OGs) in all alignable genomic regions across ctenophore species (***P= 5.5 × 10−4, chi-squared test for given probabilities). Right plot, boxplot of shared orthologues between individual genomic regions within chromatin loops (n = 115) versus in random genomic regions (n = 259) of similar size (***P = 2.4 × 10−5, Wilcoxon rank sum test with continuity correction). Boxplot limits as in g. Silhouette of H. californensis in h reproduced from PhyloPic (https://www.phylopic.org/), created by S. Haddock and K. Wothe under a CC0 1.0 Universal Public Domain licence.
Fig. 5
Fig. 5. The evolution of animal regulatory genome architecture.
a, Phylogenetic tree illustrating the taxonomic distribution of 3D-chromatin features. b, Schematic depicting major innovations in animal genome regulation at different ancestral nodes. LCA, last common ancestor.
Extended Data Fig. 1
Extended Data Fig. 1. Micro-C experimental design and quality metrics.
a, Overview of the input material for Micro-C experiments, library preparation strategy, and sequencing statistics in each species. The D. melanogaster dataset was subsampled to match the coverage of generated Micro-C maps in this study. b, Top, fluorescence-activated cell sorting (FACS) profile of crosslinked phagocytic choanocytes from E. muelleri labelled by feeding sponges with fluorescent microspheres. Only cells positive for nuclei Hoechst 33342 staining together with fluorescent beads were sorted. The sorted cell population (P3) was selected using sequential gating strategy through P1 and P2. Right, fluorescent microscopy image of sorted choanocytes, where PH stands for phase contrast, nuclei are in blue, FluoSpheres beads are in green. Scale bar (top right corner) is 50 µm. Below, the sequential FACS gating strategy (P1 - P3) to sort mOrange::NvElav+ neuronal cells (P4) from the N. vectensis transgenic line. Wild type animals lacking the mOrange fluorescent protein were used to verify the gating strategy. The FACS sorting experiment data for E. muelleri and N. vectensis are representative of at least 6 independent experiments. c, The quality of chromatin digestion with MNase and followed proximity ligation was assessed with High sensitivity D5000 ScreenTapes using the Agilent 2200 TapeStation systems. The optimal chromatin fragmentation with Mnase results in up to 80% mononucleosomes profile. d, Barplots showing the percentage of reads mapped to the genome of each species. e, Barplots illustrating the distibution of intrachromosomal (cis) and interchromosomal (trans) interactions in each replicate experiment. The percentage of trans-contacts observed is species-specific but can be influenced by several factors: (i) the type of nuclear organization, such as Rabl-like configuration or the presence of chromosome territories, (ii) a high chromosome count, as seen in S. arctica and S. rosetta, (iii) and the reduction in nuclear diameter during the growth of coenocytes in S. arctica. f, Heatmap showing pairwise similarity scores between biological and technical replicates calculated as the stratum adjusted correlation coefficient (SCC). Below, SCC scores were estimated for a range of resolutions of 1 Kb, 2 Kb, 5 Kb, 10 Kb, 25 Kb, and 50 Kb. Differences in pairwise comparisons between experimental replicates are shown as mean ± s.d. The number of replicates per species as in (d-e). g, Top, cis-decay plots showing the rate of decay of contact frequency over genomic distance. The contact probability is averaged over all chromosomes. For C. owczarzaki, samples obtained from mitotic (blue) and synchronised G1/S stage (in orange) show different contact frequency behavior at short (below 10 Kb) and long (over 1 Mb) genomic distances. Bottom, log-derivative of cis-decay plots that predicts the folding of DNA into genomic structures and their size, most commonly chromatin loops, which tend to be the dominant micro-scale contacts. The first pronounced peak and dip at log-derivative cis-decay plots (highlighted in grey) in M. leidyi is observed at the scale from 10 Kb to 100 Kb. In H. sapiens, the peak size (highlighted in grey) ranges from 100 Kb to 1 Mb, which in both cases correspond to the average loop sizes in each species.
Extended Data Fig. 2
Extended Data Fig. 2. Genome sequencing, assembly and annotation.
a, Genome assembly strategy. b, Genome assembly statistics. BUSCO completeness score was calculated using genome mode or protein mode against metazoan BUSCO dataset for all species except unicellular holozoans, where eukaryotic dataset was used. c, Chromosome-level re-assembly of S. arctica, S. rosetta, T. adhaerens and C. collaboinventa genomes using Micro-C data resulting in total of 27, 36, 6 and 6 chromosomes, respectively. Both S. arctica and S. rosetta posses genome-wide telomere clustering, whereas placozoans display strong interchromosomal compartmentalization signal. d, Left, genome-wide Micro-C contact map showing the chromosome-level assembly of C. owczarzaki. C. owczarzaki exhibit increased interactions between telomeres and between centromeres, suggesting Rabl-like chromosome configuration. Right, chromosomal rearrangements in C. owczarzaki. Visual inspection of chromatin interaction maps revealed heterozygous deletions on C. owczarzaki chromosome 2, which is also confirmed by the uneven distribution of anti-H3 ChIP-seq coverage. In addition, one arm of the chromosome 13 exhibits genome-wide increase in the interaction frequency with other chromosomes, as well as two-fold coverage of H3 ChIP-seq, suggesting the gain of a chromosome arm pair. Finally, chromosome 15 v shares one arm with chromosome 15 and appears to be whole-arm translocation. e, Same as (d) for M. leidyi. The presence of uncollapsed haplotypes was estimated by distribution of per-base sequencing depth (left). Chromosomes in M. leidyi exhibit telomere clustering as well as increased intrachromosomal interactions similar to chromosome territories. f, Same as (d) for E. muelleri genome assembly with chromosome organisation similar to chromosome territories. g, Repeat content for each assembled genome, annotated using EDTA.
Extended Data Fig. 3
Extended Data Fig. 3. Genome compartmentalisation analysis.
a, Table translating relative resolutions of contact maps that were used to calculate compartmentalisation signal into base-pair resolutions. b, Example genomic regions showing eigenvector coefficients E1 and compartment annotation into A (active, red), B (inactive, blue) and I (intermediate, yellow). c, Density plots showing genome-wide distribution of E1 eigenvalues and the relative abundance of each defined compartments (stacked barplot on top). Compartments were defined using fitting of Gaussian mixture with three components (k = 3). A Bayesian Information Criterion (BIC) was computed for the specified mixture (bottom plot) (see Method section). d, Association between chromatin compartments and different genomic features calculated per genomic bin at a relative resolution of 20,000 bins per genome, as in (a). The proportion of features in each compartment category follows the classification as in (c). The boxplots indicate the relative signal (measured as genome-wide quantiles) of the different features in the genomic bins belonging to each compartment category (active, intermediate, inactive). The mean value of distributions is shown as the center line on the boxplots, with interquartile range (IQR) as the box limits and whiskers extending to 1.5x IQR.
Extended Data Fig. 4
Extended Data Fig. 4. Genomic insulation and chromatin loop analysis.
a, Insulation score profiles aligned at insulation boundary regions. Insulation score profiles were calculated for multiple resolutions with window sizes corresponding to 5x, 10x and 25x the chosen resolution. For example, for 400 bp resolution, we used window sizes of 2,000 bp (5*bin), 4,000 bp (10*bin), and 10,000 bp (25*bin). Parameters showing two strongest average insulation scores were considered optimal. For each of our studied species, an example contact map region with calculated insulation profile is shown. b, Left, overlap between regions annotated as strong boundaries using strategy described in this paper (see Method section) and previously published datasets,. c, Left, distribution of boundary strength values per species. Insulation boundaries (marked in blue) were selected using Li threshold as implemented in cooltools. Middle, distribution of linear distances (Kb) between successive boundaries, with the number of examined region between boundaries indicated. Right, boxplots showing the number of genes located between insulation boundaries (same number of examined regions as in the previous plot). Boxplots center line shows the median value, with box limits indicating the IQR and whiskers as 1.5x IQR. d, Epigenetic, structural and gene features associated to insulation boundaries in each species. Note that a boundary can be annotated with multiple features (e.g. TSS, ATAC and H3K4me3 peaks).
Extended Data Fig. 5
Extended Data Fig. 5. Genome architecture in unicellular holozoans.
a, Example genomic regions in S. arctica illustrating the co-segregation of inactive chromatin regions. The interacting regions, highlighted in grey, fold into chromatin domains that exhibit local compartmentalised interactions. b, The manually annotated 296 compartment domains have a median size of 18 Kb. Middle, the observed long-distance interactions within the domains display a local checkerboard pattern, where contacts are enriched within certain set of loci, while contacts between them are depleted. To quantify the contact distribution, we calculated the sum of ICE (iterative correction and eigenvector decomposition)-normalized contacts within the segregated regions and their flanking regions (30 Kb) at a resolution of 2,800 bp, across the size range of 50 Kb to 5 Mb. The contact interaction pattern observed over the silenced regions showed a reduced interaction frequency across the region body compared to flanking loci. This interaction pattern is typical for checkerboard compartmentalisation, in contrast to loop interactions, which manifest as local peaks in interaction frequencies. Right, genes located within the compartment domains are lowly expressed or silenced (*** p-value < 2.2e−16, Wilcoxon rank sum test). Boxplots center line shows the median value, with box limits indicating the IQR and whiskers as 1.5x IQR. c, Distribution of epigenetic signals across compartment domains. The regions within the annotated domains were located within the inactive B compartment and were enriched in transposable elements, predominantly Gypsy LTRs, which accounted for 63% of the total TEs in these regions. d, Size distribution of manually annotated 183 contact regions in S. rosetta that harbour lowly expressed genes (*** p-value = 4.8e−6, Wilcoxon rank sum test), boxplots as in (b). e, Example genomic regions in S. rosetta forming distal interactions. f, Same as (c) for S. rosetta. The interacting regions show weak enrichment in H3K4me1 and H3K27me3 signals compared to random genomic regions. g, Same as (e) for C. owczarzaki. h, Distal contacts in C. owczarzaki connect promoter regions of highly expressed genes (*** p-value = 5.3e−12, Wilcoxon rank sum test), boxplots as in (b). i, Distal contacts in C. owczarzaki are indicative of micro-compartmentalisation signal because of the characteristic alternating contact pattern and the decreased cumulative interactions in the promoter regions of the target genes compared to the concentration typically seen in chromatin loops annotated in T. adhaerens, M. leidyi, N. vectensis, D. melanogaster and H. sapiens. To quantify the distribution of contact interactions around TSS-TES sites, we calculated and compared the sum of ICE (iterative correction and eigenvector decomposition)-normalized contacts at species-specific resolutions (400 bp for C. owczarzaki, T. adhaerens and D. melanogaster, 500 bp for N. vectensis, 800 bp for M. leidyi and 5 Kb for H. sapiens). To eliminate confounding signals from distal compartmentalisation pattern or other long-distance interaction patterns, the sum of considered interactions was restricted to contacts that fall within the range size of annotated loops or interacting regions (4–100 Kb for C. owczarzaki, T. adhaerens, 5–250 Kb for D. melanogaster, 10–360 Kb for N. vectensis, 5–150 Kb for M. leidyi and 50–1,060 Kb for H. sapiens). To calculate the average distribution of interaction contacts around the TSS-TES sites we used the function stackup form the pybbi package version 0.4.0 (https://github.com/nvictus/pybbi). The TSS-TES regions were rescaled into 50 bins with flanking regions of 10 Kb for each species except H. sapiens with 100 Kb flanking regions. Additionally, we flipped the TSS-TES regions and their corresponding flanking regions for negative-stranded genes. Notice that in C. owczarzaki, the sum of interactions around the TSS was lower than average interactions within the gene body. This is due to a small-scale local checkerboard pattern, where regions between interaction loci showed low contact frequency. As a result, cumulative interactions at promoters were even lower than average background signal and signal over gene bodies. In contrast, in other examined species, including T. adhaerens, M. leidyi, N. vectensis, D. melanogaster and H. sapiens, where chromatin loops connected examined promoter regions to cis-regulatory elements, the contact frequency at loop anchor regions was enriched and higher than the average across gene bodies. These differences highlight distinct modes of chromatin organization of C. owczarzaki with other species. j, In C. owczarzaki, a subset of highly expressed genes (274) exhibit increased interaction frequencies between TSS and TES forming gene body interaction domains.
Extended Data Fig. 6
Extended Data Fig. 6. Genome architecture in the sponge Ephydatia muelleri.
a, Example E. muelleri genomic regions showing contact patterns perpendicular to the diagonal of the Micro-C matrix and visually resembling flares, jets, or fountains,. b, Aggregated contact strength around the midpoints of flare regions. Random genomic regions anchored at the TSS of expression-matched genes were used as a control. Boxplots center line shows the median value, with box limits indicating the IQR and whiskers as 1.5x IQR. c, Example E. muelleri genomic regions showing distal interactions connecting promoter and enhancer-like anchor regions. Unlike typical chromatin loops, the preferential contact interactions in E. muelleri are diffuse and do not form a conspicuous dot contact pattern. d, A total of 84 manually annotated focal contacts connecting distal regulatory elements were classified as enhancers or promoters based on their H3K4me3 to H3K4me1 ratio. e, Aggregate plots demonstrating contact enrichment within rescaled contact regions, compared to random genomic regions anchored at TSS of expression-matched genes on one side and distance-matched random points on the other side. Boxplot limits are as in (b). f, Non-promoter cis-regulatory elements were identified based on chromatin state, defined by low H3K4me3 and high H3K4me1 enrichment around regions of accessible chromatin. The plots illustrate the distribution of these elements and their proximity to the nearest transcription start site (TSS) or other contact anchors within loop-forming enhancers. Notice the distance-to-TSS distribution of E. muelleri enhancer-like elements is similar to that of enhancers that do not form stable loops in other species.
Extended Data Fig. 7
Extended Data Fig. 7. Genome architecture in the cnidarian Nematostella vectensis.
a, Example genomic region in N. vectensis showing chromatin loop contacts with loop anchors highlighted in grey. b, Loop anchor regions were classified as promoter-side if characterized by high H3K4me3 ChIP-seq signal levels and low H3K4me2 or low H3K4me1 signal. Enhancer-side loop anchors were defined as regions with low H3K4me3 and high H3K4me2 or H3K4me1. c, Most loop anchors retained their original classification, regardless of whether the H3K4me3/H3K4me2 or H3K4me3/H3K4me1 ratio was used. For N. vectensis, the ratio of H3K4me3/H3K4me1 outperformed H3K4me3/H3K4me2 in classifying loop anchors, as most of the disputed loops anchors annotated as promoters with H3K4me3/H3K4me2 were predominantly located in intronic and intergenic regions (pie chart). d, Left, aggregated contact strength of chromatin loop interactions, showing the overall intensity and frequency of chromatin contacts across loop anchor points. Right, loop anchors in N. vectensis show GTGT-motif enrichment (FC = 327, p-value = 1e−40) compared to GC-normalised background genomic regions. e, Genomic regions in N. vectensis displaying non-loop self-interacting domains. f, Same as (d), but for regions between insulation boundaries that also harbour self-interacting domains. Right, motif enrichment analysis was focused on accessible chromatin regions at the insulation boundaries. Accessible promoter regions in neuronal Elav+ cells were used as the background for comparison.
Extended Data Fig. 8
Extended Data Fig. 8. Placozoan genome architecture additional analyses.
a, Annotation of chromatin loop anchors with promoter and enhancer chromatin signatures for T. adhaerens and C. collaboinventa. Loop anchors annotated as enhancers were mostly located within promoter regions of other genes. To resolve this ambiguity, such loop anchors were classified as promoters based on their genomic context. b, Example contact map regions depicting promoter-enhancer distal interactions highlighted in grey in syntenic regions of placozoans. c, Local Moran’s Index scatterplot visualises assignment of genomic bins to four distinct groups: High-High (HH), where examined signal (ATAC or H3K4me3) spatially co-localises in a neighbourhood of other bins with high signal; Low-Low (LL) bin has low examined signal and located in a neighbourhood of bins with low signal; when bin and its neighbourhood have different levels of signal, then the bin is assigned to Low-High (LH) or High-Low (HL) quadrants. Statistically significant values are in solid colors. Right panel illustrates intensity of examined signal layered over the two-dimensional Kamada-Kawai representation of top 20% contact interactions. p-values and r-values (Pearson correlation coefficients) were determined using a one-sided permutation test. A linear least-squares regression was then performed between z-scores of ATAC or H3K4me3 values and the signal’s spatial lag. The 95% confidence interval of the regression is shown as a grey shadow. d, Boxplots showing relative gene expression (RNA-seq) and peak intensity (H3K4me3) at promoter regions of genes from GP1, GP2, and GP3 groups. For each pairwise comparison for both T. adhaerens (GP1: n = 2,978; GP2: n = 3,681; GP3; n = 3,851) and C. collaboinventa (GP1: n = 3,973; GP2: n = 3,119; GP3: n = 4,238), *** indicates p-values below 2.22e−16, two-sided Wilcoxon rank sum test. Boxplots center line shows the median value, with box limits indicating the IQR and whiskers as 1.5x IQR. e, Left, heatmaps showing CPM normalised ATAC-seq and ChIP-seq coverage, motif scores and Mutator transposable element density within 5 Kb of the TSSs of GP1, GP2, and GP3 genes in C. collaboinventa. Each heatmap scale starts at zero. Middle, aggregate peak analysis displaying the contact strength between gene promoters within each annotated group. Right: Genes in C. collaboinventa from various gene groups, classified based on the presence of chromatin loops and their epigenetic states, demonstrate overlap with orthologous genes from GP1, GP2, and GP3 in T. adhaerens. f, GO-term enrichment analysis of GP1 genes with p-values determined using Fisher’s exact test. g, Barplots showing the cell type (from previously published dataset) in which genes belonging to each group are maximally expressed. Only variable genes (with a fold-change higher than 1.8) are included. h, Scatterplot showing total gene expression (x-axis) versus gene expression variability (y-axis) across cell types. i, Distribution of motif scores in loop anchor regions compared to the genome-wide background. C. collaboinventa harbour similar motif to T. adhaerens (similarity score = 0.93) in 60% of annotated loop anchor regions. j, Local Moran’s Index scatterplot and Gaudí plots demonstrate spatial co-localisation of sequence motif identified in promoters of GP1 genes of T. adhaerens (motif score above 80th percentile). Statistically significant values are calculated as in (c). k, Schematic phylogenetic tree of TIR sequences of Mutator DNA transposons from four placozoan species (Trichoplax adhaerens, Trichoplax sp. H2, Hoilungia hongkongensis, Cladtertia collaboinventa). Placozoan Mutator DNA TIRs can be classified into 5 clades with consensus sequences. The similarity score between the TIR consensus sequence and the sequence motif in GP1 promoters is indicated. Pie charts shows the proportion of Mutator transposons harbouring the consensus TIR sequences.
Extended Data Fig. 9
Extended Data Fig. 9. Ctenophore genome architecture additional analyses.
a, Scatter plot showing the normalised H3K4me3 and H3K4me2 ChIP-seq coverage in 2 Kb region around loop anchor. b, Comparison of loop anchor annotation using either H3K4me3/H3K4me2 or H3K4me3/H3K4me1 ratios. For M. leidyi, H3K4me3/H3K4me2 ratio were more effective in annotating loop anchors, as many loop anchors classified as promoters using H3K4me3/H3K4me1 were found within intergenic or intronic regions (pie chart). The discrepancy is attributed to the high background noise observed in the H3K4me1 ChIP-seq signal. c, Normalised coverage for different chromatin features around loop anchors classified as promoters and enhancers. d, Genomic regions in H. californensis showing chromatin loops. In total, we annotated 239 chromatin loops, with 51% of loop anchors located within intronic or intergenic regions. High-resolution chromatin maps are expected to significantly increase the number of reported loops in H. californensis. e, Boxplots showing the total expression in scRNA-seq data for M. leidyi or RNA-seq data for C. californensis of genes with a loop anchor at their promoter regions, in their introns (enhancer sites), and genes not involved in distal chromatin interactions (outside loops). *** stands for p-value < 2.22e−16 of two-sided Wilcoxon rank sum test. Boxplots center line shows the median value, with box limits indicating the IQR and whiskers as 1.5x IQR. f, Motif score distributions at loop anchors (max score in 2,000 bp window around the center of a loop anchor) compared to genomic background. In H. californensis, we detected similar to M. leidyi GC-rich motif (similarity score = 0.96) enriched in 38% of loop anchors. g, Fraction of loop anchor sites containing the identified GC-rich motif at promoter sites (in orange), at enhancer sites (green) or at the promoters of genes not involved in chromatin loops (cyan). h, Scatterplot showing total gene expression (x-axis) versus gene expression variability (y-axis) across cell types, highlighting genes with their promoter involved in chromatin loops (orange) and also genes containing the GC-rich motif in their promoters but not involved in loops (cyan). These motif-containing genes without detected loops showed lower and more variable expression across cell types than genes with detected loops, suggesting the former could be forming loops in low-abundance cell types that we are unable to detect in bulk Micro-C experiments. i, DNA methylation levels at GC-motif sites located at chromatin loops (left) compared to methylation levels in motif occurrences outside detected chromatin loops (right). j, Bias-corrected ATAC footprint profiles centered around motifs located at loop anchors. k, Distribution of CTEP1 and CTEP2 bound DAP-seq peaks across genomic regions with varying DNA methylation levels and within annotated loop anchors. Below, the number of DAP-seq peaks containing the identified GC-rich motif. l, Number of loop anchor regions that contain CTEP1 and CTEP2 DAP-seq peaks. m, DAP-seq quantile normalized CPM coverage around GC-rich motif from CTEP2 binding assay using native genomic DNA fragments or unmethylated PCR amplified genomic DNA. CTEP2 as well as CTEP1 (Fig. 4f) exhibited higher affinity for the unmethylated GC-rich motif. n, Multiple sequence alignments of CTEP1 and CTEP2 genes were performed against the dataset of 358 metazoan genomes (Supplementary Table 3). The significant hits against CTEP proteins, exhibiting sequence identity above 50%, were found exclusively within ctenophores. o, Left, boxplots showing the number of transposable element insertions per promoter region of genes involved in chromatin loops compare to genes that are outside loops (*** indicates p-value < 2.22e−16, two-sided Wilcoxon rank sum test). Right, barplots showing the fraction of promoters in loops containing TE insertions compared to promoters not involved in loops and random genomic regions. Over 90% of promoter regions involved in distal interactions harbour insertion of DNA transposon. Additionally, promoters in loops have higher frequency of insertions of LTR and Unknown type transposons. Boxplot limits as in (e). p, Syntenic conservation within M. leidyi chromatin loops compared to Pleurobrachia bachei or Bolinopsis microptera. Left, barplot showing the fraction of conserved orthologs in all alignable genomic regions across ctenophore species (chi-squared test for given probabilities). Right, boxplot showing the fraction of shared orthologs between individual genomic regions within chromatin loops (P. bachei: n = 105; B. microptera: n = 332) versus in random genomic regions of similar size (P. bachei: n = 198; B. microptera: n = 945). p-value significance was calculated using two-sided Wilcoxon rank sum test. Boxplot limits as in (e). q, Number of predicted genes with zf-C2H2 protein domain in the different species studied included in this study.

References

    1. Lieberman-Aiden, E. et al. Comprehensive mapping of long range interactions reveals folding principles of the human genome. Science326, 289–293 (2009). - PMC - PubMed
    1. Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell148, 458–472 (2012). - PubMed
    1. Beagan, J. A. & Phillips-Cremins, J. E. On the existence and functionality of topologically associating domains. Nat. Genet.52, 8–16 (2020). - PMC - PubMed
    1. Szabo, Q., Bantignies, F. & Cavalli, G. Principles of genome folding into topologically associating domains. Sci. Adv.5, eaaw1668 (2019). - PMC - PubMed
    1. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell159, 1665–1680 (2014). - PMC - PubMed

LinkOut - more resources