Cell Genom. 2024 Dec 11;4(12):100698.
doi: 10.1016/j.xgen.2024.100698. Epub 2024 Nov 25.

Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure

Michael Chiang et al. Cell Genom.

Abstract

Classical observations suggest a connection between 3D gene structure and function, but testing this hypothesis has been challenging due to technical limitations. To explore this, we developed epigenetic highly predictive heteromorphic polymer (e-HiP-HoP), a model based on genome organization principles to predict the 3D structure of human chromatin. We defined a new 3D structural unit, a "topos," which represents the regulatory landscape around gene promoters. Using GM12878 cells, we predicted the 3D structure of over 10,000 active gene topoi and stored them in the 3DGene database. Data mining revealed folding motifs and their link to Gene Ontology features. We computed a structural diversity score and identified influential nodes, chromatin sites that frequently interact with gene promoters and act as key regulators. These nodes drive structural diversity and are tied to gene function. e-HiP-HoP provides a framework for modeling high-resolution chromatin structure and a mechanistic basis for chromatin contact networks that link 3D gene structure with function.

Keywords: chromatin; chromatin modeling; genome organization; mechanistic models; polymer physics.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Mechanistic details of the e-HiP-HoP framework for simulating 3D chromatin structures. (A) Key ingredients of the epigenetic HiP-HoP (e-HiP-HoP) framework. The chromatin fiber was modeled as a bead-and-spring polymer chain (endowed with specific properties dependent on local epigenetic marks), incorporating fundamental biophysical principles, including transcription factor (TF)-mediated bridging, loop extrusion, and variations in local chromatin fiber compaction. (B) Simulation snapshots of the emergent behaviors observed by e-HiP-HoP: chromatin-TF interactions result in microphase separation via the bridging-induced attraction, loop extrusion leads to chromatin domains, and transcriptionally active chromatin is more flexible and disrupted. (C) A workflow for the e-HiP-HoP framework. A chromosome fragment of interest was selected, and then the 1D epigenetic input data were collected, including ATAC-seq peaks and ChIP-seq peaks for H3K27ac, H3K27me3, and H3K9me3 (for defining TF-binding sites and locally open regions), as well as ChIP-seq peaks for RAD21 (a subunit of the cohesin complex) and CTCF (for defining loop anchors in loop extrusion). Each simulation iterated through 3 × 10^7 steps, and simulations were performed multiple times to determine independent structures akin to those from different cells within a population. (D) The simulation output is an ensemble of 600 3D structures of the chromosome fragment from which individual topoi, the 3D regulatory landscapes surrounding gene promoters, were abstracted. As an example, snapshots of HSA20 (human chromosome 20) are presented with enlarged views of an example gene topos, CD40. Black arrowheads point to the coding region of the gene (light purple). See also Figures S1 and S2.
Figure 2
Chromatin structures predicted by e-HiP-HoP simulations were validated by Hi-C, Micro-C, and FISH data. (A) Comparison of contact maps between Hi-C, Micro-C, and e-HiP-HoP simulations for a 10 Mbp genomic region on chromosome HSA12. Top: comparison of the full region between Hi-C and simulations at 10 kbp resolution. Center: enlarged views of the 1 Mbp boxed regions. Bottom: comparison of the maps between Micro-C and simulations at 1 kbp resolution for the 200 kbp boxed regions marked in the center. The Pearson correlation coefficient r is reported for each comparison, and all correlations are statistically significant with p < 10^-10. More comparison examples are shown in Figure S3. (B) Representative images of two-color FISH probes used to measure distances between selected chromosome fragments. Scale bars represent 10 μm, and the white dashed circle marks the inactive X chromosome. (C) Violin plots comparing the shape of the normalized experimental and simulated FISH distance distributions for probes in (B) and those located on Xq22.1. Here, to better reflect the local variability in chromatin compaction, we fitted the size of each chromatin bead σ locally for each probe region to experimental data to convert simulated distances into physical distances. The normalized separation is defined as the difference between the actual separation of the probes and its median, normalized by the local bead size. A two-sample Kolmogorov-Smirnov test was used to determine whether the distributions were statistically different (see STAR Methods and Figure S4 for more examples). See also Figures S3 and S4.
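The normalized separation described in (C) can be illustrated with a minimal Python sketch. It only restates the definition in the caption (separation minus its median, divided by the local bead size). The function name, the example distances, and the bead size are hypothetical and are not taken from the paper.

```python
from statistics import median


def normalized_separations(separations, bead_size):
    """Return (d - median(d)) / bead_size for each probe separation d.

    `separations` and `bead_size` must share the same length unit.
    """
    if bead_size <= 0:
        raise ValueError("bead_size must be positive")
    mid = median(separations)
    return [(d - mid) / bead_size for d in separations]


# Hypothetical probe separations and local bead size (illustrative units only).
example = normalized_separations([200.0, 300.0, 400.0], bead_size=50.0)
print(example)  # [-2.0, 0.0, 2.0]
```

Subtracting the median and dividing by the bead size puts experimental and simulated distributions on a common dimensionless scale, so that their shapes can be compared directly.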
Figure 3
3DGene website provides a simple interface for browsing and downloading simulated structures predicted by e-HiP-HoP. (A) Interface to the 3DGene database at https://3dgene.igc.ed.ac.uk. The database enables visualization of 10,742 active gene topoi in GM12878 human lymphoblastoid cells. (B) Genes can be searched using various identifiers, including their Ensembl ID and common name. (C) After searching for a gene using either a full or partial identifier, hits are returned with chromosome and gene coordinates (hg19). Selecting the “3D Model” link reveals an interactive results page. (D) The results page has three main elements: gene details, interactive structure viewing window, and contact matrix. From the gene details panel, one of the three most common 3D structures of the gene topos can be selected and examined in the viewing panel, and a hyperlink is provided to download the coordinates of the structure as a crystallographic information file (CIF). Each bead corresponds to 1 kbp of chromatin, and bead colors are explained in the enlarged snapshot on the right. (E) The 3D structure can be rotated and magnified or reduced using the mouse. To the right of the viewing window is the control panel (highlighted in light blue). Here, the aperture icon is used for capturing a screen grab, while the spanner reveals further visualization settings. Below the gene details panel is the contact matrix within the gene topos, constructed from all simulated structures.
Figure 4
Influential nodes are linked to gene function. (A) A schematic illustrates how ATAC partners and influential nodes are determined for each gene promoter. In our notation, a topos is defined as the contiguous chromatin region encompassing all partners of its promoter, which essentially corresponds to the regulatory landscape. The box highlights the topos corresponding to the gene whose promoter is P. (B) Histograms showing the distributions of size, number of partners, and number of influential nodes for gene topoi genome wide. (C) The interaction frequency of each ATAC partner in the topoi of three genes with 14 partners: ZBTB5, TERF1, and PTPN22. These topoi have zero, one, and seven influential nodes, respectively (the dashed line marks the threshold for a partner to be considered as an influential node). The color scale indicates the distance between each partner and the promoter. (D) Examples of 3D structure for the topoi described in (C). (E and F) A list of the top ten Gene Ontology (GO) biological function terms, ranked by (E) −log10 of the false discovery rate (FDR) or (F) fold enrichment, from performing an overrepresentation test in GO terms comparing genes with more than one influential node to those with only a single node. Most of these GO terms are related to immune response; since we simulated lymphoblastoid cells, this result indicates that genes with a higher number of influential nodes are typically more tissue specific. The test was completed using the web tool Protein Analysis Through Evolutionary Relationship (PANTHER) and the GO database released on November 16, 2021.
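The thresholding idea in (C) can be sketched in Python as follows. The caption states only that a partner counts as an influential node when its interaction frequency passes a threshold. The threshold value, the strict `>` comparison, the partner labels, and the frequencies below are all assumptions made for illustration.

```python
def influential_nodes(interaction_freq, threshold):
    """Return the partners whose promoter-interaction frequency exceeds `threshold`.

    `interaction_freq` maps a partner label to the fraction of simulated
    structures in which that partner contacts the promoter.
    """
    return [name for name, freq in interaction_freq.items() if freq > threshold]


# Hypothetical partners, frequencies, and threshold (not values from the paper).
freqs = {"partner_1": 0.05, "partner_2": 0.40, "partner_3": 0.22}
print(influential_nodes(freqs, threshold=0.2))  # ['partner_2', 'partner_3']
```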
Figure 5
Intra-topos interactions and influential nodes determine structural diversity. (A) An illustration explaining the identification of networks in a gene topos and the associated diversity H score. Here, with a population of four networks, a higher H is achieved when the sampled structures are distributed more evenly among the networks (i.e., more equal slices in the pie chart), whereas a lower H occurs when many structures are associated with one of the networks (i.e., more unequal slices). (B) A scatterplot showing the number of networks against the number of partners for topoi genome wide. (C) A histogram showing the distribution of structural diversity for topoi genome wide. (D) Network pie chart and example structures for the three most frequent networks for SMARCA5, a low-diversity topos with eight partners and 56 networks. (E) Similar to (D) but for GINS4, a high-diversity topos with eight partners and 89 networks. (F) A 2D histogram showing the correlation between structural diversity and the percentage of ATAC partners that are influential, for genes in which at least one, but not all, partners are influential (Spearman correlation coefficient r is reported). Circles indicate the positions of two example genes, TERF1 and PTPN22, with the former having a smaller proportion of influential partners (Figure 4C). (G) A plot showing the same correlation between diversity and the percentage of influential partners, binned according to the total number of partners of the genes. The color of each bar indicates the p value of the correlation.
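The behavior of the diversity score in (A) can be illustrated with a short Python sketch. The caption does not give the formula for H, so this sketch assumes a Shannon entropy over the fraction of structures assigned to each network; that choice, the function name, and the example labels are assumptions, and the paper's methods should be consulted for the actual definition.

```python
import math
from collections import Counter


def diversity_score(network_labels):
    """Shannon entropy (natural log) of the network assignment of each structure.

    ASSUMPTION: entropy is used here as a stand-in for the paper's H score.
    """
    counts = Counter(network_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical ensembles of 100 structures spread over four networks.
even = diversity_score(["a", "b", "c", "d"] * 25)
skewed = diversity_score(["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5)
print(even > skewed)  # True
```

Under this assumption, four equally populated networks give the maximum value ln 4, while an ensemble dominated by one network scores lower, matching the pie-chart description in the caption.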

