[Preprint]. 2025 May 29:2025.05.29.656862.
doi: 10.1101/2025.05.29.656862.

Geometrically encoded positioning of introns, intergenic segments, and exons in the human genome

Luay M Almassalha et al. bioRxiv.

Abstract

Human tissues require a mechanism to generate durable, yet modifiable, transcriptional memories to sustain cell function across a lifetime. Previously, we demonstrated that nanoscale packing domains couple heterochromatin (cores) and euchromatin (outer zone) into unified reaction volumes that can generate transcriptional memory. In prior work, this framework demonstrated that RNA synthesis occurred within the ideal zone (intermediate density) portions of the domain. Naturally, this creates a question of where genes are positioned in relation to the packing domain architecture and which genetic material fills the domain core to sustain transcription. Here we propose that this could be solved by the encoded positioning of introns, intergenic segments, and exons as a projection of the functional packing layers of domains. This suggests that introns and intergenic segments are coupled to adjacent exons to generate coherent packing domain volumes. We illustrate how this organization would reconcile contradictions in epigenetic patterns, non-randomness in oncogenic mutations, and produce durable transcriptional memory. We conclude by showing that this genome geometry might have coincided with the rapid evolution of body-plan complexity, suggesting that chromatin geometry could be fundamental to metazoan evolution.

Conflict of interest statement

Competing Interest Statement: The authors declare no financial interests or consulting interests related to this work.

Figures

Figure 1. A paradox of the loss of accessibility with transcriptional amplification.
A) DNase-seq analysis demonstrates loss of accessibility with concurrent transcriptional activation during muscle differentiation. B-C) Loss of accessibility in non-exonic segments occurs across myogenic transcriptional activation. D) Transcriptional activity is associated with intronic heterochromatin deposition across tissue types. These are biallelic, constitutively expressed genes in their respective tissues. Critically, this includes TNNI3 (troponin), which is a component of the sarcomere necessary for cardiac contractility. E) Mass-fractal packing domains create a unified reaction volume due to the continuous density gradient across domain layers. F) Transcription is non-monotonically dependent on local density due to the trade-off between entropic gain (remaining bound as an intermediate complex decreases the excluded volume to other macromolecules) and the diffusibility of reactant species. As density continues to increase, Pol II subunits can no longer penetrate deeper into a domain volume, resulting in inhibition.
Figure 2. Packing domains as a geometric solution to optimize nuclear volume and transcriptional efficiency.
A) Loops have a broad range of sizes; without a system for efficient packing, many of the larger loops (>100 kbp) would generate lengths that span the human nucleus. B) Proposed framework in which the positions of exons, introns, and intergenic elements produce a system that reliably generates reaction volumes. Exons with short intronic sequences fold into an ideal zone within a volume generated by NE DNA (the ideal zone as a surface-area-to-volume, SA/V, portion of the total volume). The resulting volumes represent the structures observed on ChromSTEM imaging. The continued selection for elements across broad timescales results in an encoding within the genome. C-E) Transformation from beads on a string into mass-fractal volumes compresses genes from micron-length chains into nanoscopic volumes.
Figure 3. Introns and genes are geometrically linked to exon length by physical principles.
A) Histogram of protein-coding genes in the human genome showing that the median exonic fraction is ~9.8%, with a subset of genes that are almost completely exonic. B) Plot of the ratio of exon (ideal zone) to intron (total volume) against the gene length of human genes. The constant, p, reflects the position along a domain volume. Here, n is set to 1. As length increases, the volume ratio decreases, as expected for a distribution of reaction volumes. C) Randomization of exon/intron segments results in the statistically grounded null hypothesis of no relationship between length and exon/intron ratios, with a fraction approaching the median of 0.1. D) Intron length is a power law of exon length for protein-coding genes, with values of p and γ as reported. E) Schematic representation of gene composition in relation to p and γ, showing that a high γ indicates that more non-exonic volumetric elements are present within a segment. F) The relationship between γ and D depends on the proportion of the exons making up the ideal zone, where β=1 indicates that the entire exon contents are confined to a hard surface.
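The power-law relationship in panel D can be illustrated with a minimal sketch. It assumes the form I = p·E^γ, where I is intron length and E is exon length; this functional form, the function name `fit_power_law`, and the ordinary least-squares fit in log-log space are illustrative assumptions and not the authors' stated procedure.

```python
import math


def fit_power_law(exon_lengths, intron_lengths):
    """Fit I = p * E**gamma by least squares on log-transformed lengths.

    Assumption: both inputs are equal-length sequences of positive numbers.
    Returns the tuple (p, gamma).
    """
    xs = [math.log(e) for e in exon_lengths]
    ys = [math.log(i) for i in intron_lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the exponent gamma.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    # Intercept of the log-log line is log(p).
    p = math.exp(mean_y - gamma * mean_x)
    return p, gamma


# Synthetic (hypothetical) lengths in base pairs, generated from a known law.
exons = [100.0, 200.0, 400.0, 800.0]
introns = [2.0 * e ** 1.5 for e in exons]
p, gamma = fit_power_law(exons, introns)
```

On real annotations the fit would be noisy, and a linear (chain-like) relationship would appear as γ close to 1 under this assumed form.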
Figure 4. Exons are non-randomly coupled to adjacent volumetric DNA to generate power-law segments independent of the final RNA product.
A) Schematic representation of the organization of a gene on a chromosome segmented into two separate domains generated by a hinge element. The reaction volumes are produced by the volumetric DNA to guide the position of exons to ideal reaction zones. B) Analysis of the structure of human chromosomes in the positive strand orientation showing assemblies in which nontranscribed (volumetric) elements scale as a power law of exonic (ideal zone) elements. C) Randomly redistributing an exon together with its adjacent volumetric DNA conserves the power-law distribution. In contrast, randomly distributing exons alone results in a linear distribution of segments. D-E) Comparison of the organization generated by considering only (D) protein-coding genes or only (E) non-protein-coding genes in the positive strand orientation. In either case, chromosomes assemble into power-law units.
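The two randomizations contrasted in panel C can be sketched as follows. This is only one plausible reading of the caption: segments are modeled as hypothetical (exon length, volumetric length) pairs, and the function names and seeds are assumptions introduced for illustration.

```python
import random


def shuffle_paired(segments, seed=0):
    """Reorder segments while keeping each exon with its volumetric DNA."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return shuffled


def shuffle_exons_only(segments, seed=0):
    """Reassign exon lengths at random, leaving volumetric lengths in place."""
    rng = random.Random(seed)
    exons = [exon for exon, _ in segments]
    rng.shuffle(exons)
    return [(exon, vol) for exon, (_, vol) in zip(exons, segments)]


# Hypothetical segments: (exon length, adjacent volumetric length) in bp.
segments = [(100, 1000), (200, 2800), (400, 8000), (800, 22600)]
paired = shuffle_paired(segments)
unpaired = shuffle_exons_only(segments)
```

The paired shuffle preserves every exon-to-volumetric pairing, so any scaling between the two lengths survives; the exon-only shuffle decouples them, which is the kind of control the caption describes.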
Figure 5. Exons, introns, and intergenic segments are non-randomly positioned into reaction volumes.
A) Comparison of the DNA content observed in ChromSTEM packing domains with the analytical predictions of the model when hinge length is less than 300 bp, demonstrating similar distributions in content. B) Analysis of the distance from active Pol II (Ps-5) to the nearest core element (H3K9me3) as a function of Pol II Ps-5 density. Shallower depths and smaller domains would result in a higher concentration of Pol II Ps-5 per segment and would be expected to have a shorter distance to a core, as is experimentally observed. C) Analysis of the ChIP-Seq content of heterochromatin (H3K9me3) and promoter-associated euchromatin (H3K4me3) within theorized volumes from the model in HCT-116. Consistent with the underlying theory, chromatin segments are composed of both elements, with the position of H3K4me3 coinciding with the position of exon elements while heterochromatin is positioned in deeper layers. This partitioning suggests that heterochromatin is coupled with euchromatin in genetic segments. D) ChIP-Seq analysis of the active isoforms Pol II Ps-5 in HCT-116 cells and Pol II Ps-2 in HepG2 cells within gene bodies (exons and introns), demonstrating preferential localization of polymerase onto exons per base pair of length. If polymerases were uniformly distributed throughout a gene body, equivalent coverage per base pair would be observed. As a control for read-coverage bias, this was compared to H3K9me3 (K9), which preferentially localizes to introns. E) Nascent RNA-seq analysis in HCT-116 demonstrates nearly uniform synthesis of short introns and exonic sequences independent of gene length. In contrast, longer genes demonstrate decreasing synthesis of RNA in the direction of the reading frame, consistent with Pol II having ideal reaction positions. F) Analysis of hinge segments in HCT-116 demonstrates an enrichment toward transcriptionally active features and enhancer positions compared to randomly generated segments (R). G) In contrast, a very small percentage (<1%) of hinge positions overlap with constitutive heterochromatin. H) Experimentally observed RNA polymerase loops plotted as a function of their NE content (approximately the volume generated) compared to the observed ChIP-Seq content within each loop in HCT-116 cells. Within large polymerase loop domains, we observe an accumulation of heterochromatin. The total heterochromatin content increases as a function of the size, suggesting that packing occurs as a function of the volume generated. Small loops are primarily composed of euchromatin, indicating that their volume would appear to be relatively decompacted. Collectively, this suggests that transcriptional loops have a degree of packing that correlates with the packing behavior of domain volumes observed on ChromSTEM imaging.
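The per-base-pair comparison in panel D can be made concrete with a small sketch. The function name, the read counts, and the lengths are hypothetical; the only idea taken from the caption is that uniform polymerase occupancy implies equal coverage per base pair on exons and introns.

```python
def per_bp_enrichment(exon_reads, exon_bp, intron_reads, intron_bp):
    """Ratio of exonic to intronic read density (reads per base pair).

    A value near 1 is what uniform occupancy across the gene body would give;
    a value above 1 indicates preferential localization onto exons.
    """
    exon_density = exon_reads / exon_bp
    intron_density = intron_reads / intron_bp
    return exon_density / intron_density


# Hypothetical gene body: 1,000 bp of exon and 9,000 bp of intron.
uniform = per_bp_enrichment(100, 1000, 900, 9000)
exon_biased = per_bp_enrichment(400, 1000, 600, 9000)
```

Normalizing by length matters here because introns make up most of a typical gene body, so raw read counts alone would favor introns even under exon-biased occupancy.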
Figure 6. Power-law geometry produces a trade-off between modular durability and damage-risk.
A&B) Analysis of geometric properties of genes that are primarily unique to the esophagus, cerebellum, and muscle tissue, demonstrating power-law organization. C&D) Stem-cell related transcription factors (Yamanaka factors, YF) generally organize as linear geometries. Similarly, HOX genes demonstrate two phenotypes: a cluster with linear organization (values of E/I >1) and a group that organizes into a power-law distribution. Transcription factors such as RUNX2 that define tissue function are primarily organized as power-law geometries. E) Analysis of the frequency of protein-coding genes containing a hinge position, demonstrating that the 200 most frequent Tier-1 oncogenes contain at least one hinge position. WG, whole genome, compared with Tier-1 oncogenes ranked by frequency: T50, top 50 genes; T100, top 100 genes; T200, top 200 genes. F) Analysis of Tier-1 oncogene frequencies demonstrates an acceleration and then a plateau in frequencies as the likelihood of containing a hinge increases. Genes that are less likely to contain a hinge element had a lower correlation with oncogenic mutation frequency. This effect appears to plateau at mutation frequencies occurring over 1% of the time.
Figure 7. Packing geometry parallels body plan complexity in metazoans.
A&B) Analysis of chromosomal architecture (A) and genes (B) demonstrates that the S. cerevisiae genome is most likely organized as a linear beads-on-a-string assembly. As gene length increases, the non-exonic content increases linearly to create a chain. C&D) Analysis of chromosomal architecture (C) and genes (D) demonstrating a transition toward power-law assemblies. In C. elegans, genes appear to be equally split between linear assemblies (E/I >1) and power-law assemblies. E-H) With increasing body-plan complexity, we observe a transformation in both the genes and chromosomes of D. rerio (E-F) and M. musculus (G-H) such that their organizational structure resembles that of the genes and chromosomes observed in humans.
