Nucleic Acids Res. 2023 Feb 22;51(3):1103-1119.
doi: 10.1093/nar/gkac1258.

Hypothesis-driven probabilistic modelling enables a principled perspective of genomic compartments

Hagai Kariti et al. Nucleic Acids Res.

Abstract

The Hi-C method has revolutionized the study of genome organization, yet interpretation of Hi-C interaction frequency maps remains a major challenge. Genomic compartments are a checkered Hi-C interaction pattern suggested to represent the partitioning of the genome into two self-interacting states associated with active and inactive chromatin. Based on a few elementary mechanistic assumptions, we derive a generative probabilistic model of genomic compartments, called deGeco. Testing our model, we find it can explain observed Hi-C interaction maps in a highly robust manner, allowing accurate inference of interaction probability maps from extremely sparse data without any training of parameters. Taking advantage of the interpretability of the model parameters, we then test hypotheses regarding the nature of genomic compartments. We find clear evidence of multiple states, and that these states self-interact with different affinities. We also find that the interaction rules of chromatin states differ considerably within and between chromosomes. Inspecting the molecular underpinnings of a four-state model, we show that a simple classifier can use histone marks to predict the underlying states with 87% accuracy. Finally, we observe instances of mixed-state loci and analyze these loci in single-cell Hi-C maps, finding that mixing of states occurs mainly at the cell level.

Figures

Figure 1.
Overview of the probabilistic model of genomic compartments. We assume that the Hi-C interaction frequency matrix is sampled from an underlying interaction probability matrix. Interaction probabilities result from two components: a distance-based interaction probability function and a state-based interaction probability component representing the genomic compartment signal. The state-based interaction probability of two loci depends on the probability of each locus to be in each of the states (represented by a state probability matrix) and on the affinities of these states to each other (represented by an affinity matrix). We show that the state-based interaction probability component is equivalent to a multiplication of these matrices.
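The composition described in this caption can be sketched in a few lines. This is a minimal illustration and not the deGeco implementation: the names `state_probs`, `affinity` and `distance_decay`, the power-law form of the distance term, and the normalization over the upper triangle are all assumptions made for the example. Plain Python lists are used to keep it dependency-free.

```python
def state_component(state_probs, affinity):
    """Return S = L * A * L^T, where L[i][s] is the probability that locus i
    is in state s and A[s][t] is the affinity between states s and t."""
    n, k = len(state_probs), len(affinity)
    return [[sum(state_probs[i][s] * affinity[s][t] * state_probs[j][t]
                 for s in range(k) for t in range(k))
             for j in range(n)]
            for i in range(n)]


def distance_decay(d, alpha=-1.0):
    """Hypothetical distance-based term: a power law in genomic distance."""
    return float(d + 1) ** alpha


def interaction_probabilities(state_probs, affinity, alpha=-1.0):
    """Multiply both components, then normalize so that the upper triangle
    (each locus pair counted once) sums to 1."""
    s = state_component(state_probs, affinity)
    n = len(s)
    raw = [[distance_decay(abs(i - j), alpha) * s[i][j] for j in range(n)]
           for i in range(n)]
    total = sum(raw[i][j] for i in range(n) for j in range(i + 1, n))
    return [[raw[i][j] / total for j in range(n)] for i in range(n)]


# Toy input: four loci and two states with stronger self-affinity.
state_probs = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
affinity = [[1.0, 0.2], [0.2, 0.8]]
probs = interaction_probabilities(state_probs, affinity)
```

In this toy input, neighbouring loci that share a state receive a higher interaction probability than neighbouring loci in different states, which is the checkered compartment signal the caption refers to.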
Figure 2.
Two-state intrachromosomal (cis) model performance. The model was fitted to GM12878 Hi-C data from Rao et al. (45) at 50 kb resolution. (A) Distance-normalized Spearman correlation between the Hi-C interaction frequency matrix and the model's inferred interaction probability matrix. The optimal possible correlation for the model at matching resolution and sequencing depth is shown as reference (see Methods for details). (B) Chromosome 19 Hi-C interaction frequencies (distance-normalized, upper triangle) versus the model-inferred interaction probabilities (distance-normalized, lower triangle) and their Spearman correlation. (C) Chromosome 19 Hi-C correlation matrix (distance-normalized, upper triangle) versus the model-inferred genomic compartments component (distance-normalized, lower triangle) and their Spearman correlation. (D) Close-up of a 3.5 Mb region of chromosome 19, showing Hi-C interaction frequencies (distance-normalized, upper triangle) versus the model-inferred interaction probabilities (distance-normalized, lower triangle). Genomic compartments appear in both the data and the model, but TADs are apparent only in the data.
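One plausible reading of "distance-normalized Spearman correlation" is sketched below. It assumes that distance normalization means dividing every entry by the mean of its diagonal (all pairs at the same genomic distance) and that the correlation is taken over the upper triangle; the exact procedure is given in the paper's Methods and may differ. All function names are hypothetical.

```python
def distance_normalize(matrix):
    """Divide each entry by the mean of its diagonal. Only the upper
    triangle is read, so the input is assumed to be symmetric."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diag = [matrix[i][i + d] for i in range(n - d)]
        mean = sum(diag) / len(diag)
        for i in range(n - d):
            value = matrix[i][i + d] / mean if mean > 0 else 0.0
            out[i][i + d] = out[i + d][i] = value
    return out


def average_ranks(values):
    """Rank values starting at 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks.
    Assumes neither input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def upper_triangle(matrix):
    """Flatten the entries above the main diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def distance_normalized_spearman(observed, inferred):
    return spearman(upper_triangle(distance_normalize(observed)),
                    upper_triangle(distance_normalize(inferred)))


# Toy symmetric contact matrix and a rescaled copy standing in for a model.
observed = [[10, 8, 2, 1], [8, 10, 3, 2], [2, 3, 10, 9], [1, 2, 9, 10]]
inferred = [[2.0 * v for v in row] for row in observed]
r = distance_normalized_spearman(observed, inferred)
```

Normalizing by distance first removes the strong decay of contact frequency with genomic separation, so the correlation reflects the compartment pattern rather than the distance trend shared by any two Hi-C-like matrices.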
Figure 3.
Model robustness at low sequencing depth. (A) Comparison of chromosome 19 interaction frequencies and model-inferred interaction probabilities at 20 kb resolution when using 100%, 10% and 0.5% of the data. Matrices were distance-normalized. (B) Model performance and stability when fitted to down-sampled chromosome 19 data at various resolutions. Left: distance-normalized Spearman correlation between the sampled Hi-C interaction frequencies and the model-inferred interaction probabilities. Right: mean absolute difference in state probabilities between a model fitted on the entire data and a model fitted on down-sampled data. (C) Model performance and stability at single-cell Hi-C sequencing depths. We inferred interaction probabilities on chromosome 19 at 20 kb resolution, treated these as ‘true’ interaction probabilities, and sampled interactions from them. Left: distance-normalized Spearman correlation between the ‘true’ interaction probabilities and the interaction probabilities inferred from the sampled interactions. Right: mean absolute difference between the ‘true’ state probabilities and the state probabilities inferred from the sampled interactions.
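The two quantities used in this robustness test can be sketched as follows. The caption does not say how the data were down-sampled, so the binomial thinning below (keeping each contact independently with a fixed probability) is an assumption, and both function names are hypothetical. The state probability matrices passed to `mean_absolute_difference` would come from two model fits, which are not shown here.

```python
import random


def downsample_counts(counts, fraction, seed=0):
    """Keep each observed contact independently with probability `fraction`.
    Only the upper triangle is sampled; the result is mirrored to stay
    symmetric."""
    rng = random.Random(seed)
    n = len(counts)
    sampled = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            kept = sum(1 for _ in range(counts[i][j]) if rng.random() < fraction)
            sampled[i][j] = sampled[j][i] = kept
    return sampled


def mean_absolute_difference(probs_a, probs_b):
    """Mean of |a - b| over all loci and states of two state probability
    matrices with identical shape."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(probs_a, probs_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Toy usage: thin a small contact matrix to roughly 10% of its reads.
counts = [[50, 20, 5], [20, 60, 15], [5, 15, 40]]
sparse = downsample_counts(counts, fraction=0.1, seed=1)
```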
Figure 4.
States interact differently within and between chromosomes. (A) State-state affinity matrix from a whole-genome fit of the two-state model at 100 kb resolution. (B) Joint saddle plot of chromosomes 1 and 2. The rows and columns of the distance-normalized Hi-C interaction frequency matrix were sorted by the probability of being in state 1. cis and trans data were normalized to have the same mean interaction frequency. (C) Separate saddle plots of chromosomes 1 and 2 in cis and trans. cis and trans data were normalized to have the same mean interaction frequency. (D) State-state cis and trans affinity matrices from a whole-genome fit of the two-state model with separate affinity matrices. (E) Separate saddle plots of chromosomes 1 and 2 in cis and trans using the inferred interaction probabilities rather than the Hi-C interaction frequencies.
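The saddle plots in this figure rest on a simple operation: order loci by their state-1 probability, group them into bins, and average the normalized matrix over each pair of bins. The sketch below is an assumption-laden illustration of that idea only. The function name and the equal-size binning are hypothetical, and the cis/trans rescaling mentioned in the caption is not included.

```python
def saddle_matrix(normalized, state1_prob, n_bins):
    """Average a distance-normalized matrix over bins of loci ordered by
    their probability of being in state 1."""
    n = len(state1_prob)
    order = sorted(range(n), key=lambda i: state1_prob[i])
    # Assign the sorted loci to n_bins groups of near-equal size.
    bin_of = [0] * n
    for rank, locus in enumerate(order):
        bin_of[locus] = rank * n_bins // n
    sums = [[0.0] * n_bins for _ in range(n_bins)]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for i in range(n):
        for j in range(n):
            sums[bin_of[i]][bin_of[j]] += normalized[i][j]
            counts[bin_of[i]][bin_of[j]] += 1
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0
             for b in range(n_bins)]
            for a in range(n_bins)]


# Toy input: loci 0 and 2 are mostly in state 1, loci 1 and 3 mostly not.
state1_prob = [0.9, 0.1, 0.8, 0.2]
normalized = [[2.0 if (i % 2) == (j % 2) else 0.5 for j in range(4)]
              for i in range(4)]
saddle = saddle_matrix(normalized, state1_prob, n_bins=2)
```

With two self-interacting states, the corners of the resulting matrix on the main diagonal are enriched and the off-diagonal corners are depleted, which gives the plot its saddle shape.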
Figure 5.
Extending the model beyond two states. (A) Modelling a complex region in chromosome 22 (50 kb resolution). Top: Hi-C interaction frequencies. Middle: interaction probabilities inferred by a two-state model, accompanied by locus state probabilities (blue state 1, orange state 2). Bottom: interaction probabilities inferred by a four-state model, accompanied by locus state probabilities (blue state 1, orange state 2, green state 3, red state 4). (B) Standard deviation of the mean row reconstruction error as a function of the number of states in the model. Arrows indicate knee points at four and seven states. (C) Distance-normalized Spearman correlation between the Hi-C interaction frequency matrix and the model's inferred interaction probability matrix, for whole-genome two-state and four-state models. The optimal possible correlation for each model at matching resolution and sequencing depth is shown as reference (see Methods for details). (D) GM12878 chromosome-level interaction frequency matrix. (E) Chromosome-level interaction probability matrix inferred by the whole-genome four-state model.
Figure 6.
Analysis of four-state model parameters. All results shown were taken from fitting the whole-genome four-state model at 50 kb resolution. (A) State affinity cis matrix. (B) State affinity trans matrix. The affinity matrices show different cis and trans affinities for all states. (C) Heatmaps representing the distribution of histone modification frequency for 10 different ENCODE (78) ChIP-Seq histone modification tracks, separated by state. (D) Histograms of locus state probabilities genome-wide. (E) Pearson correlation matrix of locus state probabilities. (F) Confusion matrix depicting locus state prediction from locus histone modifications by an elastic net multinomial logistic regression classifier.
Figure 7.
Analysis of state mixing. (A) Evidence of state mixing in chromosome 19. The interaction pattern marked by the green-blue rectangle appears to be a mix of the interaction patterns marked by the blue rectangle and by the green rectangle. Locus state probabilities are shown for states 3 and 7 of the seven-state model. (B) Averaged interaction profiles for the green-blue, blue, and green regions. (C) Schematic of simulated single-cell Hi-C profiles generated by the two simulated mixing scenarios. Top: in cell-level mixing, sparse single-cell interaction profiles are sampled from the previously shown green-blue interaction profile. Bottom: in population-level mixing, sparse single-cell interaction profiles are sampled 50% from the previously shown green interaction profile and 50% from the blue interaction profile. (D) Violin plots of the distributions of the mixing log ratio for the cell-level mixing simulation, the population-level mixing simulation, and real single-cell Hi-C data from Kim et al. (77). The mixing log ratio is the logarithm of the ratio between the likelihood of a single-cell profile given cell-level mixing and the likelihood of that profile given population-level mixing, after accounting for expected distance-dependent differences (see Methods). Kolmogorov-Smirnov P-values are shown.
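The mixing log ratio defined in panel (D) can be illustrated with a simplified likelihood model. The sketch below assumes a multinomial likelihood over a fixed set of interaction bins, strictly positive profile probabilities, and a population-level scenario in which each cell follows either the green or the blue profile with probability 0.5. It omits the distance-dependent correction mentioned in the caption, and all names are hypothetical rather than taken from the paper.

```python
import math


def log_likelihood(counts, profile):
    """Multinomial log-likelihood of contact counts given interaction
    probabilities. The multinomial coefficient is omitted because it
    cancels in the ratio. Probabilities must be positive where counts are."""
    return sum(c * math.log(p) for c, p in zip(counts, profile) if c > 0)


def mixing_log_ratio(counts, profile_a, profile_b, profile_mixed):
    """log L(cell-level mixing) minus log L(population-level mixing)."""
    cell_level = log_likelihood(counts, profile_mixed)
    # Population level: the whole cell follows profile A or profile B,
    # each with probability 0.5. Use log-sum-exp for numerical stability.
    ll_a = log_likelihood(counts, profile_a)
    ll_b = log_likelihood(counts, profile_b)
    top = max(ll_a, ll_b)
    population_level = top + math.log(0.5 * math.exp(ll_a - top)
                                      + 0.5 * math.exp(ll_b - top))
    return cell_level - population_level


# Toy profiles over four interaction bins.
green = [0.7, 0.1, 0.1, 0.1]
blue = [0.1, 0.1, 0.1, 0.7]
green_blue = [0.4, 0.1, 0.1, 0.4]

mixed_cell = mixing_log_ratio([3, 0, 0, 3], green, blue, green_blue)
pure_cell = mixing_log_ratio([6, 0, 0, 0], green, blue, green_blue)
```

Under these assumptions, a cell whose contacts are split between both patterns yields a positive ratio (favouring cell-level mixing), while a cell whose contacts all match one pattern yields a negative ratio (favouring population-level mixing).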

References

    1. Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009; 326:289–293.
    2. Quinodoz S.A., Ollikainen N., Tabak B., Palla A., Schmidt J.M., Detmar E., Lai M.M., Shishkin A.A., Bhat P., Takei Y. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell. 2018; 174:744–757.
    3. McCord R.P., Kaplan N., Giorgetti L. 3C and beyond: towards an integrative view of chromosome structure and function. Mol. Cell. 2020; 77:688–708.
    4. Zhao Z., Tavoosidana G., Sjölinder M., Göndör A., Mariano P., Wang S., Kanduri C., Lezcano M., Sandhu K.S., Singh U. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 2006; 38:1341–1347.
    5. Denker A., De Laat W. The second decade of 3C technologies: detailed insights into nuclear organization. Genes Dev. 2016; 30:1357–1382.
