Adv Sci (Weinh). 2023 Sep;10(27):e2301058.
doi: 10.1002/advs.202301058. Epub 2023 Jul 28.

DeCOOC Deconvoluted Hi-C Map Characterizes the Chromatin Architecture of Cells in Physiologically Distinctive Tissues

Junmei Wang et al. Adv Sci (Weinh). 2023 Sep.

Abstract

Deciphering variations in chromosome conformation from bulk three-dimensional (3D) genomic data of heterogeneous tissues is key to understanding cell-type-specific genome architecture and dynamics. Surprisingly, computational deconvolution methods for high-throughput chromosome conformation capture (Hi-C) data remain very rare in the literature. Here, deCOOC (deconvolve bulk Hi-C data), a deep convolutional neural network (CNN) that remarkably outperformed all the state-of-the-art tools in the deconvolution task, is developed. Interestingly, neither chromatin accessibility nor Hi-C contact frequency alone is sufficient to explain the power of deCOOC, suggesting the existence of a latent embedded layer of information pertaining to cell-type-specific 3D genome architecture. Applying deCOOC to in-house-generated bulk Hi-C data from visceral and subcutaneous adipose tissues shows that the characteristic chromatin features of M2 cells in the two anatomical loci are distinctively bound to different physiological functionalities. Taken together, deCOOC is both a reliable Hi-C data deconvolution method and a powerful tool for functional extraction of 3D genome architecture.

Keywords: bulk Hi-C; cell type compositions; computational deconvolution; deep learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of the deCOOC model. A) Model architecture. The model consists of four convolution layers (each of the first three coupled with one max‐pooling layer), two fully connected layers, and a final output layer. The input is two square‐like interaction matrices derived from the Hi‐C matrices of two different chromosomes. The last fully connected layer outputs the predicted cell type proportions. Model training and parameter optimization based on Hi‐C data were carried out by minimizing the sum of squared residuals between predicted and ground‐truth cell fractions. B) Input design for the model. From the complete Hi‐C matrix (e.g., with a resolution of 500 kb) of one chromosome, multiple square‐like sub‐interaction matrices with a fixed size (e.g., 30 bins) and step (e.g., 20 bins) are derived along the diagonal. Two diagonal sub‐interaction matrices from two chromosomes are stitched together along the row axis.
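The input design in panel B and the training objective in panel A can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy contact matrices, and the reading of "stitched along the row axis" as stacking one block above the other are all assumptions made for illustration.

```python
def diagonal_submatrices(matrix, size=30, step=20):
    """Cut square blocks of `size` bins along the main diagonal, advancing `step` bins each time."""
    n = len(matrix)
    blocks = []
    for start in range(0, n - size + 1, step):
        rows = matrix[start:start + size]
        blocks.append([row[start:start + size] for row in rows])
    return blocks


def stitch_rows(block_a, block_b):
    """Stack two equally sized blocks along the row axis (assumed: block_a above block_b)."""
    return block_a + block_b


def sum_squared_residuals(predicted, truth):
    """Training objective named in the caption: sum of squared differences of cell fractions."""
    return sum((p - t) ** 2 for p, t in zip(predicted, truth))


# Hypothetical toy contact matrices (70 bins each): contacts decay with genomic distance.
n_bins = 70
chrom_a = [[1.0 / (1 + abs(i - j)) for j in range(n_bins)] for i in range(n_bins)]
chrom_b = [[2.0 / (1 + abs(i - j)) for j in range(n_bins)] for i in range(n_bins)]

blocks_a = diagonal_submatrices(chrom_a)   # windows start at bins 0, 20, 40
blocks_b = diagonal_submatrices(chrom_b)
model_input = stitch_rows(blocks_a[0], blocks_b[0])  # 60 rows x 30 columns

loss = sum_squared_residuals([0.5, 0.3, 0.2], [0.6, 0.2, 0.2])
```

With a step smaller than the window size, consecutive windows overlap (here by 10 bins), so every near-diagonal interaction is covered by at least one sub-matrix.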
Figure 2
deCOOC performs better (lower RMSE and higher CCC[27]) on simulated mouse data than other methods. A) Boxplots of RMSE and CCC values over all test bulk samples from deCOOC and other deconvolution algorithms for the simulated mCC test dataset. B) Line plots of RMSE and CCC values for each cell type. Each symbol represents the RMSE or CCC value between ground‐truth and predicted cell fractions for one cell type. C) Scatterplots of RMSE (CCC in bottom row) values and the number of Hi‐C contacts for simulated mouse data with deCOOC, ssKL, CDSeq, and CS. Pearson correlation coefficients and p values are given above the plots. Low RMSE and high CCC values represent good prediction performance of the method. For all algorithms, the number of test samples n = 363.
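The two evaluation metrics used throughout these figures can be sketched as follows. This assumes that RMSE is the usual root-mean-square error and that CCC denotes Lin's concordance correlation coefficient computed with population (1/n) variances; the page itself does not spell out either definition, and the cell fractions below are made up.

```python
import math


def rmse(truth, pred):
    """Root-mean-square error between ground-truth and predicted cell fractions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def ccc(truth, pred):
    """Lin's concordance correlation coefficient (assumed definition of CCC)."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical cell fractions for one bulk sample.
truth = [0.1, 0.2, 0.3, 0.4]
perfect = [0.1, 0.2, 0.3, 0.4]
shifted = [0.2, 0.3, 0.4, 0.5]   # same ordering, constant offset of 0.1

rmse_perfect, ccc_perfect = rmse(truth, perfect), ccc(truth, perfect)
rmse_shifted, ccc_shifted = rmse(truth, shifted), ccc(truth, shifted)
```

The shifted prediction is perfectly correlated with the truth in the Pearson sense, yet its CCC falls below 1 because CCC also penalizes the offset between the means. That property is why a concordance measure suits the comparison of predicted and true fractions.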
Figure 3
deCOOC behaves more robustly on simulated HFC data than the other methods. A) Boxplots of RMSE and CCC values over all test bulk samples from deCOOC and other deconvolution algorithms for the simulated HFC test dataset. B) Lineplots of RMSE and CCC values for each cell type. Each symbol represents the RMSE or CCC value between ground‐truth and predicted cell fractions for one cell type. C) Scatterplots of RMSE (CCC in bottom row) values and the number of Hi‐C interaction contacts of simulated HFC bulk data with deCOOC, CS, and DeconRNASeq. Pearson correlation coefficients and p values are given above the plots. For HFC data, the number of test samples n = 486.
Figure 4
SHAP analysis for model interpretation. A) Scatterplots show weak correlation between SHAP values and Hi‐C (observed/expected) for mCC and HFC examples (the same examples as those shown in panel C). Pearson correlation coefficients and p values are given above the plots. B) Correlation analysis between chromatin accessibility and SHAP values based on the HFC dataset. Chromatin accessibility was significantly higher in the group with higher SHAP values (i.e., (0.01, 0.1)) than in the other two groups for the Astro and ODC cell types. For the L23 cell type, only the intermediate SHAP‐value group (0.001, 0.01) showed chromatin openness dramatically greater than that of the lower SHAP‐value group. The correlation between SHAP values and chromatin accessibility thus depends on cell type. P values were calculated using a one‐sided Wilcoxon signed‐rank test. C) Examples of paired Hi‐C matrix (lower left) and SHAP value maps (upper right) for each cell type of mCC and HFC. The regions of 13–28 Mb and 10–25 Mb of the two chromosomes for the mCC and HFC bulk examples are shown. Cell types and chromosome numbers are labeled at the top and left of the plots, respectively, while the fraction of each cell type is presented in parentheses. For the HFC example, the fourteen cell types were sorted into four categories (labeled by four rectangles of different colors) according to the clustering of cell‐type specific chromatin interactions.[6] D) Heatmap of SHAP values (example shown in C) for each cell type prediction (left for mCC example, right for HFC example). Each row in the heatmap indicates the SHAP values for an interaction site. (Significant differences: *P < 0.05, **P < 0.01, ***P < 0.001).
Figure 5
deCOOC performs better than CS and CDSeq (lower RMSE and higher CCC) on pig tissue Hi‐C data. A) Boxplots of RMSE and CCC values from deCOOC, CS, and CDSeq for simulated pig bulk Hi‐C data (randomly sampled experimental Hi‐C contacts of four pig cell lines with artificially produced cell fractions). B) Boxplots of RMSE and CCC for assessing deconvolution performance of the three algorithms on real pig tissues. The deconvolution of CS and CDSeq was conducted on all 22 real adipose samples, whereas deCOOC was evaluated on five samples randomly selected from the 22 adipose samples (the remaining 17 samples were used to fine‐tune the model); this procedure was performed five times. C) Scatterplots of CIBERSORTx‐predicted cell fractions (x‐axis) and deconvoluted cell fractions (y‐axis) from fine‐tuned deCOOC, CS, and CDSeq on real samples. The corresponding CCC values for the three methods are presented above the plots. D) RMSE and CCC values of deconvolution on five real test tissues (x‐axis) from the three deconvolution methods. deCOOC was fine‐tuned using different numbers of real tissues. E) Differential expression on a log2 scale of five genes for subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT). (Significant differences: ****P < 0.0001).
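The evaluation scheme in panel B (five held-out samples, 17 fine-tuning samples, repeated five times) could be set up along the following lines. This is a sketch only: the sample identifiers, the seed, and the choice to draw each repetition independently are assumptions, since the caption does not describe how the random selection was implemented.

```python
import random


def repeated_splits(sample_ids, n_test=5, n_repeats=5, seed=0):
    """Draw `n_repeats` random splits into a fine-tuning set and a held-out test set."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    splits = []
    for _ in range(n_repeats):
        test = sorted(rng.sample(sample_ids, n_test))
        fine_tune = [s for s in sample_ids if s not in test]
        splits.append((fine_tune, test))
    return splits


# Hypothetical identifiers for the 22 real adipose samples.
sample_ids = [f"adipose_{i:02d}" for i in range(1, 23)]
splits = repeated_splits(sample_ids)

for fine_tune, test in splits:
    print(len(fine_tune), "fine-tuning samples; held out:", test)
```

Holding the test samples out of fine-tuning in every repetition keeps the reported RMSE and CCC values from being computed on tissues the model has already seen.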

References

    1. Gates M., Agency U. S., Miiro G., Serwanga J., Pozniak A., Mcphee D., Jaoko W., Dehovitz J., Bekker L. G., Pitisuttithum P., J. Virol. 2009, 83, 7337.
    2. Goel V. Y., Hansen A. S., Wiley Interdiscip. Rev.: Dev. Biol. 2021, 10, e395.
    3. Ron G., Globerson Y., Moran D., Kaplan T., Nat. Commun. 2017, 8, 2237.
    4. Dekker J., Mirny L. A., Cell 2016, 164, 1110.
    5. Zheng H., Biol. Mood Anxiety Disord. 2019, 20, 535.
