Review
Comput Struct Biotechnol J. 2024 Nov 7:23:4027-4035.
doi: 10.1016/j.csbj.2024.11.002. eCollection 2024 Dec.

A mini-review of single-cell Hi-C embedding methods

Rui Ma et al. Comput Struct Biotechnol J.

Abstract

Single-cell Hi-C (scHi-C) techniques have significantly advanced our understanding of 3D genome organization, providing crucial insights into the spatial genome architecture within individual nuclei. Numerous computational and statistical methods have been developed to analyze scHi-C data, with embedding methods playing a key role. Embedding reduces the dimensionality of complex scHi-C contact maps, making it easier to extract biologically meaningful patterns. These methods not only enhance cell clustering based on chromatin structures but also facilitate visualization and other downstream analyses. Most scHi-C embedding methods incorporate strategies such as normalization and imputation to address the inherent sparsity of scHi-C data, thereby further improving data quality and interpretability. In this review, we systematically examine the existing methods designed for scHi-C embedding, outlining their methodologies and discussing their capabilities in handling normalization and imputation. Additionally, we present a comprehensive benchmarking analysis that compares both the embeddings and their clustering performance. This review serves as a practical guide for researchers seeking to select suitable scHi-C embedding tools, ultimately contributing to the understanding of the 3D organization of the genome.

Keywords: Dimensionality reduction; Embedding; Genome architecture; Single-cell Hi-C.
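
As a rough illustration of the workflow the abstract describes (normalize, impute, then reduce dimensionality), the following is a minimal NumPy sketch. It is not any of the reviewed methods: the coverage normalization, the plain random-walk smoothing, the PCA step, the function names (`normalize_coverage`, `impute_random_walk`, `embed_cells`, `toy_cell`), and the simulated contact maps are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_cell(n_bins, decay, depth):
    """Hypothetical simulator: a sparse, symmetric contact map whose
    expected counts decay with genomic distance between bins."""
    idx = np.arange(n_bins)
    dist = np.abs(np.subtract.outer(idx, idx))
    upper = np.triu(rng.poisson(depth * np.exp(-decay * dist)))
    return upper + np.triu(upper, 1).T  # mirror the strict upper triangle


def normalize_coverage(m):
    """Divide each contact by sqrt(row coverage * column coverage).
    Entries in rows or columns with zero coverage stay at zero."""
    cov = m.sum(axis=1)
    scale = np.sqrt(np.outer(cov, cov))
    return np.divide(m, scale, out=np.zeros_like(m, dtype=float), where=scale > 0)


def impute_random_walk(m, steps=2):
    """Smooth a sparse map by taking `steps` steps of a random walk.
    Self-loops are added so that every row has a nonzero sum."""
    m = m + np.eye(m.shape[0])
    p = m / m.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    return np.linalg.matrix_power(p, steps)


def embed_cells(contact_maps, n_components=2):
    """Flatten each processed map into a feature vector, then project
    all cells onto the leading principal components via SVD."""
    feats = []
    for m in contact_maps:
        m = np.asarray(m, dtype=float)
        feats.append(impute_random_walk(normalize_coverage(m)).ravel())
    x = np.vstack(feats)
    x = x - x.mean(axis=0)  # center features before PCA
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Two simulated groups of cells that differ only in their distance decay.
cells = [toy_cell(20, 0.2, 0.5) for _ in range(10)]
cells += [toy_cell(20, 1.0, 0.5) for _ in range(10)]
embedding = embed_cells(cells)
print(embedding.shape)  # (20, 2)
```

Each row of `embedding` is one cell's low-dimensional representation, which could then be passed to a clustering or visualization step.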

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Single-cell Hi-C analysis workflow. (A) A simplified workflow of scHi-C data analysis; (B) Typical scHi-C embedding workflow: scHi-C contact maps serve as input and often undergo normalization and/or imputation prior to dimensionality reduction. This process extracts important features and outputs latent embeddings for further analysis, such as clustering.
Fig. 2
Visualization and clustering of the Nagano et al. dataset. This set of scatterplots provides 2D visualizations of the embeddings from the Nagano et al. dataset, obtained using UMAP with two components. Each dot represents an individual cell, with different colors indicating four cell-cycle stages.
Fig. 3
Visualization and clustering of the Tan et al. dataset. This set of scatterplots provides 2D visualizations of the embeddings from the Tan et al. dataset, obtained using UMAP with two components. Each dot represents an individual cell, with different colors indicating 13 cell subtypes.
Fig. 4
Clustering performance of scHi-C embedding methods. The clustering scores were derived from the 2D UMAP embeddings of (A) the mouse cell-cycle dataset (Nagano et al.) and (B) the mouse developmental brain dataset (Tan et al.). The x-axis represents NMI scores and the y-axis represents ARI scores. Each point represents the result of one scHi-C embedding method, with different colors and labels indicating the specific method used.
Fig. 5
Clustering performances with and without normalization/imputation. (A) Comparison of the clustering results with and without normalization. (B) Comparison of the clustering results with and without imputation. Each panel includes two barplots (left: ARI; right: NMI), displaying clustering scores based on 2D embeddings derived from the following methods: BandNorm, scHiCluster, and scHiCTools. Note that scHiCTools offers three normalization options: observed/expected (OE), Vanilla coverage (VC), and Knight-Ruiz (KR), as well as three imputation options: linear convolution (CN), random walk (RW), and network enhancing (NE). For each method, scores for non-normalized/imputed and normalized/imputed data are shown side-by-side. Rounded scores are annotated above each bar for clarity.
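
The ARI and NMI scores reported in Figs. 4 and 5 compare predicted cluster labels against known cell labels. A minimal pure-Python sketch of both scores follows, assuming the standard pair-counting ARI and an NMI normalized by the arithmetic mean of the two label entropies. The source does not state which NMI normalization the benchmark used, so that choice, the function names, and the toy labels are assumptions.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """Pair-counting agreement between two labelings, corrected for chance.
    Assumes at least two samples."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(truth)
    rows, cols = Counter(truth), Counter(pred)
    mi = sum(
        c / n * log(c * n / (rows[t] * cols[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    h_truth = -sum(c / n * log(c / n) for c in rows.values())
    h_pred = -sum(c / n * log(c / n) for c in cols.values())
    denom = (h_truth + h_pred) / 2
    return mi / denom if denom > 0 else 1.0


# Toy labels (hypothetical): same partition under different cluster names.
truth = [0, 0, 1, 1]
relabeled = ["b", "b", "a", "a"]
print(adjusted_rand_index(truth, relabeled), normalized_mutual_info(truth, relabeled))
```

Both scores depend only on how cells are grouped, not on the cluster names, so a relabeled but otherwise identical partition scores 1 on both.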
