Review
Comput Struct Biotechnol J. 2021 Apr 8:19:2070-2083.
doi: 10.1016/j.csbj.2021.04.016. eCollection 2021.

Application of Hi-C and other omics data analysis in human cancer and cell differentiation research

Haiyan Gong et al. Comput Struct Biotechnol J. 2021.

Abstract

With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (high-throughput chromosome conformation capture), studying the spatial structure of the genome in the nucleus helps researchers understand biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. We then discuss Hi-C data analysis methods at four levels: genome 3D structure, A/B compartments, TADs (topologically associating domains), and loop detection. We also discuss how to apply genome visualization technologies to identify chromosome structural features. We continue with a review of correlation and difference analyses across multi-omics data, and of how Hi-C and other omics data analysis can be applied to cancer and cell differentiation research. Finally, we summarize the problems that arise in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.

Keywords: A/B compartment; Chromosome structure; Hi-C; Loop; Omics data; TADs; Visualization.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
A/B compartment visualization with annotation. From top to bottom, the tracks are: RNA-seq, DNase I, CTCF (Broad), H3K27ac annotation, eigenvector, subcompartments, and the Hi-C contact heatmap. The heatmap shows observed/expected balanced Hi-C data at 500 kb resolution to visualize A/B compartments; blue marks the A compartment and red marks the B compartment. Combined with the RNA-seq, DNase I, and H3K27ac tracks, it shows that the blue regions have higher gene expression and stronger H3K27ac, DNase I, and CTCF signals. (Note: this figure was drawn with the Juicebox tool. Data source: Rao and Huntley et al., Cell, GM12878 in situ Hi-C, chr1: 0–120 Mb.) (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
TADs and loops visualization with annotation. From top to bottom, the tracks are: RNA-seq, DNase I, CTCF (Broad), H3K27ac annotation, and the Hi-C contact heatmap. In the heatmap, squares of high contact frequency along the diagonal (yellow squares) indicate TADs, and peaks (black points) indicate chromatin loops. (Note: this figure was drawn with the Juicebox tool. Data source: Rao and Huntley et al., Cell, GM12878 in situ Hi-C, chr1: 63,820,000–72,460,000.) (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3
Process of simulating the 3D structure of chromosomes. First, the data are preprocessed into a distance matrix or contact data, which serves as input to the second step. Second, prior conditions (such as biological characteristics, physical forces, and FISH data) can optionally be added to help build the chromosome reconstruction model. Over the past ten years, most researchers have reconstructed chromosomes with probability-based inferential, distance-based, or contact-based methods. The analysis ends with a set of 3D coordinates for visualization.
Fig. 4
Analysis of multi-omics data. The superscripts 0, 1, and 2 mark the sequencing technologies and corresponding analysis methods related to structure, epigenetic regulation, and expression, respectively. At the loop level, we can perform KEGG pathway and CTCF analysis; at the TAD level, we can use ChIP-seq, ATAC-seq, and WGBS to detect transcription factor sites or to perform variant calling and histone modification analysis; at the A/B compartment level, we can use ChIP-seq, ATAC-seq, WGBS, and RNA-seq to detect promoters, enhancers, and transcription factor sites or to conduct gene expression analysis.
Fig. 5
Differential analysis of data from the GM12878 and K562 cell lines. A. Differential heatmap of all chromosomes between the GM12878 lymphocyte line and the K562 cell line. Red indicates sites with stronger interactions in GM12878 than in K562, and blue indicates sites with weaker interactions in GM12878 than in K562. B. Differential heatmap of chromosome 1 between the GM12878 lymphocyte line and the K562 cell line. The points along the diagonal are the identified loops. Dark purple points represent GM12878 loops, and dark blue points represent K562 loops. C-D. Locations of the loops and differential loops on chromosome 1 (0 to 20,000,000 bp) in the GM12878 lymphocyte line and the K562 cell line. Each arc indicates a chromatin interaction from its start site to its end site. E. ChIP-seq data visualizing the H3K27me3 and H3K4me1 histone modification peaks in the GM12878 and K562 cell lines, chromosome 1, 0–20,000,000 bp. (Note: the GM12878 in situ Hi-C and K562 in situ Hi-C data are from the cited reference; A-D were drawn with the Juicebox tool, and E was drawn with the IGV tool [204].) (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
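
The Fig. 1 legend refers to an observed/expected (O/E) Hi-C matrix and an eigenvector track for A/B compartments. A minimal sketch of one common way such a track can be derived is given below; it is not taken from the article and is not the Juicebox implementation. The function names, the toy 8-bin matrix, and the contact values are all assumptions made for illustration. The sign of the eigenvector is arbitrary here; in practice it is usually oriented with an external signal such as gene density.

```python
import numpy as np

def observed_over_expected(contacts):
    """Divide each entry by the mean contact count at its genomic distance."""
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    oe = np.zeros((n, n))
    for d in range(n):
        diag = np.diagonal(contacts, offset=d)
        mean = diag.mean()
        if mean > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / mean
            oe[idx + d, idx] = diag / mean  # keep the matrix symmetric
    return oe

def compartment_eigenvector(contacts):
    """Leading eigenvector of the row-wise Pearson correlation of the O/E matrix."""
    oe = observed_over_expected(contacts)
    corr = np.corrcoef(oe)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return eigvecs[:, -1]

# Hypothetical toy data: 8 bins, +1 and -1 stand in for two compartments.
labels = np.array([1, 1, 1, -1, -1, 1, -1, -1])
n = labels.size
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
same = np.equal.outer(labels, labels)
# Contacts decay with distance and are enriched within a compartment.
contacts = (100.0 / (1.0 + dist)) * np.where(same, 3.0, 1.0)

ev = compartment_eigenvector(contacts)
print(np.sign(ev))  # bins with the same sign fall in the same compartment
```

Normalizing by the per-distance mean removes the distance decay, so the remaining checkerboard pattern, rather than proximity along the chromosome, drives the correlation matrix and its leading eigenvector.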

References

    1. Wang Z., Gerstein M., Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.
    2. Cui K., Zhao K. Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-Seq. Methods Mol Biol. 2012;833:413–419.
    3. Song L., Crawford G.E. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010;2010(2):pdb.prot5384.
    4. Buenrostro J.D., Wu B., Chang H.Y. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protocols Mol Biol. 2015;109(1)
    5. Giresi P.G., Kim J., McDaniell R.M. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 2007;17(6):877–885.