Review
Genes (Basel). 2019 Jul 18;10(7):548.
doi: 10.3390/genes10070548.

Computational Processing and Quality Control of Hi-C, Capture Hi-C and Capture-C Data


Peter Hansen et al. Genes (Basel).

Abstract

Hi-C, capture Hi-C (CHC) and Capture-C have contributed greatly to our present understanding of the three-dimensional organization of genomes in the context of transcriptional regulation by characterizing the roles of topologically associating domains, enhancer-promoter loops and other three-dimensional genomic interactions. The analysis is based on counts of chimeric read pairs that map to interacting regions of the genome. However, processing and quality control present a number of unique challenges. We review here the experimental and computational foundations and explain how the characteristics of restriction digests, sonication fragments and read pairs can be exploited to distinguish technical artefacts from valid read pairs originating from true chromatin interactions.

Keywords: Hi-C; capture Hi-C; processing; quality control.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The restriction digestion of cross-linked chromatin results in fragments, also referred to as digests, whose ends correspond to cutting sites of the chosen restriction enzyme (step-like symbols). At this stage, the sample consists of a mixture of cross-linked protein-DNA complexes (A) and non-cross-linked DNA (B). Digestion cannot be assumed to be complete, for instance because some DNA is inaccessible to the enzyme. Therefore, uncut restriction sites may also occur within digests.
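The in-silico equivalent of the digestion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `digest_sequence` and its `cut_offset` parameter are hypothetical, the recognition site is assumed to be palindromic (so scanning one strand finds every cut), and digestion is assumed to be complete, whereas real digests may contain uncut sites. HindIII, which recognizes AAGCTT and cuts after the first A, serves as the example.

```python
from typing import List, Tuple


def digest_sequence(seq: str, site: str, cut_offset: int) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of in-silico digests.

    Hypothetical helper: assumes a palindromic recognition site, so scanning
    the forward strand finds every cut, and assumes complete digestion.
    """
    seq = seq.upper()
    site = site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)  # the enzyme cuts cut_offset bases into the site
        i = seq.find(site, i + 1)    # searching from i + 1 also catches overlapping sites
    bounds = [0] + cuts + [len(seq)]
    # Consecutive cut positions delimit the digests; drop empty intervals.
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


# HindIII recognizes AAGCTT and cuts between the first A and AGCTT.
print(digest_sequence("CCAAGCTTGGGGAAGCTTCC", "AAGCTT", 1))
# [(0, 3), (3, 13), (13, 20)]
```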
Figure 2
Ligation between digests within the same cross-linked protein-DNA complex results in the intended chimeric Hi-C products, which consist of digest pairs linked by ligation junctions. Such pairs may form circular or linear molecules (A). In addition, the ends of digests from different protein-DNA complexes may ligate, which is referred to as random cross-ligation. These unintended ligations lead to false-positive predicted interactions (B). Furthermore, the two ends of an individual digest may ligate to each other, which results in a circular molecule and is referred to as self-ligation (C). Finally, the ends of a digest may remain un-ligated (D).
Figure 3
Shearing re-linearizes ring-shaped re-ligation products and introduces a new type of fragment end (denoted by flash-like symbols). At this stage, three different categories of fragments can be distinguished: chimeric fragments arising from interactions or cross-ligation (A) as well as fragments arising from un-ligated (B) and self-ligated digests (C). The size distribution of fragments results from digestion and shearing and can be assumed to be the same for all three categories. For chimeric fragments that contain multiple restriction sites, the size cannot be unambiguously determined (marked with an asterisk, see text below).
Figure 4
Only the two outermost ends of each fragment are subjected to paired-end sequencing, and the reads are mapped to the forward (red) or reverse (blue) strand of the reference genome. Read pairs arising from chimeric fragments may have any relative orientation (A). Read pairs arising from un-ligated fragments can only point inwards (B). Read pairs arising from self-ligated digests must point outwards (C).
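Relative orientation depends on the positions of the reads' 5' ends, and for a reverse-strand read the 5' end is the rightmost aligned base. A minimal sketch, assuming each alignment is given as a hypothetical `(start, end, is_reverse)` tuple with 0-based, half-open coordinates (the convention of pysam's `reference_start`, `reference_end` and `is_reverse` attributes); the function names are illustrative:

```python
from typing import Tuple

# (start, end, is_reverse): 0-based, half-open alignment on one chromosome.
Alignment = Tuple[int, int, bool]


def five_prime_end(aln: Alignment) -> int:
    """Genomic position of the read's 5' end (its first sequenced base)."""
    start, end, is_reverse = aln
    # A reverse-strand read is sequenced from its rightmost aligned base.
    return end - 1 if is_reverse else start


def pair_orientation(a: Alignment, b: Alignment) -> str:
    """Classify a same-chromosome read pair as inward, outward or same_strand."""
    if a[2] == b[2]:
        return "same_strand"
    fwd, rev = (b, a) if a[2] else (a, b)
    # Inward: the forward read lies upstream of the reverse read,
    # so the two reads point towards each other.
    return "inward" if five_prime_end(fwd) <= five_prime_end(rev) else "outward"


print(pair_orientation((100, 150, False), (300, 350, True)))  # inward
print(pair_orientation((300, 350, False), (100, 150, True)))  # outward
```

Under the caption's logic, an outward result can only come from a self-ligated or chimeric fragment, never from an un-ligated one.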
Figure 5
Truncation of reads and calculation of fragment and digest sizes. Ligation junctions are searched for in the 5’-3’ direction, and reads are 3’-truncated after any identified ligation junction (A). Read pairs correspond to the outermost ends of fragments. The size of ligation fragments (lr) is calculated by summing the sizes of the two segments that form the fragment (B). The size of self-ligated digests is calculated by adding the size of the un-ligated part (lu) of the digest to the calculated fragment size (C).
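Panels A to C can be sketched as a few small functions. All names are hypothetical, and two choices are assumptions rather than details taken from the caption: the ligation junction is the sequence formed by blunt-end ligation of filled-in ends (AAGCTAGCTT for HindIII), and truncation keeps the first half of that junction, which derives from the digest the read starts in. Coordinates are 0-based and digests are half-open intervals.

```python
from typing import Tuple


def truncate_at_junction(read: str, junction: str) -> Tuple[str, bool]:
    """Search 5'->3' for the first ligation junction and 3'-truncate the read.

    Assumption: the first half of the junction (e.g. AAGCT of the filled-in
    HindIII junction AAGCTAGCTT) belongs to the digest the read starts in,
    so it is kept and everything downstream of it is discarded.
    """
    i = read.find(junction)  # str.find scans from the 5' end
    if i == -1:
        return read, False
    return read[: i + len(junction) // 2], True


def segment_length(five_prime: int, is_reverse: bool, digest: Tuple[int, int]) -> int:
    """Distance from a read's 5' end to the digest end it reads towards.

    A forward read runs towards the digest end, a reverse read towards its start.
    """
    start, end = digest
    return five_prime - start + 1 if is_reverse else end - five_prime


def ligation_fragment_size(seg1: int, seg2: int) -> int:
    """lr: the sum of the two segments joined at the ligation junction (B)."""
    return seg1 + seg2


def self_ligated_digest_size(l_r: int, l_u: int) -> int:
    """Size of a self-ligated digest: fragment size plus un-ligated part (C)."""
    return l_r + l_u


print(truncate_at_junction("TTTTAAGCTAGCTTGGGG", "AAGCTAGCTT"))  # ('TTTTAAGCT', True)

# Outward-pointing pair on digest [1000, 2000): forward 5' end at 1900,
# reverse 5' end at 1049.
l_r = ligation_fragment_size(segment_length(1900, False, (1000, 2000)),
                             segment_length(1049, True, (1000, 2000)))
l_u = 1900 - 1049 - 1  # digest bases between the two 5' ends, outside the fragment
print(l_r, self_ligated_digest_size(l_r, l_u))  # 150 1000
```

With these definitions, lr + lu for a self-ligated digest equals the digest length, which gives a convenient consistency check.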
Figure 6
Processing logic for read pair filtering. Trans read pairs by definition arise from chimeric fragments and may represent valid biological interactions or random cross-ligation events (A). Pairs mapping to different strands of the same chromosome may originate from un-ligated or self-ligated digests (B). Inward-pointing pairs that map to the same digest must have originated from un-ligated fragments; a size threshold is applied to the remaining inward-pointing pairs to categorize them as valid or artefactual (C). Outward-pointing read pairs that map to the same digest must have originated from self-ligated digests; a second size threshold is applied to the remaining outward-pointing pairs to categorize them as valid or artefactual (D). Read pairs mapping to the same strand can only be chimeric. However, we observe very small proportions of read pairs that map to the same strand and the same digest. Such read pairs are classified as strange internal (E).
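The decision tree in panels A to E can be expressed as one function. This is a sketch under stated assumptions, not the published implementation: the `Read` fields, the label strings and the default thresholds are placeholders, and the genomic distance between the two 5' ends stands in for the size measures the caption compares against its two thresholds.

```python
from dataclasses import dataclass


@dataclass
class Read:
    chrom: str
    five_prime: int   # 0-based position of the read's 5' end
    is_reverse: bool  # True if mapped to the reverse strand
    digest: int       # index of the restriction digest containing the 5' end


def classify_pair(r1: Read, r2: Read,
                  max_unligated: int = 1000,
                  max_selfligated: int = 3000) -> str:
    """Sketch of the read-pair filtering logic; thresholds are placeholders.

    Simplification (assumption): the distance between the two 5' ends is
    used as the size measure for both threshold checks.
    """
    if r1.chrom != r2.chrom:
        return "trans"  # (A) chimeric by definition
    if r1.is_reverse == r2.is_reverse:
        # (E) same strand can only be chimeric; same digest is unexplained
        return "strange_internal" if r1.digest == r2.digest else "valid"
    fwd, rev = (r2, r1) if r1.is_reverse else (r1, r2)
    same_digest = fwd.digest == rev.digest
    dist = abs(rev.five_prime - fwd.five_prime) + 1
    if fwd.five_prime <= rev.five_prime:  # inward-pointing (C)
        if same_digest or dist <= max_unligated:
            return "un_ligated"
        return "valid"
    if same_digest or dist <= max_selfligated:  # outward-pointing (D)
        return "self_ligated"
    return "valid"


print(classify_pair(Read("chr1", 100, False, 3), Read("chr1", 400, True, 3)))  # un_ligated
print(classify_pair(Read("chr1", 900, False, 3), Read("chr1", 200, True, 3)))  # self_ligated
```

Trans pairs are returned as their own category because, as panel A notes, a single trans pair cannot be told apart from random cross-ligation.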
Figure 7
Proportion of trans read pairs per chromosome vs. total number of restriction digests per chromosome for a Capture Hi-C (CHC) experiment in human GM12878 cells (ERR436026).
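The per-chromosome trans proportion plotted here could be tallied as below. The caption does not say how pairs are attributed to chromosomes, so the sketch assumes a trans pair counts once toward each of its two chromosomes and a cis pair once toward its own; the function name is hypothetical. The digest counts on the x-axis would come from an in-silico digest of each chromosome.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def trans_proportion_per_chrom(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of read pairs involving each chromosome that are trans.

    Assumption: a trans pair counts once towards each of its two
    chromosomes; a cis pair counts once towards its single chromosome.
    """
    total: Counter = Counter()
    trans: Counter = Counter()
    for c1, c2 in pairs:
        if c1 == c2:
            total[c1] += 1
        else:
            for c in (c1, c2):
                total[c] += 1
                trans[c] += 1
    return {c: trans[c] / total[c] for c in total}


pairs = [("chr1", "chr1"), ("chr1", "chr2"), ("chr2", "chr2"), ("chr2", "chr2")]
print(trans_proportion_per_chrom(pairs))
# {'chr1': 0.5, 'chr2': 0.3333333333333333}
```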
