Methods. 2015 Jan 15;72:65-75.

doi: 10.1016/j.ymeth.2014.10.031. Epub 2014 Nov 6.

The Hitchhiker's guide to Hi-C analysis: practical guidelines

Bryan R Lajoie¹, Job Dekker², Noam Kaplan³

Affiliations

¹ Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605-0103, USA. Electronic address: Bryan.lajoie@umassmed.edu.
² Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605-0103, USA. Electronic address: Job.dekker@umassmed.edu.
³ Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605-0103, USA. Electronic address: noam.kaplan2@gmail.com.

PMID: 25448293
PMCID: PMC4347522
DOI: 10.1016/j.ymeth.2014.10.031

Bryan R Lajoie et al. Methods. 2015.

Abstract

Over the last decade, development and application of a set of molecular genomic approaches based on the chromosome conformation capture method (3C), combined with increasingly powerful imaging approaches, have enabled high resolution and genome-wide analysis of the spatial organization of chromosomes. The aim of this paper is to provide guidelines for analyzing and interpreting data obtained with genome-wide 3C methods such as Hi-C and 3C-seq that rely on deep sequencing to detect and quantify pairwise chromatin interactions.

Keywords: Bioinformatics; Chromatin structure; Chromosome conformation capture; Deep sequencing; Hi-C.

Figures

**Figure 1**
Flow chart for processing Hi-C data. Reads are first mapped using the iterative mapping approach for paired-end reads. Only paired-end reads where both ends map uniquely are kept; all others are discarded. Mapped reads are then assigned to a restriction fragment, and fragment-fragment interactions are assembled. Fragment-level filtering is applied. Un-ligated fragments and self-ligated fragments are removed. Optional strand-specific filters are applied. PCR duplicates are removed. Data are then binned. Bin-level filtering is then applied. Outlier bin-bin point interactions (2D) are removed. Outlier bins (1D rows/cols) are removed.
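
The duplicate-removal and binning steps of this flow chart can be sketched as below. This is only an illustration under stated assumptions, not the authors' pipeline: the read-pair layout `((chrom, pos, strand), (chrom, pos, strand))`, the function names `remove_pcr_duplicates` and `bin_pairs`, and the rule that pairs with identical coordinates and strands are duplicates are all hypothetical choices made for the example.

```python
from collections import Counter


def remove_pcr_duplicates(pairs):
    """Keep one copy of every pair whose two ends share coordinates and strands.

    Assumption: identical end positions and strands mark a PCR duplicate.
    """
    seen = set()
    unique = []
    for end1, end2 in pairs:
        # Sort the two ends so that (A, B) and (B, A) give the same key.
        key = tuple(sorted([end1, end2]))
        if key not in seen:
            seen.add(key)
            unique.append((end1, end2))
    return unique


def bin_pairs(pairs, bin_size):
    """Count interactions between fixed-size genomic bins."""
    counts = Counter()
    for (chrom1, pos1, _), (chrom2, pos2, _) in pairs:
        bin1 = (chrom1, pos1 // bin_size)
        bin2 = (chrom2, pos2 // bin_size)
        # Store each bin pair in sorted order so the matrix stays symmetric.
        counts[tuple(sorted([bin1, bin2]))] += 1
    return counts


example_pairs = [
    (("chr1", 100, "+"), ("chr1", 5000, "-")),
    (("chr1", 100, "+"), ("chr1", 5000, "-")),  # exact duplicate
    (("chr1", 5000, "-"), ("chr1", 100, "+")),  # same molecule, ends swapped
    (("chr1", 150, "+"), ("chr2", 900, "+")),
]
unique_pairs = remove_pcr_duplicates(example_pairs)
bin_counts = bin_pairs(unique_pairs, bin_size=1000)
```

Storing only one orientation of each bin pair is a design choice for the sketch; a dense symmetric matrix can be filled from these counts afterwards.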

**Figure 2**
Mapping and filtering. (a) Following the Hi-C method, fragments are ligated. Hi-C junctions are then sheared and sequenced. Hi-C junctions can be sequenced by using either paired-end sequencing or single-end sequencing. * - Here a Hi-C junction cannot be sequenced by a 100 bp single-end run, as the read does not extend past the junction into the second fragment. Should the read length increase, the sequenced read would cross the junction. ** - Here we highlight the fact that same-stranded paired reads could be the result of undigested chromatin, and thus would not represent an actual Hi-C interaction. (b) Iterative mapping approach for aligning paired-end Hi-C reads. In gray, from top to bottom above/below each read, the mapping iterations are shown as the read is extended and re-mapped. Iterative mapping concludes when either the read is uniquely aligned or the maximal read length is reached. The number of iterations depends on mappability and the location of the junction. (c) After mapping, the paired reads can either map to a single fragment or to different fragments. Reads mapping to a single fragment are considered uninformative. Self-ligations and un-ligated fragments are classified by the read strand. Inward pointing reads are considered un-ligated fragments (“dangling ends”). Outward pointing reads are classified as self-ligated fragments (“self-circles”) as they form circular products. Same-strand reads are classified as “error pairs” as these products are a result of a mis-mapping, a random break, or an incorrect genome assembly. Reads mapping to different fragments are used to assemble the Hi-C dataset. All strand combinations are possible and are expected to be observed in equal proportions (25% per combination). However, inward and outward pairs could be the result of un-digested restriction sites, and then processed as either self-ligated or un-ligated products. Imbalance in the relative proportions of the strand combinations could suggest the need for additional filtering.
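
The classification in panel (c) can be written as a small function. This is a sketch under assumptions: fragment identifiers are taken to be unique across the genome, strands are given as `"+"` or `"-"`, and "inward" is taken to mean that the upstream read is on the forward strand. The names `classify_pair` and `strand_fractions` are hypothetical.

```python
from collections import Counter


def classify_pair(frag1, pos1, strand1, frag2, pos2, strand2):
    """Classify a mapped read pair following the scheme of Figure 2c."""
    if frag1 != frag2:
        return "valid"            # different fragments: used for the dataset
    if strand1 == strand2:
        return "error_pair"       # same fragment, same strand
    # Make strand1 the strand of the upstream (smaller coordinate) read.
    if pos1 > pos2:
        strand1, strand2 = strand2, strand1
    if strand1 == "+":
        return "dangling_end"     # inward pointing: un-ligated fragment
    return "self_circle"          # outward pointing: self-ligated fragment


def strand_fractions(valid_pairs):
    """Fraction of each strand combination among valid pairs.

    Each pair is (pos1, strand1, pos2, strand2); roughly 0.25 per
    combination is expected, and a strong imbalance may call for
    additional filtering.
    """
    counts = Counter()
    for pos1, strand1, pos2, strand2 in valid_pairs:
        if pos1 > pos2:
            strand1, strand2 = strand2, strand1
        counts[strand1 + strand2] += 1
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


fractions = strand_fractions([
    (10, "+", 900, "-"), (10, "-", 900, "+"),
    (10, "+", 900, "+"), (10, "-", 900, "-"),
])
```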

**Figure 3**
Hi-C interaction matrix for 3 chromosomes. On the left, raw Hi-C data. On the right, filtered and balanced Hi-C data. The arrows below the heatmaps mark bins (rows/cols) that are filtered. Following the balancing procedure, the sum of each row/col is equal. This results in an overall smoother heatmap.
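
One way to obtain equal row and column sums is an iterative symmetric scaling. The sketch below uses a damped (square-root) Sinkhorn-style update, which is an assumption of this example rather than the specific procedure used for the figure; published Hi-C balancing methods differ in detail. The function name `balance` and its parameters are hypothetical, and all-zero rows stand in for filtered bins.

```python
import numpy as np


def balance(matrix, n_iter=500, tol=1e-10):
    """Scale a symmetric matrix so that all non-empty rows have equal sums.

    Returns the balanced matrix and the per-bin bias vector, such that
    raw[i, j] == balanced[i, j] * bias[i] * bias[j].
    """
    m = np.array(matrix, dtype=float)
    keep = m.sum(axis=1) > 0          # all-zero rows are treated as filtered
    bias = np.ones(m.shape[0])
    if not keep.any():
        return m, bias
    for _ in range(n_iter):
        sums = m.sum(axis=1)
        scale = np.ones_like(sums)
        # Square root: the factor is applied to both rows and columns.
        scale[keep] = np.sqrt(sums[keep] / sums[keep].mean())
        m /= np.outer(scale, scale)
        bias *= scale
        if np.abs(scale - 1.0).max() < tol:
            break
    return m, bias


raw = np.array([
    [10.0, 4.0, 0.0, 1.0],
    [4.0, 20.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],   # a filtered bin
    [1.0, 2.0, 0.0, 6.0],
])
balanced, bias = balance(raw)
```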

**Figure 4**
Averaging effects in Hi-C data. In this toy example, a square interaction pattern is apparent in the top interaction matrices representing subpopulations, yet its location varies. The final Hi-C interaction matrix, which consists of the average of all subpopulations, does not show the square interaction pattern, and shows a pattern that is not present in individual subpopulations.
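
The averaging effect can be reproduced numerically. In this toy sketch (the matrix size, square size, and the helper name `square_pattern` are arbitrary choices for illustration), each subpopulation contains a sharp square of contacts at a different position, and the population average contains only intermediate values.

```python
import numpy as np


def square_pattern(n_bins, start, size):
    """Matrix with a square block of contacts beginning at bin `start`."""
    m = np.zeros((n_bins, n_bins))
    m[start:start + size, start:start + size] = 1.0
    return m


# Four subpopulations, each with the same square at a shifted location.
subpopulations = [square_pattern(6, start, 3) for start in range(4)]

# A population Hi-C experiment observes only the average.
population_average = np.mean(subpopulations, axis=0)
```

Each subpopulation matrix holds only the values 0 and 1, whereas the average is graded along the diagonal, so the square outline of any single subpopulation is no longer visible.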

**Figure 5**
Ergodicity in Hi-C. This toy example follows, over time, the interaction of two loci in a population of 4 cells. Each row represents a time point and each column represents a cell. In the non-ergodic population (left), the interaction is maintained in the same cell over all time points. In the ergodic population (right), the interaction appears in different cells, such that its frequency in time is equal to its frequency in the population (both are 0.25). In Hi-C, which measures a single time point (i.e. a row) in a population of cells, the ergodic and non-ergodic cases are indistinguishable.
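
The toy example can be encoded directly as two small arrays. The particular layout below is an assumption chosen to match the description (4 time points, 4 cells, one interacting cell per time point); the function names are hypothetical.

```python
import numpy as np

# Rows are time points, columns are cells; 1 means the two loci interact.
non_ergodic = np.array([[1, 0, 0, 0]] * 4)   # always the same cell
ergodic = np.eye(4, dtype=int)               # a different cell each time


def population_frequency(states, time_point):
    """Interaction frequency across cells at one time point (what Hi-C sees)."""
    return states[time_point].mean()


def time_frequency(states, cell):
    """Interaction frequency over time within one cell."""
    return states[:, cell].mean()
```

At every time point `population_frequency` is 0.25 for both arrays, which is why a single Hi-C snapshot cannot separate the two cases; only `time_frequency` differs between them.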

**Figure 6**
Cis/trans ratio. A Hi-C interaction matrix (shown on 3 chromosomes for simplicity). Sample cis and trans regions are highlighted.
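
A cis/trans ratio can be computed from a binned matrix once each bin is labeled with its chromosome. The sketch below is one possible definition, assuming a symmetric matrix, counting each bin pair once, and including the diagonal in the cis total; the function name `cis_trans_ratio` and the `bin_chroms` argument are hypothetical.

```python
import numpy as np


def cis_trans_ratio(matrix, bin_chroms):
    """Ratio of intra-chromosomal to inter-chromosomal interaction counts."""
    m = np.asarray(matrix, dtype=float)
    chroms = np.asarray(bin_chroms)
    same_chrom = chroms[:, None] == chroms[None, :]
    # Upper triangle (with diagonal) so each symmetric pair is counted once.
    upper = np.triu(np.ones(m.shape, dtype=bool))
    cis = m[same_chrom & upper].sum()
    trans = m[~same_chrom & upper].sum()
    return cis / trans


counts = np.array([
    [5, 3, 1, 0],
    [3, 4, 0, 1],
    [1, 0, 6, 2],
    [0, 1, 2, 7],
])
ratio = cis_trans_ratio(counts, ["chr1", "chr1", "chr2", "chr2"])
```

If a dataset contains no trans interactions the division yields infinity, so a real implementation would need to handle that case explicitly.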

**Figure 7**
Distance-dependent interaction frequency. Shown are distance-dependent interaction frequency curves for metaphase and unsynchronized HeLa Hi-C from (Naumova et al. 2013). Note the slope change in the metaphase data which occurs at 10 Mb (indicated by the black arrow). Thus, loci separated by fewer than 10 Mb interact frequently, whereas loci separated by more than 10 Mb rarely interact. This information has been incorporated into polymer models of mitotic chromosomes.
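
A distance-dependent curve of this kind can be computed by averaging each diagonal of a single-chromosome matrix. The following is a minimal sketch, assuming a square cis matrix with fixed-size bins; the function names `distance_decay` and `loglog_slope` are hypothetical, and the synthetic input is made up to have a known slope.

```python
import numpy as np


def distance_decay(matrix, bin_size):
    """Mean interaction frequency at each genomic separation."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    distances = np.arange(1, n) * bin_size
    # The k-th diagonal holds all bin pairs separated by k bins.
    means = np.array([np.diagonal(m, offset=k).mean() for k in range(1, n)])
    return distances, means


def loglog_slope(distances, means):
    """Slope of the decay curve on log-log axes, ignoring empty distances."""
    ok = means > 0
    slope, _intercept = np.polyfit(np.log10(distances[ok]), np.log10(means[ok]), 1)
    return slope


# Synthetic matrix in which frequency falls off as 1 / separation.
idx = np.arange(50)
separation = np.abs(idx[:, None] - idx[None, :])
synthetic = np.where(separation > 0, 1.0 / np.maximum(separation, 1), 0.0)

distances, means = distance_decay(synthetic, bin_size=100_000)
slope = loglog_slope(distances, means)
```

A change in slope such as the one described for metaphase data would appear as different fitted slopes when the fit is restricted to distances below and above the transition.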

**Figure 8**
Genomic compartments. Top: Hi-C interaction matrix (shown on 3 chromosomes for simplicity) along with the calculated compartment value (first principal component). Below: outer product of the first principal component with itself yields a rank-1 reconstruction of the interaction matrix.
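
The relation between the first principal component and the rank-1 reconstruction can be illustrated with an eigendecomposition. This sketch assumes the input is already a symmetric, suitably normalized matrix (in practice the raw counts are usually transformed before this step, which is not shown); the function names are hypothetical, and the sign of the returned vector is arbitrary.

```python
import numpy as np


def first_component(sym_matrix):
    """Leading eigenvalue and eigenvector of a symmetric matrix."""
    m = np.asarray(sym_matrix, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(m)
    top = np.argmax(np.abs(eigvals))
    return eigvals[top], eigvecs[:, top]


def rank1_reconstruction(eigval, vec):
    """Outer product of the component with itself, scaled by its eigenvalue."""
    return eigval * np.outer(vec, vec)


# Toy checkerboard: bins labeled +1 and -1 for two compartment types.
labels = np.array([1, 1, -1, -1, 1, -1])
checkerboard = np.outer(labels, labels).astype(float)

eigval, component = first_component(checkerboard)
reconstruction = rank1_reconstruction(eigval, component)
```

For this idealized input the sign of `component` recovers the two bin classes and the reconstruction equals the input; real data would be reproduced only approximately.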

**Figure 9**
Topologically associating domains (TADs). A 45-degree rotated interaction matrix shows TAD patterns. Below, the directionality index and insulation score are shown together with the called non-overlapping set of TADs. Data were taken from Dixon et al. (2012).
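
Both tracks in this figure can be approximated with short sliding-window computations. The code below is a simplified sketch and not the exact procedure behind the figure: the window size, the handling of matrix edges, the use of a mean for the insulation score, and the function names are assumptions. The directionality index follows the commonly cited form with upstream sum A, downstream sum B, and E = (A + B) / 2.

```python
import numpy as np


def insulation_score(matrix, window):
    """Mean contact frequency crossing each bin; low values suggest boundaries."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    score = np.full(n, np.nan)       # edges without a full window stay NaN
    for i in range(window, n - window):
        # Contacts between the bins upstream and downstream of bin i.
        score[i] = m[i - window:i, i + 1:i + window + 1].mean()
    return score


def directionality_index(matrix, window):
    """Signed bias toward downstream (positive) or upstream (negative) contacts."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    di = np.zeros(n)
    for i in range(window, n - window):
        a = m[i, i - window:i].sum()            # upstream contacts
        b = m[i, i + 1:i + window + 1].sum()    # downstream contacts
        e = (a + b) / 2.0
        if e > 0 and a != b:
            di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di


# Toy matrix with two domains of 6 bins each and weak contacts between them.
toy = np.full((12, 12), 0.1)
toy[:6, :6] = 1.0
toy[6:, 6:] = 1.0

insulation = insulation_score(toy, window=2)
di = directionality_index(toy, window=2)
```

In the toy matrix the insulation score is lowest at the two bins flanking the domain border, and the directionality index switches from negative to positive across it.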


