Nat Methods. 2017 Jul;14(7):679-685.
doi: 10.1038/nmeth.4325. Epub 2017 Jun 12.

Comparison of computational methods for Hi-C data analysis


Mattia Forcato et al. Nat Methods. 2017 Jul.

Abstract

Hi-C is a genome-wide sequencing technique used to investigate 3D chromatin conformation inside the nucleus. Computational methods are required to analyze Hi-C data and to identify chromatin interactions and topologically associating domains (TADs) from genome-wide contact probability maps. We quantitatively compared the performance of 13 algorithms on Hi-C data from six landmark studies and on simulated data. The comparison revealed differences in performance among methods for chromatin interaction identification, whereas TAD detection results were more comparable across algorithms.


Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1. Tools for Hi-C data analysis used in the comparison and their performance in data preprocessing.
a) Tools for the identification of chromatin interactions and TADs from Hi-C data, and the key analysis steps (orange arrows). Blue boxes detail the strategy used by each tool in each analysis step; a grey box indicates that an external tool is required for a preprocessing step. Because most tools perform filtering and binning together, a single blue or grey box spans both steps in the schematic workflow. Filtering abbreviations: read-level filtering (R); read-pair-level filtering (R-pair); fragment-level filtering (Fr.). b) Percentage of aligned read pairs (alignment rate) for all datasets, ordered by read length (grey arrows at the bottom). Data are shown as mean ± standard error of the mean. Samples with different or mixed read lengths were excluded when calculating the alignment rate. c) Percentage of mapped reads retained after filtering (fraction of usable reads) in each dataset, ordered by experimental protocol (grey arrows at the bottom). Data are shown as mean ± standard error of the mean. GOTHiC could not be applied to Dixon 2015 because its read-pairing step required more than 1 TB of RAM.
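To make the binning step and the summary statistics in panels b and c concrete, here is a minimal Python sketch. It assumes read pairs have already been aligned and filtered and are given as hypothetical (chrom1, pos1, chrom2, pos2) tuples; `bin_contacts` and `mean_sem` are illustrative helpers, not functions from any of the compared tools, whose binning and filtering strategies differ as shown in panel a.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def bin_contacts(read_pairs, resolution):
    """Aggregate filtered read pairs into a sparse contact map.

    Each read pair is a hypothetical (chrom1, pos1, chrom2, pos2) tuple.
    Both ends are assigned to fixed-size bins of `resolution` bp, and the
    smaller (chrom, bin) anchor is stored first so the map is symmetric.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in read_pairs:
        a = (chrom1, pos1 // resolution)
        b = (chrom2, pos2 // resolution)
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return m, sem


pairs = [
    ("chr21", 35_001_200, "chr21", 35_012_900),
    ("chr21", 35_012_000, "chr21", 35_003_000),  # same bin pair, reversed order
    ("chr1", 100, "chr21", 5),                   # a trans contact
]
contacts = bin_contacts(pairs, resolution=5_000)  # 5 kb bins
```

Storing only non-zero bin pairs keeps memory proportional to the number of observed contacts rather than to the square of the number of bins, which matters at 5 kb resolution.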
Figure 2. Comparative results of methods for the identification of chromatin interactions.
a) Scatter plot of the total number of cis interactions called by each method as a function of the number of reads retained by the filtering step, in all datasets analyzed at 5 kb resolution (i.e., Jin H1-hESC, Jin IMR90, Rao GM12878, Rao IMR90, and Dixon 2015 H1-hESC; n = 32). Each point represents a sample replicate. A linear interpolation for each method is shown as a solid line. b) Box plot of the average distance between anchoring points of cis interactions (log scale) in sample replicates of all datasets analyzed at 5 kb resolution (n = 32). c) Heatmap of the contact matrix of Rao GM12878 replicate H (chr21:35,000,000-36,000,000) at 5 kb resolution. Peaks identified by the various methods are marked in different colors. d) Box plots of the Jaccard index for the concordance of cis (upper) and trans (lower) interaction calls between sample replicates (intra-dataset concordance) for all datasets with at least 2 replicates (n = 39; Supplementary Table 1). For Fit-Hi-C and HiCCUPS, the Jaccard index was calculated only for cis interactions, since these tools do not return trans interactions. e) Proportion of cis interactions classified on the basis of the chromatin states at their anchoring points (promoter-enhancer, upper; heterochromatin/quiescent to heterochromatin/quiescent, middle; less expected, lower) in all datasets at 5 kb. With the exception of Jin H1-hESC (which contains a single replicate), only cis interactions conserved in at least 2 replicates within each dataset were classified by chromatin state (Supplementary Table 4). f) Performance in recovering validated true-positive cis interactions. Each row compares a list of true positives with the interactions called by each method in each dataset. Dot size is proportional to the percentage of true positives recalled, and dot color indicates the total number of called interactions.
The validation technique and the name of each true-positive list are displayed on the left. The datasets used to call interactions are shown on the right, shaded in grey if analyzed at 40 kb resolution. True-positive interactions were searched among cis interactions conserved in at least 2 replicates within each dataset, except for Jin H1-hESC and Sexton (both containing a single replicate). GOTHiC was not applied to Dixon 2015 (see legend of Fig. 1c).
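The replicate concordance in panel d and the true-positive recall in panel f can be sketched as simple set operations. This Python sketch assumes each interaction is a pair of (chrom, bin) anchors and that two calls match only when both anchors fall in the same bins; the caption does not state the exact matching criterion used in the study, and `jaccard`, `conserved` and `recall` are hypothetical helper names.

```python
from collections import Counter
from itertools import chain


def normalize(interaction):
    """Order the two anchors so (a, b) and (b, a) compare equal."""
    a, b = interaction
    return (a, b) if a <= b else (b, a)


def jaccard(calls_a, calls_b):
    """Jaccard index |A & B| / |A | B| of two sets of interaction calls."""
    a = {normalize(x) for x in calls_a}
    b = {normalize(x) for x in calls_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conserved(replicates, min_reps=2):
    """Interactions called in at least `min_reps` replicates of a dataset."""
    per_rep = ({normalize(x) for x in rep} for rep in replicates)
    counts = Counter(chain.from_iterable(per_rep))
    return {x for x, n in counts.items() if n >= min_reps}


def recall(true_positives, called):
    """Fraction of true-positive interactions recovered among the calls."""
    tp = {normalize(x) for x in true_positives}
    hits = tp & {normalize(x) for x in called}
    return len(hits) / len(tp) if tp else 0.0


# Hypothetical calls from two replicates at 5 kb (anchors are (chrom, bin)).
rep1 = [(("chr21", 7000), ("chr21", 7002)), (("chr21", 7010), ("chr21", 7020))]
rep2 = [(("chr21", 7002), ("chr21", 7000)), (("chr21", 7030), ("chr21", 7040))]
shared = conserved([rep1, rep2])
```

Requiring exact bin matches makes the index sensitive to resolution; a tolerance of one or more bins would be a reasonable alternative, but that is a design choice not taken from the source.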
Figure 3. Comparative results of methods for the identification of TADs.
a) Scatter plot of the total number of TADs called by each method as a function of the number of reads retained by the filtering step, in all datasets except Lieberman-Aiden and Jin H1-hESC (n = 36; Supplementary Table 1). Each point represents a sample replicate. A loess interpolation for each method is shown as a solid line. b) Box plot of the median TAD size in all replicates of all datasets (analyzed at 40 kb) except Lieberman-Aiden and Jin H1-hESC (n = 36). c) Heatmap of the contact matrix of Rao GM12878 replicate H (chr1:153,000,000-155,500,000) at 40 kb resolution. TADs identified by the various methods are framed in different colors. d) Box plots of the Jaccard index for the concordance of TAD boundaries between sample replicates of all datasets with at least 2 replicates (n = 39).
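The TAD summaries in panels b and d can be sketched in the same way. This Python sketch assumes each TAD is a hypothetical (chrom, start, end) tuple in base pairs, that boundaries are the start and end positions snapped to 40 kb bins, and that two boundaries match only when they fall in the same bin; these are illustrative assumptions rather than the study's documented procedure.

```python
from statistics import median

RESOLUTION = 40_000  # 40 kb bins, as used for the TAD analysis


def boundaries(tads, resolution=RESOLUTION):
    """Set of (chrom, bin) boundary positions from (chrom, start, end) TADs."""
    out = set()
    for chrom, start, end in tads:
        out.add((chrom, start // resolution))
        out.add((chrom, end // resolution))
    return out


def boundary_jaccard(tads_a, tads_b, resolution=RESOLUTION):
    """Jaccard index of the TAD boundary sets of two replicates."""
    a = boundaries(tads_a, resolution)
    b = boundaries(tads_b, resolution)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def median_tad_size(tads):
    """Median TAD length in base pairs."""
    return median(end - start for _, start, end in tads)


# Hypothetical TAD calls for two replicates on chr1.
rep1 = [("chr1", 153_000_000, 153_400_000), ("chr1", 153_400_000, 154_000_000)]
rep2 = [("chr1", 153_000_000, 153_400_000), ("chr1", 153_400_000, 154_200_000)]
```

Comparing boundaries rather than whole domains means that two replicates which agree on where domains start and end score highly even if one of them merges two adjacent TADs into one.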

References

    1. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311.
    2. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293.
    3. Pombo A, Dillon N. Three-dimensional genome architecture: players and mechanisms. Nat Rev Mol Cell Biol. 2015;16:245–257.
    4. Cavalli G, Misteli T. Functional implications of genome topology. Nat Struct Mol Biol. 2013;20:290–299.
    5. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380.
