F1000Res. 2015 Nov 20;4:1310.
doi: 10.12688/f1000research.7334.1. eCollection 2015.

HiCUP: pipeline for mapping and processing Hi-C data

Steven Wingett et al. F1000Res. 2015.

Abstract

HiCUP is a pipeline for processing sequence data generated by Hi-C and Capture Hi-C (CHi-C) experiments, which are techniques used to investigate three-dimensional genomic organisation. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also produces an easy-to-interpret yet detailed quality control (QC) report that assists in refining experimental protocols for future studies. The software is freely available and has already been used for processing Hi-C and CHi-C data in several recently published peer-reviewed studies.

Keywords: Bioinformatics; CHi-C; Chromatin; Epigenetics; Genomics; Hi-C; Pipeline; Structure.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1.
a) Diagram summarising the Hi-C experimental protocol. The red and blue rectangles represent cross-linked restriction fragments, while the yellow marker shows the position of biotin incorporation. b) Generation of the Hi-C ligation junction sequence by successive digestion (with HindIII in this example), fill-in and blunt-end ligation steps. The resulting modified restriction site sequence is not found in the original genomic sequence.
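The junction in panel (b) can be derived mechanically from the enzyme's recognition site and cut position. Below is a minimal Python sketch, assuming a palindromic site that leaves a 5' overhang (as HindIII, A^AGCTT, does); the function name `ligation_junction` is hypothetical and not part of HiCUP.

```python
def ligation_junction(site: str, cut: int) -> str:
    """Hi-C ligation junction for a palindromic site that leaves a 5' overhang.

    site: recognition sequence, e.g. "AAGCTT" for HindIII
    cut:  top-strand cut position within the site, e.g. 1 for A^AGCTT
    """
    site = site.upper()
    if cut < 0 or 2 * cut >= len(site):
        raise ValueError("sketch only handles enzymes leaving a 5' overhang")
    # Left fragment after fill-in: sequence up to the bottom-strand cut.
    left = site[: len(site) - cut]
    # Right fragment after fill-in: sequence from the top-strand cut onwards.
    right = site[cut:]
    # Blunt-end ligation joins the two filled-in ends.
    return left + right


hindiii = ligation_junction("AAGCTT", 1)
print(hindiii)                       # AAGCTAGCTT
print("AAGCTT" in hindiii)           # False: the HindIII site is not recreated
print(ligation_junction("GATC", 0))  # GATCGATC (e.g. DpnII or MboI)
```

Because the filled-in overhang is duplicated, the junction is longer than the original site and no longer contains it, which is why the junction sequence marks a ligation event.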
Figure 2. Overview of experimental artefacts generated by the Hi-C experimental protocol.
The schematic shows the genome digested into 5 restriction fragments. These fragments may subsequently ligate to each other or to fragments derived from another chromosome, forming valid cis or trans di-tags, respectively (a). In contrast, re-ligation or incomplete digestion leads to the generation of invalid contiguous sequences (b). Another common artefact occurs when the sequenced read pair maps to a single restriction fragment (c), (d) and (e). Further, PCR may result in a fragment being copied multiple times (f). Di-tags are also rejected when the mapped reads are positioned further from the putative restriction enzyme cut site than the experimental size-selection step allows (g).
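Several of these categories can be approximated from the restriction fragment each read maps to. The following is a simplified Python sketch, not HiCUP's Filter implementation: the `Read` type, the `classify` function, the fragment-boundary layout and the 100 to 800 bp size limits are illustrative assumptions, and the read-orientation checks a real filter would apply to re-ligation and same-fragment di-tags are omitted. PCR duplicates (f) are a separate, deduplication step.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    pos: int     # 0-based coordinate of the read's 5' end
    strand: str  # "+" reads towards higher coordinates, "-" towards lower


def fragment_of(bounds, pos):
    """Index i of the fragment [bounds[i], bounds[i+1]) that contains pos."""
    return bisect_right(bounds, pos) - 1


def distance_to_junction(read, bounds):
    """Bases from the read's 5' end to the fragment end it reads towards."""
    i = fragment_of(bounds, read.pos)
    if read.strand == "+":
        return bounds[i + 1] - read.pos
    return read.pos - bounds[i] + 1


def classify(r1, r2, genome, min_size=100, max_size=800):
    """Assign a di-tag to a simplified Figure 2 category (hypothetical limits)."""
    b1, b2 = genome[r1.chrom], genome[r2.chrom]
    if r1.chrom == r2.chrom:
        f1, f2 = fragment_of(b1, r1.pos), fragment_of(b2, r2.pos)
        if f1 == f2:
            return "same_fragment"  # panels (c) to (e)
        if abs(f1 - f2) == 1:
            return "religation"     # panel (b), ignoring orientation
    # Estimated di-tag length: distance of each read to its cut site.
    size = distance_to_junction(r1, b1) + distance_to_junction(r2, b2)
    if not min_size <= size <= max_size:
        return "wrong_size"         # panel (g)
    return "valid_cis" if r1.chrom == r2.chrom else "valid_trans"  # panel (a)


# Fragment boundaries per chromosome (start, cut sites..., end); made-up values.
genome = {
    "chr1": [0, 1000, 2500, 4000, 6000, 8000],  # 5 fragments
    "chr2": [0, 3000, 5000],
}
print(classify(Read("chr1", 100, "+"), Read("chr1", 900, "-"), genome))   # same_fragment
print(classify(Read("chr1", 900, "+"), Read("chr1", 1100, "-"), genome))  # religation
print(classify(Read("chr1", 800, "+"), Read("chr1", 4200, "-"), genome))  # valid_cis
print(classify(Read("chr1", 100, "+"), Read("chr1", 7900, "-"), genome))  # wrong_size
print(classify(Read("chr1", 950, "+"), Read("chr2", 3100, "-"), genome))  # valid_trans
```

Using `bisect_right` on sorted boundaries keeps each fragment lookup logarithmic in the number of fragments per chromosome.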
Figure 3. Flow diagram summarising the HiCUP pipeline.
HiCUP takes FASTQ files generated by DNA sequencing and produces cleaned, mapped data accompanied by QC reports. The bulk of the pipeline comprises 4 scripts: Truncater, Mapper, Filter and Deduplicator. These are executed in turn by the HiCUP master script, which controls data flow through the pipeline. (The diagram uses rectangles with angled or rounded edges to represent data files or HiCUP Perl scripts, respectively.)
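The Truncater step deals with reads that run through a Hi-C ligation junction; such reads contain sequence from two genomic locations and may fail to map as a whole. Below is a minimal Python sketch of one plausible truncation rule, assuming (this is not taken from HiCUP's source) that the read is cut after the left fragment's filled-in end, so the retained sequence still matches the reference. The function `truncate_read` and its 20 bp minimum length are hypothetical.

```python
from typing import Optional


def truncate_read(seq: str, site: str, cut: int, min_length: int = 20) -> Optional[str]:
    """Truncate a read at the first Hi-C ligation junction it contains.

    Assumed rule (illustrative): keep sequence up to the end of the left
    fragment's filled-in restriction site. Returns None if the truncated
    read is shorter than min_length (a hypothetical cut-off).
    """
    left = site[: len(site) - cut]  # filled-in end of the left fragment
    right = site[cut:]              # filled-in start of the right fragment
    junction = left + right
    idx = seq.find(junction)
    if idx == -1:
        return seq                  # no junction found: read left unchanged
    truncated = seq[: idx + len(left)]
    return truncated if len(truncated) >= min_length else None


read = "ACGT" * 5 + "AAGCTAGCTT" + "GGGCCC"
print(truncate_read(read, "AAGCTT", 1))                  # ACGTACGTACGTACGTACGTAAGCT
print(truncate_read("ACGTAAGCTAGCTTGGGG", "AAGCTT", 1))  # None (only 9 bp remain)
```

Discarding very short truncated reads reflects the practical point that short sequences are unlikely to map uniquely; the exact threshold here is illustrative.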
Figure 4. Experiment to determine whether Hi-C duplicates represent genuine independent interaction events or are the product of PCR amplification.
The diagram shows di-tags (shaded rectangles) delimited by a pair of barcoded sequencing adapters.
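Deduplication treats di-tags mapping to identical positions as copies of one another. Below is a minimal Python sketch, assuming each read is summarised as `(chromosome, position, strand)` and each di-tag carries a hypothetical barcode string: `deduplicate` keeps the first copy at each position pair, and `duplicate_barcode_summary` counts how many duplicated positions were observed with more than one barcode. The interpretation that shared barcodes suggest PCR copies while differing barcodes suggest independent ligation events is an assumption drawn from the figure's design, not a result from the paper.

```python
from collections import Counter, defaultdict

# A mapped read summarised as (chromosome, 5' position, strand).
Read = tuple[str, int, str]


def ditag_key(r1: Read, r2: Read) -> tuple[Read, Read]:
    """Order-independent key, so a mate-swapped copy counts as the same di-tag."""
    a, b = sorted([r1, r2])
    return (a, b)


def deduplicate(ditags):
    """Keep the first di-tag seen at each pair of mapping positions."""
    seen = set()
    kept = []
    for r1, r2, _barcode in ditags:
        key = ditag_key(r1, r2)
        if key not in seen:
            seen.add(key)
            kept.append((r1, r2))
    return kept


def duplicate_barcode_summary(ditags):
    """Return (duplicated positions, duplicated positions seen with >1 barcode)."""
    counts = Counter()
    barcodes = defaultdict(set)
    for r1, r2, barcode in ditags:
        key = ditag_key(r1, r2)
        counts[key] += 1
        barcodes[key].add(barcode)
    duplicated = [key for key, n in counts.items() if n > 1]
    multi_barcode = sum(1 for key in duplicated if len(barcodes[key]) > 1)
    return len(duplicated), multi_barcode


ditags = [
    (("chr1", 800, "+"), ("chr1", 4200, "-"), "BC1"),
    (("chr1", 4200, "-"), ("chr1", 800, "+"), "BC1"),  # mate-swapped, same barcode
    (("chr2", 100, "+"), ("chr3", 500, "-"), "BC1"),
    (("chr2", 100, "+"), ("chr3", 500, "-"), "BC2"),   # same positions, new barcode
    (("chr4", 10, "+"), ("chr4", 9000, "+"), "BC2"),
]
print(len(deduplicate(ditags)))           # 3
print(duplicate_barcode_summary(ditags))  # (2, 1)
```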
