PLoS Comput Biol. 2024 May 6;20(5):e1012067.
doi: 10.1371/journal.pcbi.1012067. eCollection 2024 May.

Cooltools: Enabling high-resolution Hi-C analysis in Python

Open2C et al. PLoS Comput Biol. 2024.

Abstract

Chromosome conformation capture (3C) technologies reveal the incredible complexity of genome organization. Maps of increasing size, depth, and resolution are now used to probe genome architecture across cell states, types, and organisms. Larger datasets add challenges at each step of computational analysis, from storage and memory constraints to researchers' time; however, analysis tools that meet these increased resource demands have not kept pace. Furthermore, existing tools offer limited support for customizing analysis for specific use cases or new biology. Here we introduce cooltools (https://github.com/open2c/cooltools), a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely adopted cooler format, which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface (CLI) and Python application programming interface (API), which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments. In short, cooltools enables the effective use of the latest and largest genome folding datasets.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of cooltools functionality.
Open2C provides a modular ecosystem of software libraries for Hi-C analysis (highlighted with gray boxes). Pairtools [58] takes in paired-end sequence alignments and extracts contact pairs in the 4DN .pairs format. cooler [8] bins these contact pairs and stores the resulting sparse matrices in .cool and .mcool formats. The nextflow pipeline distiller [59] converts sequencing reads from FASTQ files directly into binned and normalized cooler files, integrating read alignment with pairtools and cooler. The library introduced in this paper, cooltools (in bold), provides methods to quantify and extract features from high-resolution contact maps stored by cooler.
Fig 2. Expected and contact frequency versus distance.
a. Observed contact map for HFF Micro-C for chr2 and chr17 at 1Mb. Chromosomal arms p and q are depicted as light and dark grey rectangles respectively. Note the wide unmappable centromeric regions (white rows and columns) between chromosomal arms. Accounting for these regions is a key aspect of calculating an expected map. b. Expected map for three classes of regions: intra-chromosomal intra-arm, intra-chromosomal inter-arm, and inter-chromosomal. Regions for expected are specified using genomic views, where individual regions are chromosomal arms. Note that intra-chromosomal expected has a strongly decreasing contact frequency with genomic distance, whereas inter-chromosomal expected appears flat. c. Average contact frequency versus genomic separation, or P(s), for intra-arm interactions (blue, orange) and for inter-arm interactions (green), calculated from contact maps at 10kb. P(s) curves are matched by region and color with arrows on the middle heatmap.
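The idea behind panels b and c can be sketched in a few lines of NumPy: the expected value at a genomic separation s is the mean of the s-th diagonal, skipping unmappable bins. This is a minimal illustration on a toy dense matrix, not the cooltools implementation; the function name `expected_by_distance` and the toy data are hypothetical.

```python
import numpy as np

def expected_by_distance(mat, mask):
    """Mean contact frequency per diagonal (genomic separation in bins).

    mat  : square, symmetric contact matrix for one region (e.g. one arm)
    mask : boolean vector, False for unmappable bins to be ignored
    """
    n = mat.shape[0]
    valid = np.outer(mask, mask)          # pixel is valid if both bins are valid
    expected = np.full(n, np.nan)
    for s in range(n):
        vals = np.diagonal(mat, offset=s)
        ok = np.diagonal(valid, offset=s)
        if ok.any():
            expected[s] = vals[ok].mean()
    return expected

# Toy map (hypothetical data): contact frequency decays as 1 / (1 + separation)
n = 6
idx = np.arange(n)
sep = np.abs(idx[:, None] - idx[None, :])
toy = 1.0 / (1.0 + sep)

# Pretend bin 2 is unmappable, like a centromeric bin
mask = np.ones(n, dtype=bool)
mask[2] = False

ps = expected_by_distance(toy, mask)
```

Because the toy map depends only on separation, masking a bin leaves the recovered curve unchanged, which is the property that makes careful handling of unmappable regions important on real data.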
Fig 3. Compartments and eigenvectors.
a. To obtain cis compartment profiles, observed maps are first divided by expected. b. Observed/expected maps are decomposed into a sum of eigenvectors and associated eigenvalues. c. Illustration of eigenvector phasing. In mammalian Hi-C maps, the first eigenvector typically, but not always, corresponds to the compartment signal. Since eigenvectors are determined only up to a sign, their orientations are random. To obtain consistent results, the final cis compartment profile (right) is obtained as the eigenvector most correlated with a phasing track (here, GC content), and oriented to have a positive correlation.
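The phasing step in panel c can be illustrated with a short NumPy sketch: decompose the observed/expected map, then pick the leading eigenvector that correlates best with the phasing track and flip its sign if needed. This is a simplified sketch under stated assumptions (a dense matrix with no missing bins, 1 subtracted before decomposition, eigenvectors ranked by absolute eigenvalue); the function name `phased_eigenvector` and the toy data are hypothetical and not taken from cooltools.

```python
import numpy as np

def phased_eigenvector(obs_over_exp, phasing_track, n_eigs=3):
    """Return the leading eigenvector most correlated with the phasing track,
    oriented so that the correlation is positive."""
    m = obs_over_exp - 1.0                      # assumption: center O/E around 0
    eigvals, eigvecs = np.linalg.eigh(m)        # symmetric eigendecomposition
    order = np.argsort(-np.abs(eigvals))[:n_eigs]
    candidates = eigvecs[:, order].T            # one candidate per row
    corrs = np.array(
        [np.corrcoef(v, phasing_track)[0, 1] for v in candidates]
    )
    corrs = np.nan_to_num(corrs)                # constant vectors give NaN
    best = np.argmax(np.abs(corrs))
    return candidates[best] * np.sign(corrs[best])

# Toy data (hypothetical): alternating A (+1) and B (-1) bins
labels = np.array([1, 1, -1, -1, 1, -1, 1, -1], dtype=float)
oe = 1.0 + 0.5 * np.outer(labels, labels)       # checkerboard O/E map
gc = 0.45 + 0.05 * labels                       # GC content higher in A

profile = phased_eigenvector(oe, gc)
```

On this toy checkerboard the returned profile is positive in A bins and negative in B bins regardless of the arbitrary sign chosen by the eigensolver.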
Fig 4. Pairwise class averaging and saddle plots.
a. Compartment profile for a 15 Mb region of chr2, where more negative values correspond to B regions and positive values to A regions. b. Digitized compartment profile, quantized into 5 classes by percentile. The lowest class is highlighted as a thicker line. c. Observed/expected map with pairs of B regions highlighted. d. Saddle plot for the 5 digitized classes. The highlighted regions in the observed/expected map contribute to the top left pixel, boxed in grey.
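Pairwise class averaging as described in panels b to d can be sketched as follows: digitize the track into percentile classes, then average the observed/expected values over every pair of classes. This is a minimal dense-matrix sketch, not the cooltools implementation; the function name `saddle_sketch` and the toy track are hypothetical.

```python
import numpy as np

def saddle_sketch(obs_over_exp, track, n_classes=5):
    """Average O/E over all pairs of percentile classes of `track`."""
    # Class edges at evenly spaced percentiles of the track
    edges = np.percentile(track, np.linspace(0, 100, n_classes + 1))
    # Map each bin to a class index in 0 .. n_classes - 1
    classes = np.clip(
        np.searchsorted(edges, track, side="right") - 1, 0, n_classes - 1
    )
    sums = np.zeros((n_classes, n_classes))
    counts = np.zeros((n_classes, n_classes))
    ci, cj = np.meshgrid(classes, classes, indexing="ij")
    np.add.at(sums, (ci, cj), obs_over_exp)     # accumulate O/E per class pair
    np.add.at(counts, (ci, cj), 1)              # count pixels per class pair
    return sums / counts, classes

# Toy data (hypothetical): a smooth track from B-like (-1) to A-like (+1)
track = np.linspace(-1, 1, 10)
oe = 1.0 + 0.5 * np.outer(track, track)

S, classes = saddle_sketch(oe, track)
```

In the resulting 5 x 5 matrix, the top left entry averages pixels where both bins fall in the lowest class (B-B pairs), and the off-diagonal corners average A-B pairs.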
Fig 5. Insulation and boundaries.
a. Diamond insulation is calculated as the sum in a sliding window (gray) across the genome, shown here for HFF Micro-C data in a region of chr2 at 10kb resolution (chr2:10900000–11650000). b. The resulting insulation profile is shown in black. Local minima are indicated with dots. Positions of strong boundaries are shown as orange dots, and filtered weak boundaries as blue dots. The two-sided gray arrow shows the boundary strength of the strong boundary at chr2:1146000–1147000, calculated relative to the maximum insulation achieved before a more prominent minimum in either genomic direction. Here, strength is relative to the prominent minimum at chr2:11130000–11140000, and maximum insulation is indicated with a dashed gray line.
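The sliding-window sum in panel a can be sketched with NumPy. The version below sums contacts between the `window` bins upstream of a position and the `window` bins starting at that position, then log2-normalizes by the mean; the exact window geometry and normalization are assumptions for illustration and may differ from cooltools. The function name `diamond_insulation` and the toy map are hypothetical, and boundary strength (prominence) is not computed here.

```python
import numpy as np

def diamond_insulation(mat, window):
    """Log2 insulation score from a square window sliding along the diagonal.

    score[i] sums contacts between bins [i - window, i) and [i, i + window),
    i.e. contacts that cross the potential boundary just left of bin i.
    """
    n = mat.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        score[i] = mat[i - window:i, i:i + window].sum()
    return np.log2(score / np.nanmean(score))

# Toy map (hypothetical): two domains of 6 bins each with weak contacts between
n = 12
toy = np.full((n, n), 0.1)
toy[:6, :6] = 1.0
toy[6:, 6:] = 1.0

ins = diamond_insulation(toy, window=2)
boundary_bin = int(np.nanargmin(ins))   # position of strongest insulation
```

On this toy map the profile has a single minimum at the first bin of the second domain, where the window contains only inter-domain contacts.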
Fig 6. Dots.
a. Dot calls from a region of chromosome 17, highlighted by squares on the upper triangular portion of the map. Squares show the size of the region scanned by convolutional kernels. b. Illustration of convolutional kernels used for dot calling around one example, from left to right: ‘donut’, ‘top’, ‘bottom’, ‘lowerleft’. Local enrichment at the center pixel is calculated relative to the shaded regions in each kernel.
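To make the kernel idea in panel b concrete, the sketch below builds a 'donut'-shaped mask and computes enrichment of a center pixel over the mean of the shaded background. The kernel geometry (an outer square of half-width `w`, minus an inner square of half-width `p`, minus the central row and column) is an assumption in the style of HiCCUPS-like dot callers, and the names `donut_kernel` and `local_enrichment` are hypothetical. It omits expected-value normalization and statistical testing.

```python
import numpy as np

def donut_kernel(w=3, p=1):
    """Binary donut mask: outer (2w+1) square minus inner (2p+1) square,
    with the central row and column also excluded (assumed geometry)."""
    size = 2 * w + 1
    k = np.ones((size, size))
    k[w - p:w + p + 1, w - p:w + p + 1] = 0   # remove the inner square
    k[w, :] = 0                                # remove the central row
    k[:, w] = 0                                # remove the central column
    return k

def local_enrichment(mat, i, j, kernel):
    """Ratio of pixel (i, j) to the mean of its kernel-shaped background."""
    w = kernel.shape[0] // 2
    window = mat[i - w:i + w + 1, j - w:j + w + 1]
    background = (window * kernel).sum() / kernel.sum()
    return mat[i, j] / background

# Toy map (hypothetical): flat background with one focal enrichment
toy = np.ones((15, 15))
toy[7, 10] = 5.0

kernel = donut_kernel(w=3, p=1)
enrichment = local_enrichment(toy, 7, 10, kernel)
```

The other kernels named in the caption ('top', 'bottom', 'lowerleft') could be built the same way by shading different parts of the window.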
Fig 7. Pileups and average snippets.
a. Snippets, or regions around called dots, are extracted from the genome-wide map. b. Set of extracted snippets. c. Average pileup for dots, created by averaging the set of snippets.
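The three panels correspond to a short procedure: cut a fixed-size window around each feature, stack the windows, and average them. The sketch below shows this on a toy dense matrix; it is not the cooltools implementation, and the function name `pileup_sketch`, the anchor list, and the choice to skip windows that run off the map edge are assumptions for illustration.

```python
import numpy as np

def pileup_sketch(mat, anchors, flank):
    """Extract (2*flank+1)-sized snippets centered on each (i, j) anchor
    and return their average along with the stack of snippets."""
    n = mat.shape[0]
    snippets = []
    for i, j in anchors:
        # Assumption: skip anchors whose window would cross the map edge
        if min(i, j) - flank < 0 or max(i, j) + flank >= n:
            continue
        snippets.append(mat[i - flank:i + flank + 1, j - flank:j + flank + 1])
    stack = np.stack(snippets)
    return np.nanmean(stack, axis=0), stack

# Toy map (hypothetical): random background with enriched pixels at the anchors
rng = np.random.default_rng(0)
toy = rng.random((40, 40))
anchors = [(5, 15), (20, 30), (10, 25), (1, 38)]   # last one is near the edge
for i, j in anchors:
    toy[i, j] += 5.0

avg, stack = pileup_sketch(toy, anchors, flank=3)
```

Averaging suppresses the uncorrelated background while the shared central enrichment remains, which is why pileups reveal features that are faint in individual snippets.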

References

    1. McCord RP, Kaplan N, Giorgetti L. Chromosome Conformation Capture and Beyond: Toward an Integrative View of Chromosome Structure and Function. Mol Cell. 2020;77: 688–708. doi: 10.1016/j.molcel.2019.12.021
    2. Dekker J, Belmont AS, Guttman M, Leshyk VO, Lis JT, Lomvardas S, et al. The 4D nucleome project. Nature. 2017;549: 219–226. doi: 10.1038/nature23884
    3. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306: 636–640. doi: 10.1126/science.1105136
    4. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585: 357–362. doi: 10.1038/s41586-020-2649-2
    5. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17: 261–272. doi: 10.1038/s41592-019-0686-2