PLoS Comput Biol. 2024 May 6;20(5):e1012067.

doi: 10.1371/journal.pcbi.1012067. eCollection 2024 May.

Cooltools: Enabling high-resolution Hi-C analysis in Python

Affiliations

¹ Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, United States of America.
² Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, United States of America.
³ Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States of America.
⁴ Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America.
⁵ Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland.
⁶ Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), Vienna, Austria.

PMID: 38709825
PMCID: PMC11098495
DOI: 10.1371/journal.pcbi.1012067

Open2C et al. PLoS Comput Biol. 2024.

Abstract

Chromosome conformation capture (3C) technologies reveal the incredible complexity of genome organization. Maps of increasing size, depth, and resolution are now used to probe genome architecture across cell states, types, and organisms. Larger datasets add challenges at each step of computational analysis, from storage and memory constraints to researchers' time; however, analysis tools that meet these increased resource demands have not kept pace. Furthermore, existing tools offer limited support for customizing analysis for specific use cases or new biology. Here we introduce cooltools (https://github.com/open2c/cooltools), a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely adopted cooler format, which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface (CLI) and Python application programming interface (API), which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments. In short, cooltools enables the effective use of the latest and largest genome folding datasets.

Copyright: © 2024 Open2C et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of cooltools functionality.**
Open2C provides a modular ecosystem of software libraries for Hi-C analysis (highlighted with gray boxes). Pairtools [58] takes in paired-end sequence alignments and extracts contact pairs in the 4DN .pairs format. cooler [8] bins these contact pairs and stores the resulting sparse matrices in .cool and .mcool formats. The Nextflow pipeline distiller [59] converts sequencing reads from FASTQ files directly into binned and normalized cooler files, integrating read alignment with pairtools and cooler. The library introduced in this paper, cooltools (in bold), provides methods to quantify and extract features from high-resolution contact maps stored by cooler.

**Fig 2. Expected and contact frequency versus distance.**
a. Observed contact map for HFF Micro-C for chr2 and chr17 at 1Mb. Chromosomal arms p and q are depicted as light and dark grey rectangles respectively. Note the wide unmappable centromeric regions (white rows and columns) between chromosomal arms. Accounting for these regions is a key aspect of calculating an expected map. b. Expected map for three classes of regions: intra-chromosomal intra-arm, intra-chromosomal inter-arm, and inter-chromosomal. Regions for expected are specified using genomic views, where individual regions are chromosomal arms. Note that intra-chromosomal expected has a strongly decreasing contact frequency with genomic distance, whereas inter-chromosomal expected appears flat. c. Average contact frequency versus genomic separation, or *P(s)*, for intra-arm interactions (blue, orange) and for inter-arm interactions (green), calculated from contact maps at 10kb. *P(s)* curves are matched by region and color with arrows on the middle heatmap.
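
The idea behind an expected map and a *P(s)* curve can be illustrated with a minimal NumPy sketch: average the contact frequency along each diagonal of a single region, skipping masked pixels. This is an illustration under simplifying assumptions (one dense region, NaN marking unmappable bins), not the cooltools implementation; the function name `expected_by_diagonal` and the toy matrix are hypothetical.

```python
import numpy as np

def expected_by_diagonal(matrix):
    """Mean contact frequency at each genomic separation (diagonal offset).

    NaN pixels (e.g. rows/columns of unmappable bins) are ignored.
    """
    n = matrix.shape[0]
    expected = np.full(n, np.nan)
    for s in range(n):
        diag = np.diagonal(matrix, offset=s)
        if np.any(np.isfinite(diag)):
            expected[s] = np.nanmean(diag)
    return expected

# Toy 5-bin map whose contact frequency decays as 1 / (1 + separation).
idx = np.arange(5)
toy = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
toy[2, :] = np.nan  # mask bin 2 as if it were unmappable
toy[:, 2] = np.nan

ps = expected_by_diagonal(toy)  # one value per separation, in bins
```

Because masked pixels are skipped rather than counted as zeros, the recovered curve still follows the underlying decay even though one bin is entirely missing.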

**Fig 3. Compartments and eigenvectors.**
a. To obtain cis compartment profiles, observed maps are first divided by expected. b. Observed/expected maps are decomposed into a sum of eigenvectors and associated eigenvalues. c. Illustration of eigenvector phasing. In mammalian Hi-C maps, the first eigenvector typically, but not always, corresponds to the compartment signal. Since eigenvectors are determined only up to a sign, their orientations are random. To obtain consistent results, the final cis compartment profile (right) is obtained as the eigenvector most correlated with a phasing track (here, GC content), and oriented to have a positive correlation.
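
The phasing step described in panel c can be sketched as follows: take the leading eigenvectors, pick the one most correlated with the phasing track, and flip its sign if needed. This is a hedged sketch, not the cooltools code; centering the observed/expected map by subtracting 1 is an assumed convention, and `phased_eigenvector` and the toy data are hypothetical. The toy map deliberately gives a non-compartment pattern the largest eigenvalue, so the first eigenvector is the wrong choice.

```python
import numpy as np

def phased_eigenvector(obs_over_exp, phasing_track, n_eigs=3):
    """Return the leading eigenvector most correlated with the phasing
    track, oriented so that the correlation is positive."""
    centered = obs_over_exp - 1.0  # assumption: center O/E around zero
    eigvals, eigvecs = np.linalg.eigh(centered)  # symmetric input assumed
    order = np.argsort(-np.abs(eigvals))[:n_eigs]
    candidates = eigvecs[:, order].T
    with np.errstate(invalid="ignore", divide="ignore"):
        corrs = np.array(
            [np.corrcoef(v, phasing_track)[0, 1] for v in candidates]
        )
    corrs = np.nan_to_num(corrs)  # constant vectors give NaN correlation
    best = int(np.argmax(np.abs(corrs)))
    return candidates[best] * np.sign(corrs[best])

# Toy O/E map with two orthogonal patterns; the non-compartment pattern
# ("arm") has the larger eigenvalue.
compartment = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
arm = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
oe = 1.0 + 0.3 * np.outer(compartment, compartment) + 0.5 * np.outer(arm, arm)
gc = 0.45 + 0.05 * compartment  # toy GC track, higher in A-like bins

profile = phased_eigenvector(oe, gc, n_eigs=2)
```

Here the second eigenvector is selected because it tracks the toy GC content, and its sign is fixed so that A-like bins are positive.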

**Fig 4. Pairwise class averaging and saddle plots.**
a. Compartment profile for a 15Mb region of chr2, where negative values correspond to B regions and positive values to A regions. b. Digitized compartment profile, quantized into 5 classes by percentile. The lowest class is highlighted as a thicker line. c. Observed/expected map with pairs of B regions highlighted. d. Saddle plot for the 5 digitized classes. The highlighted regions in the observed/expected map contribute to the top-left pixel, boxed in grey.
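
Pairwise class averaging can be sketched in a few lines: digitize the track into percentile classes, then average the observed/expected values over every pair of classes. This is a minimal illustration assuming a dense, symmetric observed/expected matrix; the function name `saddle` and the toy inputs are hypothetical and do not reflect the cooltools API.

```python
import numpy as np

def saddle(obs_over_exp, track, n_classes=5):
    """Average O/E over all pairs of percentile classes of `track`."""
    inner = np.linspace(0, 100, n_classes + 1)[1:-1]  # inner percentile cuts
    edges = np.percentile(track, inner)
    classes = np.digitize(track, edges)  # class 0 is the lowest percentile bin
    sums = np.zeros((n_classes, n_classes))
    counts = np.zeros((n_classes, n_classes))
    for a in range(n_classes):
        for b in range(n_classes):
            block = obs_over_exp[np.ix_(classes == a, classes == b)]
            sums[a, b] = np.nansum(block)
            counts[a, b] = np.isfinite(block).sum()
    out = np.full_like(sums, np.nan)
    return np.divide(sums, counts, out=out, where=counts > 0)

# Toy data: bins with similar track values contact each other more often.
track = np.linspace(-1, 1, 10)
oe = 1.0 + 0.5 * np.outer(track, track)
s = saddle(oe, track)
```

In this toy case the corner pixels for like-with-like pairs (`s[0, 0]` and `s[4, 4]`) exceed 1, while the off-diagonal corner (`s[0, 4]`) falls below 1, which is the pattern a saddle plot is meant to expose.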

**Fig 5. Insulation and boundaries.**
a. Diamond insulation is calculated as the sum in a sliding window (gray) across the genome, shown here for HFF Micro-C data in a region of chr2 at 10kb resolution (chr2:10900000–11650000). b. The resulting insulation profile is shown in black. Local minima are indicated with dots. Positions of strong boundaries are shown as orange dots, and filtered weak boundaries as blue dots. The two-sided gray arrow shows the boundary strength of the strong boundary at chr2:1146000–1147000, calculated relative to the maximum insulation reached before a more prominent minimum in either genomic direction. Here, strength is relative to the prominent minimum at chr2:11130000–11140000, and the maximum insulation is indicated with a dashed gray line.
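
A sliding diamond sum and the detection of local minima can be sketched as below. The sketch makes several assumptions that are not taken from the paper: the score at bin `i` sums contacts between the `window` bins upstream of `i` and the `window` bins starting at `i`, the profile is log2-normalized by its mean, and boundary strength (prominence) is not computed. The names `diamond_insulation` and `local_minima` are hypothetical.

```python
import numpy as np

def diamond_insulation(matrix, window):
    """log2 of the diamond-window contact sum, normalized by its mean.

    The score at bin i describes the boundary between bins i-1 and i.
    """
    n = matrix.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        diamond = matrix[i - window:i, i:i + window]
        score[i] = np.nansum(diamond)
    return np.log2(score / np.nanmean(score))

def local_minima(profile):
    """Indices strictly lower than both neighbors (NaNs never qualify)."""
    mid = profile[1:-1]
    is_min = (mid < profile[:-2]) & (mid < profile[2:])
    return np.flatnonzero(is_min) + 1

# Toy map: two 6-bin domains with weak contacts between them.
toy = np.full((12, 12), 0.1)
toy[:6, :6] = 1.0
toy[6:, 6:] = 1.0

ins = diamond_insulation(toy, window=2)
boundaries = local_minima(ins)
```

The single minimum falls at bin 6, where the diamond window covers only inter-domain contacts.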

**Fig 6. Dots.**
a. Dot calls from a region of chromosome 17, highlighted by squares on the upper triangular portion of the map. Squares show the size of the region scanned by convolutional kernels. b. Illustration of convolutional kernels used for dot calling around one example, from left to right: ‘donut’, ‘top’, ‘bottom’, ‘lowerleft’. Local enrichment at the center pixel is calculated relative to the shaded regions in each kernel.
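
Local enrichment relative to a 'donut' neighborhood can be sketched as follows. The kernel here is a square of half-width `w` with an inner square of half-width `p` and the central row and column zeroed out; these parameter names, the helper functions, and the plain ratio used as the enrichment measure are illustrative assumptions. The other three kernels and any statistical testing are omitted.

```python
import numpy as np

def donut_kernel(w, p):
    """Square kernel of half-width w with a hollow center of half-width p
    and the central row and column excluded."""
    k = np.ones((2 * w + 1, 2 * w + 1))
    k[w - p:w + p + 1, w - p:w + p + 1] = 0  # hollow out the center
    k[w, :] = 0  # drop the row through the center pixel
    k[:, w] = 0  # drop the column through the center pixel
    return k

def local_enrichment(matrix, i, j, kernel):
    """Center pixel divided by the mean of the kernel-weighted background."""
    w = kernel.shape[0] // 2
    window = matrix[i - w:i + w + 1, j - w:j + w + 1]
    background = np.sum(window * kernel) / np.sum(kernel)
    return matrix[i, j] / background

# Toy map: flat background of 1 with a single enriched pixel.
toy = np.ones((11, 11))
toy[5, 5] = 6.0

kernel = donut_kernel(w=3, p=1)
enrichment = local_enrichment(toy, 5, 5, kernel)
```

Excluding the center and its immediate surroundings keeps the dot's own signal out of the background estimate.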

**Fig 7. Pileups and average snippets.**
a. Snippets, or regions around called dots, are extracted from the genome-wide map. b. Set of extracted snippets. c. Average pileup for dots created by averaging the set of snippets.
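
The three panels map onto a short procedure: cut a fixed-size window around each feature, stack the windows, and average them. The sketch below assumes a dense matrix and features given as (row, column) bin pairs; `pileup`, `flank`, and the toy dots are hypothetical names and data, and features too close to the matrix edge are simply skipped.

```python
import numpy as np

def pileup(matrix, anchors, flank):
    """Average of square snippets of half-width `flank` around each anchor."""
    n = matrix.shape[0]
    snippets = []
    for i, j in anchors:
        # Skip features whose window would run off the matrix.
        if min(i, j) - flank < 0 or max(i, j) + flank >= n:
            continue
        snippets.append(matrix[i - flank:i + flank + 1,
                               j - flank:j + flank + 1])
    stack = np.array(snippets)  # shape: (n_snippets, size, size)
    return np.nanmean(stack, axis=0)

# Toy map: flat background with two enriched pixels ("dots").
toy = np.ones((20, 20))
dots = [(5, 12), (8, 15), (1, 18)]  # the last one is too close to the edge
toy[5, 12] = 5.0
toy[8, 15] = 5.0

avg = pileup(toy, dots, flank=2)
```

The averaged snippet keeps the shared central enrichment while the flat surroundings average to 1.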

References

1. McCord RP, Kaplan N, Giorgetti L. Chromosome Conformation Capture and Beyond: Toward an Integrative View of Chromosome Structure and Function. Mol Cell. 2020;77: 688–708. doi: 10.1016/j.molcel.2019.12.021
2. Dekker J, Belmont AS, Guttman M, Leshyk VO, Lis JT, Lomvardas S, et al. The 4D nucleome project. Nature. 2017;549: 219–226. doi: 10.1038/nature23884
3. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306: 636–640. doi: 10.1126/science.1105136
4. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585: 357–362. doi: 10.1038/s41586-020-2649-2
5. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17: 261–272. doi: 10.1038/s41592-019-0686-2
