Bioinformatics. 2025 Sep 1;41(9):btaf510.
doi: 10.1093/bioinformatics/btaf510.

AQuA Tools: clear and reliable BEDPE operations for 3D genomics

Maharshi Chakraborty et al. Bioinformatics.

Abstract

Motivation: The genome interacts with itself within the volume of the cell nucleus to process information. These interactions mediate signal integration, gene regulation, and cell identity. The identification of new therapeutic targets from non-coding disease-associated variants relies critically on correctly assigning variants to genes through 3D interactions. Experimental techniques in 3D genomics, such as HiC and HiChIP, allow the mapping of interactions through sequencing. Bioinformatics for 3D genomics contends primarily with contact matrices that contain interaction frequencies for all possible element pairs, and BEDPE files that store element pairs that interact. Whereas the tools available for processing linear genomic data are mature, operating on contact matrices and BEDPE files remains cumbersome, opaque, and error-prone, as researchers have had to shoehorn tools originally designed for linear data. A genome arithmetic designed from the ground up for 3D genomics does not yet exist.

Results: We present AQuA Tools, a suite of shell- and R-based command-line tools that provide a set of core operations on contact matrices and BEDPE files motivated by key questions in population genetics, cancer research, and precision medicine. We have designed our core operations to be clear, reliable, intuitive and versatile. Core operations can be chained together along with standard UNIX commands. Our goal is to make AQuA Tools easy for the novice to learn and the go-to choice for power users. We hope our tools will motivate more researchers to use 3D genomic data in their projects.

Availability and implementation: We provide and maintain AQuA Tools at https://github.com/axiotl/aqua-tools.

Figures

Figure 1.
(a) Contact plot showing interactions between genomic loci in chr2:64–65 Mb in the GM12878 cell line using HiC and (b) interactions in the same cell line and region using H3K27ac HiChIP. The selective enrichment for interactions bound by a factor of interest in HiChIP results in more visible 3D structure. (c) Direct comparison (subtraction) between contact matrices of DMSO- and entinostat-treated RH4 cells in chr11:17.6–17.8 Mb, normalized using the CPM metric, shows an overall loss of contacts for H3K27ac HiChIP. (d) However, reference normalization using a spike-in mouse genome reveals a global increase of contacts surrounding a single lost loop.
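The contrast between panels (c) and (d) can be illustrated with simple arithmetic. The Python sketch below uses hypothetical read counts and assumes that reference normalization scales a bin by the number of spike-in (mouse) reads instead of the sample's own reads. The caption does not give the formulas, so the function names and the scaling are assumptions for illustration only, not the published AQuA normalization.

```python
def cpm(count, total_reads):
    """Counts per million of the sample's own mapped reads."""
    return count * 1_000_000 / total_reads


def spike_in_normalized(count, spike_in_reads):
    """Counts per million spike-in reads (assumed form of reference normalization)."""
    return count * 1_000_000 / spike_in_reads


# Hypothetical numbers for one contact bin in two conditions.
control = {"bin_count": 10, "human_reads": 1_000_000, "mouse_reads": 100_000}
treated = {"bin_count": 15, "human_reads": 2_000_000, "mouse_reads": 100_000}

# CPM divides by each sample's own depth, so a global gain in contacts
# inflates the denominator and can make an individual bin look weaker.
cpm_delta = (cpm(treated["bin_count"], treated["human_reads"])
             - cpm(control["bin_count"], control["human_reads"]))

# Scaling by the constant spike-in keeps the global gain visible.
ref_delta = (spike_in_normalized(treated["bin_count"], treated["mouse_reads"])
             - spike_in_normalized(control["bin_count"], control["mouse_reads"]))

print(cpm_delta, ref_delta)
```

With these made-up numbers the CPM difference is negative while the spike-in difference is positive, which is the kind of sign flip the caption describes.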
Figure 2.
(a) The build_bedpe tool allows users to move from 1D to 2D genomic space. The simplest transformation is to “pop out” all pairwise combinations of elements from a single 1D BED file to a 2D BEDPE file. (b) Frequently we are interested in pairwise interactions between elements with distinct characteristics, in this example enhancers and promoters. Cross building using build_bedpe between gene TSSs and regulatory elements is the most common first step in 3D genomic analyses. (c) Computing all pairwise interactions between elements genome-wide can quickly overwhelm downstream analyses. Restricting the build operation by distance or by using a third BED file specifying the intervals in which to build focuses resources on biologically relevant interactions and lightens the computational load.
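To make the 1D-to-2D transformation concrete, here is a minimal Python sketch of the three ideas in this caption: all pairwise combinations from one BED list, cross building between two lists, and a distance restriction. It is not the build_bedpe implementation. The function names `build_pairs` and `cross_pairs`, the `max_dist` parameter, and the use of midpoint distance are all assumptions made for the example.

```python
from itertools import combinations, product


def build_pairs(bed, max_dist=None):
    """Return BEDPE-like rows for all same-chromosome pairs of BED intervals.

    bed: list of (chrom, start, end) tuples.
    max_dist: optional cap on midpoint-to-midpoint distance (hypothetical).
    """
    rows = []
    for a, b in combinations(sorted(bed), 2):
        if a[0] != b[0]:
            continue  # this sketch only builds intra-chromosomal pairs
        dist = abs((b[1] + b[2]) // 2 - (a[1] + a[2]) // 2)
        if max_dist is not None and dist > max_dist:
            continue
        rows.append(a + b)
    return rows


def cross_pairs(bed_a, bed_b):
    """Pair every interval in bed_a with every same-chromosome interval in bed_b."""
    return [a + b for a, b in product(bed_a, bed_b) if a[0] == b[0]]


elements = [("chr1", 100, 200), ("chr1", 1000, 1100),
            ("chr1", 50000, 50100), ("chr2", 100, 200)]
all_pairs = build_pairs(elements)                   # every chr1 pair
near_pairs = build_pairs(elements, max_dist=5000)   # only the close pair

promoters = [("chr1", 100, 200)]
enhancers = [("chr1", 1000, 1100), ("chr2", 100, 200)]
promoter_enhancer = cross_pairs(promoters, enhancers)
```

The unrestricted build grows quadratically with the number of elements, which is why the caption recommends restricting by distance or by a set of intervals.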
Figure 3.
Against a contact matrix, extract_bedpe takes a range or TAD file as input and identifies 3D structures by transforming each bin value, irrespective of its distance to the diagonal, into a 0–1 numeric space. This mitigates the distance-dependent interaction decay typical of HiC/HiChIP and allows for single-value thresholding and binarization of bins across the matrix. extract_bedpe can report (a) flare crossings, which are contact bins where long projections of HiChIP signal emanating from the diagonal cross paths; (b) single-bin loops, which are entries in the matrix that exceed a user-supplied threshold; and (c) globbed single-bin loops, which agglomerate individual bins that exceed the threshold into larger BEDPE structures. The radius parameter determines the maximum number of bins between conjoined structures above the threshold.
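The benefit of a distance-independent 0–1 scale can be sketched in a few lines of Python. The caption does not specify the transformation that extract_bedpe applies, so the example below substitutes per-diagonal min-max scaling as a stand-in. The function names and the toy matrix are hypothetical.

```python
def scale_by_diagonal(matrix):
    """Min-max scale each diagonal (fixed bin distance) of a square matrix to 0-1."""
    n = len(matrix)
    scaled = [[0.0] * n for _ in range(n)]
    for d in range(n):
        vals = [matrix[i][i + d] for i in range(n - d)]
        lo, hi = min(vals), max(vals)
        for i in range(n - d):
            # A flat diagonal carries no contrast, so it maps to 0.
            s = (matrix[i][i + d] - lo) / (hi - lo) if hi > lo else 0.0
            scaled[i][i + d] = s
            scaled[i + d][i] = s
    return scaled


def call_bins(scaled, threshold):
    """Return off-diagonal upper-triangle bins at or above one global threshold."""
    n = len(scaled)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if scaled[i][j] >= threshold]


# Toy symmetric matrix with distance decay and one enriched bin at (1, 4).
counts = [
    [100, 50, 20, 10, 5],
    [50, 100, 50, 20, 40],
    [20, 50, 100, 50, 20],
    [10, 20, 50, 100, 50],
    [5, 40, 20, 50, 100],
]
loops = call_bins(scale_by_diagonal(counts), threshold=0.9)
print(loops)
```

On the raw counts, a threshold of 40 would also select every bin adjacent to the diagonal; after scaling, only the bin that stands out at its own distance passes.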
Figure 4.
query_bedpe takes a contact matrix and a BEDPE as inputs. Using default parameters, it appends an additional column to the BEDPE with the numerical values found in the contact matrix. (a) In this example it returns the maximum value within each BEDPE entry, which is the single-bin contact value with the highest read count. (b) Through the parameter --formula, the tool can provide other numeric values for each BEDPE entry, such as the contact value of the most central bin, the sum of all contact values, or the average. The tool can also modify the BEDPE entries based on the values in the contact matrix: the parameter --fix FALSE will return BEDPE entries corresponding to the supplied formula (max or center). (c) Similarly, the tool can search for max values by expanding the size of the input BEDPE entries in bin units.
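The summary statistics named in this caption can be sketched as follows. This is a simplified Python illustration, not the query_bedpe code: it assumes a dense single-chromosome matrix whose first bin starts at coordinate 0, and the function name `query_entry` and the formula labels "sum" and "average" are assumptions (the caption only spells out max and center).

```python
def query_entry(matrix, resolution, entry, formula="max"):
    """Summarize the matrix bins covered by one BEDPE entry.

    entry: (start1, end1, start2, end2) in bp, half-open intervals.
    resolution: bin size in bp.
    """
    s1, e1, s2, e2 = entry
    rows = range(s1 // resolution, (e1 - 1) // resolution + 1)
    cols = range(s2 // resolution, (e2 - 1) // resolution + 1)
    values = [matrix[i][j] for i in rows for j in cols]
    if formula == "max":
        return max(values)
    if formula == "sum":
        return sum(values)
    if formula == "average":
        return sum(values) / len(values)
    if formula == "center":
        # Middle bin of the covered block (upper-middle for even sizes).
        return matrix[rows[len(rows) // 2]][cols[len(cols) // 2]]
    raise ValueError(f"unknown formula: {formula}")


# Toy symmetric 6 x 6 matrix at 5 kb resolution.
m = [
    [9, 5, 2, 1, 0, 6],
    [5, 9, 5, 2, 4, 1],
    [2, 5, 9, 5, 3, 1],
    [1, 2, 5, 9, 5, 2],
    [0, 4, 3, 5, 9, 5],
    [6, 1, 1, 2, 5, 9],
]
entry = (0, 15000, 15000, 30000)  # covers rows 0-2 and columns 3-5
results = {f: query_entry(m, 5000, entry, f)
           for f in ("max", "center", "sum", "average")}
```

Here the maximum (6) sits in a corner of the entry while the central bin holds 4, which shows why the choice of formula matters for wide BEDPE entries.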
Figure 5.
AQuA Tools provides the basic operations of intersection and union in one and two dimensions. (a) A BEDPE file in 2D (blue) is intersected with two BED files in 1D (red and yellow). (b) Two BEDPE files are processed to obtain the union of all regions. Merging overlapping coordinates that span multiple rows and files into a single row of the output BEDPE (the genomic union) creates areas that are greater than the geometric union.
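The difference between the genomic and geometric union in panel (b) can be checked numerically. The Python sketch below reads "genomic union" as merging each anchor's coordinates independently, which yields the bounding rectangle of the two entries; that reading, the function names, and the coordinates are assumptions for illustration.

```python
def area(rect):
    """Area of a 2D interval (start1, end1, start2, end2)."""
    return (rect[1] - rect[0]) * (rect[3] - rect[2])


def genomic_union(a, b):
    """Merge the two anchors independently, giving one bounding BEDPE row."""
    return (min(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), max(a[3], b[3]))


def geometric_union_area(a, b):
    """Area covered by either rectangle, counting the overlap once."""
    w = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    h = max(0, min(a[3], b[3]) - max(a[2], b[2]))
    return area(a) + area(b) - w * h


a = (0, 10, 100, 110)
b = (5, 15, 105, 115)
merged = genomic_union(a, b)
merged_area = area(merged)
covered_area = geometric_union_area(a, b)
print(merged, merged_area, covered_area)
```

The merged row spans 225 square units while the two inputs together cover only 175, because the bounding rectangle also includes the two corners that neither input touches.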
Figure 6.
The cluster_bedpe tool identifies clusters of interconnected 1D elements based on their 2D pairwise interactions. (a) Clusters of elements can be fully connected, with every possible pairwise interaction showing evidence of a chromatin loop; partially connected, with any subset of pairwise elements supported by loops (left); or minimally connected, that is, “daisy-chained” (right). (b) Clusters are defined purely on the graph structure of nodes (1D elements) and edges (2D loops). Two clusters can be functionally disjoint and yet interleaved across the 1D genome (left). Similarly, the range of one cluster can be fully contained within the range of another (right). (c) We define a pairwise interaction between two elements as our minimal cluster. HiChIP data contains a broad range of cluster complexity. However, the majority of clusters tend to present with non-overlapping ranges (right).
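Because clusters are defined purely by graph structure, the idea can be sketched as a connected-components search over loops. The Python example below is an illustration under that assumption, not the cluster_bedpe algorithm; the function name `cluster_loops` and the anchors are hypothetical.

```python
from collections import defaultdict


def cluster_loops(loops):
    """Group anchors into connected components.

    loops: list of (anchor_a, anchor_b) pairs, each anchor a
    (chrom, start, end) tuple. Returns a list of sorted anchor lists.
    """
    graph = defaultdict(set)
    for a, b in loops:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for node in sorted(graph):
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:  # depth-first walk over loop edges
            current = stack.pop()
            component.append(current)
            for neighbour in graph[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


A, B, C, D, E = [("chr1", s, s + 100) for s in (100, 300, 500, 700, 900)]
# A-C-E form a daisy chain; B-D is a separate loop lying inside its range.
clusters = cluster_loops([(A, C), (C, E), (B, D)])
```

The two resulting clusters are interleaved along the genome: the B-D cluster (300 to 800) sits entirely within the range of the A-C-E cluster (100 to 1000) yet shares no loops with it.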
Figure 7.
Side-by-side code examples demonstrate the difference in implementation complexity between standard multi-step differential loop calling pipelines and a streamlined AQuA Tools workflow. A standard example workflow (left) using common-practice algorithms requires integrating multiple tools across different environments and programming languages, while chaining select AQuA Tools (right) can perform similar analyses in a few compact lines of code.
