Review
Nat Protoc. 2022 Jan;17(1):36-75.
doi: 10.1038/s41596-021-00633-y. Epub 2022 Jan 10.

SPRITE: a genome-wide method for mapping higher-order 3D interactions in the nucleus using combinatorial split-and-pool barcoding

Sofia A Quinodoz et al. Nat Protoc. 2022 Jan.

Abstract

A fundamental question in gene regulation is how cell-type-specific gene expression is influenced by the packaging of DNA within the nucleus of each cell. We recently developed Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which enables mapping of higher-order interactions within the nucleus. SPRITE works by cross-linking interacting DNA, RNA and protein molecules and then mapping DNA-DNA spatial arrangements through an iterative split-and-pool barcoding method. All DNA molecules within a cross-linked complex are barcoded by repeatedly splitting complexes across a 96-well plate, ligating molecules with a unique tag sequence, and pooling all complexes into a single well before repeating the tagging. Because all molecules in a cross-linked complex are covalently attached, they will sort together throughout each round of split-and-pool and will obtain the same series of SPRITE tags, which we refer to as a barcode. The DNA fragments and their associated barcodes are sequenced, and all reads sharing identical barcodes are matched to reconstruct interactions. SPRITE accurately maps pairwise DNA interactions within the nucleus and measures higher-order spatial contacts occurring among up to thousands of simultaneously interacting molecules. Here, we provide a detailed protocol for the experimental steps of SPRITE, including a video ( https://youtu.be/6SdWkBxQGlg ). Furthermore, we provide an automated computational pipeline available on GitHub that allows experimenters to seamlessly generate SPRITE interaction matrices starting with raw fastq files. The protocol takes ~5 d from cell cross-linking to high-throughput sequencing for the experimental steps and 1 d for data processing.
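The final matching step described above, in which all reads sharing an identical barcode are grouped to reconstruct an interaction, can be illustrated with a minimal Python sketch. This is not the authors' pipeline; the function name `build_clusters`, the tag labels and the genomic positions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_clusters(reads):
    """Group reads by barcode. Each read is a (barcode, position) pair,
    where the barcode is the ordered series of tags on that molecule."""
    clusters = defaultdict(list)
    for barcode, position in reads:
        clusters[barcode].append(position)
    return dict(clusters)

# Hypothetical reads: three molecules share one barcode, one is a singlet.
reads = [
    (("A3", "B7", "C1"), ("chr1", 1000)),
    (("A3", "B7", "C1"), ("chr1", 5000)),
    (("A3", "B7", "C1"), ("chr2", 300)),
    (("A9", "B2", "C4"), ("chr5", 42)),
]
clusters = build_clusters(reads)
```

In this toy input, the three reads carrying the same tag series end up in one cluster, and the remaining read forms a cluster of size one.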

Conflict of interest statement

Competing interests

S.A.Q. and M.G. are inventors of a patent on the SPRITE method.

Figures

Extended Data Fig. 1 | DNA sizes postfragmentation by DNase for human GM12878 cells.
As with mouse ES cells (or any other cell type we have tested), for human GM12878 cells, we optimize DNase digestion to obtain DNA fragments ranging from 50 to 1,000 base pairs, with an average size between 200 and 300 base pairs.
Fig. 1 | Overview of SPRITE procedure.
Day 1: (1) cells are dual cross-linked with DSG and formaldehyde (Steps 1–23); (2) cells are lysed and chromatin is fragmented using sonication and DNase digestion (Steps 24–53); (3) cross-linked complexes in lysate are coupled to NHS beads overnight (Steps 54–67). Days 2–3: (4) DNA is blunt-ended, phosphorylated and dA-tailed (Steps 68–77) prior to (5) ligation with DPM adaptor (Steps 78–94); (6) four additional rounds of split-and-pool ligations are performed with the Odd, Even, Odd and Terminal tags, which we refer to as a barcode (Steps 79–98); (7) after split-and-pool, samples are split into several aliquots and DNA is reverse cross-linked overnight by addition of ProK enzyme and heat (Steps 99–101). Day 4: (8) final SPRITE libraries are amplified (Steps 102–120). Day 5 onward: (9) DNA is sequenced (Step 121), and (10) all molecules sharing the same barcodes are matched to generate SPRITE clusters (Steps 122–129); (11) DNA interactions occurring in SPRITE clusters can be analyzed as pairwise interactions, visualized using intra- and interchromosomal heatmaps, or as multiway interactions, visualized using individual clusters (Step 130).
Fig. 2 | Schematic of split-pool procedure.
Split-and-pool barcoding works by splitting cross-linked complexes across a 96-well plate containing 96 unique tags, ligating a unique sequence (colored tag) to each DNA molecule, and pooling all cross-linked complexes into a single tube. This split-and-pool process is repeated over multiple rounds, sequentially adding an additional tag each round. Because all molecules within a cross-linked complex are covalently attached, they will sort across the same wells during each round of the split-and-pool process and will obtain the same series of tags, which we refer to as a SPRITE barcode. Genomic DNA fragments and their associated barcodes are then sequenced. All reads sharing the same SPRITE barcodes are matched to generate SPRITE clusters.
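The split-and-pool logic in this caption can be mimicked with a small simulation. The sketch below is a toy model, not the experimental or computational protocol: it assumes five rounds with 96 tags each (the source states 96 tags per plate; applying that to every round is an assumption), and the names `split_and_pool`, `complexes` and the molecule labels are hypothetical.

```python
import random

def split_and_pool(complexes, rounds=5, wells=96, seed=0):
    """Assign each cross-linked complex to a random well in every round.
    All molecules in a complex travel together, so they share one tag series."""
    rng = random.Random(seed)
    barcodes = {cid: [] for cid in complexes}
    for _ in range(rounds):
        for cid in complexes:
            # The whole complex lands in a single well and receives that well's tag.
            barcodes[cid].append(rng.randrange(wells))
    # Every molecule inherits the tag series (barcode) of its complex.
    return {
        mol: tuple(barcodes[cid])
        for cid, mols in complexes.items()
        for mol in mols
    }

# Hypothetical complexes and the molecules they contain.
complexes = {"c1": ["m1", "m2", "m3"], "c2": ["m4"], "c3": ["m5", "m6"]}
tagged = split_and_pool(complexes)

# Number of distinct barcodes available under the stated assumption.
n_possible_barcodes = 96 ** 5
```

Because the number of possible tag series grows exponentially with the number of rounds, two independent complexes are unlikely to receive the same barcode by chance, while molecules within one complex always do.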
Fig. 3 | Summary of alignment statistics.
a, An example Bioanalyzer profile of a final SPRITE library after PCR amplification. b, A summary of ligation efficiency statistics is output as a QC step by the SPRITE pipeline to confirm that tags have successfully ligated to each DNA molecule. The distribution of reads containing zero, one, two, three, four or five SPRITE tags is shown for two independent SPRITE experiments. Ligation efficiency at each round (95%) is calculated by taking the fifth root of the fraction of reads containing five tags (77.1%). c, SPRITE cluster sizes are output as a QC step by the SPRITE pipeline to confirm that interactions have been successfully detected. Individual SPRITE clusters contain all reads sharing the same barcode. The number of reads sharing the same barcode within an individual cluster can range from 1 read per cluster (singlets; molecules not interacting with other molecules, red), through 2–10 reads per cluster (purple), 11–100 reads per cluster (blue) and 101–1,000 reads per cluster (dark green), to >1,000 reads per cluster (light green). The percentage of reads that correspond to different SPRITE cluster sizes is shown for two independent experiments generated on 3% FA-DSG samples sonicated for 1 min, 4–5 W as described in the procedure. Successful experiments typically detect a distribution of cluster sizes similar to those shown here. d, The number of DNA molecules in a SPRITE cluster reflects the distance at which DNA molecules are interacting in the nucleus. Specifically, smaller SPRITE clusters primarily capture close-range (within TAD) interactions, whereas larger clusters capture longer-distance interactions within A or B compartments (local and nonlocal). Intrachromosomal contacts between Hi-C and SPRITE are calculated for various SPRITE cluster sizes. e, Interchromosomal contacts were computed for different SPRITE cluster sizes, and P-values were generated to identify significant interchromosomal interactions. SPRITE clusters containing >1,000 reads per cluster are highly enriched for interchromosomal interactions that occur between chromosomes that organize around the nucleolus (12, 15, 16, 18 and 19 in mES cells), whereas clusters containing 2–10 reads per cluster are depleted for interchromosomal interactions. Panel a reproduced with permission from ref. , Elsevier; panels c–e adapted with permission from ref. , Elsevier.
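The two QC quantities in this caption are simple to reproduce. The sketch below assumes, as the fifth-root calculation implies, that every round ligates independently with the same efficiency; the function names `per_round_efficiency` and `size_bin` are hypothetical and are not taken from the SPRITE pipeline.

```python
def per_round_efficiency(fraction_full, rounds=5):
    """Per-round ligation efficiency, assuming equal and independent rounds:
    fraction_full = efficiency ** rounds."""
    return fraction_full ** (1.0 / rounds)

def size_bin(n_reads):
    """Map a cluster's read count to the size classes named in the caption."""
    if n_reads < 1:
        raise ValueError("a cluster contains at least one read")
    if n_reads == 1:
        return "1"
    if n_reads <= 10:
        return "2-10"
    if n_reads <= 100:
        return "11-100"
    if n_reads <= 1000:
        return "101-1000"
    return ">1000"

# 77.1% of reads carrying all five tags corresponds to roughly 95% per round.
eff = per_round_efficiency(0.771)
```

Applying `size_bin` to every cluster and summing the reads per class gives the kind of distribution shown in panel c.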
Fig. 4 | Computational SPRITE pipeline.
a, The SPRITE Snakemake pipeline works as follows: the prerequisites (red boxes) before starting include installing conda and Snakemake. Then, run fastq2json.py to compile the paths of all SPRITE fastq files into a .json file. Generate and index a Bowtie2 genome for genome alignments (mm10 and hg38 are currently supported). For each experiment, modify the config.yaml file to input the total number of tags corresponding to the number of ligation rounds and also the location of the reference genome index for alignment. The pipeline is then launched (gray boxes) where (1) adaptor trimming is performed, (2) barcodes are identified, (3) reads without all SPRITE tags are removed, (4) the DPM tag is trimmed from the beginning of read 1, (5) all reads are aligned to the genome using Bowtie2, (6) chromosome annotations are converted from Ensembl to UCSC format, (7) all regions that do not fall within repeat-masked or blacklisted genomic bins are retained, (8) all reads sharing the same barcodes are matched and collapsed into SPRITE clusters and (9) heatmaps are generated. Certain QC files are output along the way to quantify ligation efficiency and plot SPRITE cluster sizes, and all summary statistics are output in MultiQC. b, Because the number of pairwise contacts scales quadratically with the number of reads (n) contained within a SPRITE cluster, larger clusters will contribute a disproportionately large number of the contacts observed between any two bins. To account for this, we downweight each of the possible pairwise contacts that can be enumerated from a single SPRITE cluster according to the size of the cluster in which it is observed. This is achieved by weighting each pairwise contact by 2/n. A cluster of n reads has n(n − 1)/2 possible pairwise contacts, so its total weighted contribution is n − 1, the number of edges in a minimally connected graph over its reads. This also ensures that the number of pairwise contacts contributed by a cluster is linearly proportional to the number of reads within a cluster. Here, we show an example of SPRITE cluster weighting using a cluster of four reads, where the total number of contacts is six. Our SPRITE cluster weighting scheme computes all six possible contacts and weights each contact by 2/n = 1/2, such that the total number of contacts sums to three.
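The 2/n weighting in panel b can be checked with a short sketch. It is a simplified illustration under the assumption that each read falls in a distinct genomic bin; the function name `weighted_contacts` and the bin labels are hypothetical and do not reflect the pipeline's actual code.

```python
from collections import defaultdict
from itertools import combinations

def weighted_contacts(clusters):
    """Sum pairwise contacts over clusters, weighting each pair by 2/n,
    where n is the number of reads in the cluster that produced it."""
    contacts = defaultdict(float)
    for reads in clusters:
        n = len(reads)
        if n < 2:
            continue  # singlets contribute no pairwise contacts
        weight = 2.0 / n
        for a, b in combinations(sorted(reads), 2):
            contacts[(a, b)] += weight
    return dict(contacts)

# A cluster of four reads, each assumed to fall in a distinct bin.
cluster = ["bin1", "bin2", "bin3", "bin4"]
contacts = weighted_contacts([cluster])
total = sum(contacts.values())
```

For the four-read cluster, six pairs are enumerated and each receives a weight of 1/2, so the cluster contributes a total of three contacts, that is, n − 1.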
Fig. 5 | SPRITE identifies higher-order interactions that occur simultaneously.
a, Compartment eigenvector showing A (red) and B (blue) compartments on mouse chromosome 2 (top). Individual SPRITE clusters (rows) containing reads mapping to at least three distinct A compartment regions (*) (middle). Pairwise contact map at 200 kb resolution (bottom). Pink bars represent A compartment regions. b, H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq, ENCODE) signal across a 2.46 Mb region on human chromosome 6 corresponding to three TADs containing 55 histone genes (top). SPRITE clusters containing reads in all three TADs (middle). Pairwise contact map at 25 kb resolution (bottom). Blue bars represent histone gene regions. c, CTCF motif orientations at three loop anchors on human chromosome 8 (top). SPRITE clusters overlapping all three loop anchors (middle). Pairwise contact map at 25 kb resolution (bottom). Green bars represent CTCF motif sites. d, Schematic of multiple A compartment interactions. e, Schematic of higher-order interactions of HIST1 genes (green). f, Schematic of higher-order interactions between consecutive loop anchors. Figure adapted with permission from ref. , Elsevier.

