Nat Biotechnol. 2022 Jan;40(1):64-73.
doi: 10.1038/s41587-021-00998-1. Epub 2021 Aug 23.

Single-cell measurement of higher-order 3D genome organization with scSPRITE

Mary V Arrastia et al. Nat Biotechnol. 2022 Jan.

Abstract

Although three-dimensional (3D) genome organization is central to many aspects of nuclear function, it has been difficult to measure at the single-cell level. To address this, we developed 'single-cell split-pool recognition of interactions by tag extension' (scSPRITE). scSPRITE uses split-and-pool barcoding to tag DNA fragments in the same nucleus and their 3D spatial arrangement. Because scSPRITE measures multiway DNA contacts, it generates higher-resolution maps within an individual cell than can be achieved by proximity ligation. We applied scSPRITE to thousands of mouse embryonic stem cells and detected known genome structures, including chromosome territories, active and inactive compartments, and topologically associating domains (TADs) as well as long-range inter-chromosomal structures organized around various nuclear bodies. We observe that these structures exhibit different levels of heterogeneity across the population, with TADs representing dynamic units of genome organization across cells. We expect that scSPRITE will be a critical tool for studying genome structure within heterogeneous populations.
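The claim that multiway contacts yield denser maps than proximity ligation can be illustrated with a simple count. The sketch below assumes a complex of n DNA fragments: complex barcoding can in principle report every pair within the complex, n(n-1)/2, whereas a ligation-based method is assumed here to report at most n-1 pairs (fragments joined end to end in a linear chain). The function names are hypothetical, and the ligation bound is a simplifying assumption rather than a figure taken from the paper.

```python
def multiway_pairs(n_fragments: int) -> int:
    """Pairwise contacts recoverable when all fragments in a complex share one barcode."""
    if n_fragments < 0:
        raise ValueError("n_fragments must be non-negative")
    return n_fragments * (n_fragments - 1) // 2


def ligation_pairs_upper_bound(n_fragments: int) -> int:
    """Assumed upper bound for proximity ligation: a linear chain of n fragments has n-1 junctions."""
    if n_fragments < 0:
        raise ValueError("n_fragments must be non-negative")
    return max(n_fragments - 1, 0)


for n in (2, 5, 10, 100):
    print(n, multiway_pairs(n), ligation_pairs_upper_bound(n))
```

Under these assumptions the gap grows quadratically with complex size, which is the intuition behind the higher per-cell resolution described in the abstract.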

Figures

Extended Data Fig. 1 | scSPRITE generates single-cell maps with high genomic coverage.
a. Quantification of cell aggregation. Top: number of cells in clumps pre- and post-filtration (singlets, doublets, triplets, etc). Bottom: microscope images (10x) of cells pre- and post-filtration step, scale bar 100 μm. b. Validation of in-nuclei barcoding step of the protocol on mixed cell population (human-mouse cells): no mixing (top middle and top right), mixing before crosslinking (bottom left), mixing after crosslinking (bottom middle), and mixing after in-nuclei restriction digest (bottom right). c. Schematic of the computational analysis pipeline for processing scSPRITE data. d. Theoretical number of contacts measured by SPRITE-derived methods and Hi-C-derived methods over increasing numbers of DNA molecules per complex. e. Maximum number of pairwise interactions that can be obtained from proximity ligation (Hi-C-derived methods) and complex barcoding (SPRITE-derived methods). f. Genome-wide coverage for the filtered 1,000 cells: the median (black triangular points) and median absolute deviation (MAD) (green circular points) values were calculated per cell using the number of reads per 1 Mb bin genome-wide (chr1–19). g. Genomic coverage of 20 random cell barcodes; 1 Mb bin per chromosome.
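The per-cell coverage statistics in panel f (median and median absolute deviation of reads per 1 Mb bin) can be sketched as follows. This is a minimal illustration using the standard definitions; the function name and the toy read counts are hypothetical and are not data from the paper.

```python
from statistics import median


def median_and_mad(reads_per_bin):
    """Return (median, median absolute deviation) of read counts across bins for one cell."""
    if not reads_per_bin:
        raise ValueError("reads_per_bin must not be empty")
    med = median(reads_per_bin)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(x - med) for x in reads_per_bin])
    return med, mad


# Toy cell with seven 1 Mb bins; one bin is a strong outlier.
toy_cell = [4, 6, 5, 0, 7, 5, 30]
print(median_and_mad(toy_cell))  # (5, 1)
```

Both statistics are robust to outliers, so a single over-covered bin such as the value 30 above barely moves them.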
Extended Data Fig. 2 | Known chromosomal structures can be measured genome-wide in hundreds of single mESCs by scSPRITE.
a. Additional single cell examples of chromosome territory structure between chr1 and chr2; plotted as number of DNA clusters at 1 Mb resolution. Box plot represents normalized detection scores between chr1 and chr2, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). b. Chromosome territory scores across 1000 cells (clustered based on similarity pattern). Columns represent chromosome territory detection scores for all pairs of chromosomes with the reference chromosome. Arrows represent chromosome territory scores between chr1 and chr2, which were analyzed in this paper. c. Quantification of chromosome territory scores with respect to each chromosome. Boxplots show the range of chromosome territory scores, the average score (black line), and individual pairs of chromosome territory scores (grey dots). d. Box plot represents average chromosome territory detection scores from all genome-wide (chr1–19) chromosome pairs, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells) (left). Additional single cell examples of genome-wide (chr1–19) chromosome territories (right). e. Additional single cell examples of A/B compartments detected within 0–55 Mb in chr2; plotted as number of DNA clusters at 1 Mb resolution (right). Box plot represents normalized detection scores between 0–55 Mb in chr2, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). f. Representation of compartment switching scores across 1,000 cells (clustered based on score similarity pattern). Columns represent the strength of compartment switching detection scores for compartments that switched from “B-to-A-to-B” or “A-to-B-to-A” genome-wide (chr1–19). Arrows represent compartment switching scores for chr2 1–55 Mb, chr8 22–37 Mb, chr10 58–70 Mb, and chr17 8–45 Mb, all of which were analyzed in this paper. g. Additional single cell examples of compartment switching from Region 1, Region 2, and Region 3 (right). For each region’s box plot: whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). h. Expected (right) and observed (left) coverage of reads in the A and B compartments.
Extended Data Fig. 3 | Higher-order structures are identified genome-wide in hundreds of single mESCs by the scSPRITE method.
a. Additional single cell examples of nucleolar interactions detected between chr18 and chr19; plotted number of DNA clusters at 1 Mb resolution; detection scores below contact map (right). Box plot represents normalized detection scores between chr18 and chr19, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). b. Nucleolar interaction between chr12 and chr19: detection scores for 1000 cells (middle). Box plot where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). Representation of structures with max score (+1) and min. score (−1) (left) and ensemble scSPRITE heatmap (middle); contact map at 1 Mb resolution. Single cell examples (right); plotted number of DNA clusters at 1 Mb resolution. c. Relative correlation of the percent of cells from scSPRITE vs DNA-FISH containing inter-chromosomal interactions at specified 1 Mb regions targeted by DNA-FISH probes. Control chromosomes (grey points) and nucleolar associating chromosomes (black dots) are plotted. d. Relative correlation of the contact frequency from scSPRITE vs the contact frequency from SPRITE containing inter-chromosomal interactions targeted by DNA-FISH probes. Control chromosomes (grey points) and nucleolar associating chromosomes (black dots) are plotted. e. Frequency of cells containing inter-chromosomal nucleolar contacts (normalized to number of reads per region) for each pair of nucleolar associating chromosomes. f. Single cell examples of speckle interaction detected between chr2 and chr5; plotted number of DNA clusters at 1 Mb resolution.
Box plot represents normalized detection scores between chr2 and chr5, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). g. Additional single cell examples of speckle interactions detected between chr2 and chr4; plotted number of DNA clusters at 1 Mb resolution. Box plot represents normalized detection scores between chr2 and chr4, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). h. Frequency of cells containing inter-chromosomal speckle contacts (normalized to number of reads per region) for each pair of speckle associating chromosomes. i. Additional single cell examples of centromere-proximal interactions detected between chr1 and chr11; plotted number of DNA clusters at 1 Mb resolution. Box plot represents normalized detection scores between chr1 and chr11, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). j. Single cell examples of chr4 and chr11 centromere-proximal regions interacting together; plotted number of DNA clusters at 1 Mb resolution. Box plot represents normalized detection scores between chr4 and chr11, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). k. Frequency of cells containing inter-chromosomal centromeric contacts (normalized to number of reads per region) for each pair of chromosomes. l. 
Higher-order structures representation from scHi-C data – centromere-proximal interactions, speckle interactions, and nucleolar interactions; Pairwise contact map from ensemble 1,000 cells (left), pairwise contact map from their best single cell (right).
Extended Data Fig. 4 | TADs are heterogeneous units present in the genomes of individual mESCs.
a. Genome-wide correlation of insulation scores between ensemble scSPRITE and Hi-C from mouse ES cells at 40 kb resolution. b. Insulation score profile of ensemble scSPRITE (red) and Hi-C (blue) at 40 kb resolution at chr1 65–95 Mb. c. Additional single cell examples of TAD-like structures between 124.8–126.7Mb of chr4; plotted number of DNA clusters at 40 kb resolution; detection scores below contact map. Box plot represents normalized detection scores between 124.8–126.7Mb of chr4, where whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median, red dots represent single cell examples (n = 1000 cells). d. TAD detection scores across 1,000 cells (clustered based on score similarity pattern) in chr2 (left) and chr18 (right). Columns represent the strength of TAD detection scores for all TADs detected across chr2 or chr18, respectively, in ensemble scSPRITE. e. TAD detection scores across 1,000 cells between 38.5–48.56 Mb of chr4. Each line represents the strength of TAD detection scores in this given region from a single cell. Cells are either in Group 1 or 2 in Fig. 4f or not used. f. Ensemble heatmap from all 1000 cells between 39.4–41.4Mb of chr4 representing strong TADs detected in bulk (blue lines), and weak emerging TADs (green line) over the A/B boundary. g. Fraction of cells in each cell cycle phase from the set of single cells containing (left) or lacking (right) the contact between the boundary region (Fig. 4f). h. Difference contact map across a control region 84.8–88.4 Mb of chr4 made by subtracting the normalized contacts from cells in Group II from Group I (Fig. 4f). Insulation scores for cells in Group I (dark grey) and Group II (light grey) are plotted.
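The insulation scores compared in panels a, b and h can be illustrated with one commonly used definition: for each bin, average the contacts in a square window that spans the bin, pairing the bins just upstream with the bins just downstream. The sketch below is written under that assumption; the caption does not state the exact procedure or window size used in the paper, and the function name and toy matrix are hypothetical.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency crossing each bin, using `window` bins on either side.

    Bins too close to the matrix edge to fit a full window get None.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            scores.append(None)
            continue
        total = 0.0
        for a in range(i - window, i):               # bins upstream of i
            for b in range(i + 1, i + window + 1):   # bins downstream of i
                total += matrix[a][b]
        scores.append(total / (window * window))
    return scores


# Toy 8-bin contact map with two 4-bin domains: strong contacts inside a domain,
# weak contacts between domains.
toy = [[10.0 if a // 4 == b // 4 else 1.0 for b in range(8)] for a in range(8)]
print(insulation_scores(toy, window=2))
```

In the toy map the score drops to its minimum at the bins flanking the domain boundary, which is how local minima of an insulation profile are typically read as TAD boundaries.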
Extended Data Fig. 5 | Structural heterogeneity in long-range interactions is revealed by scSPRITE.
a. Ensemble heatmaps across 122.2–122.8 Mb region in chr6 representing cells containing (top) or lacking (bottom) the contact between the Nanog locus and the −300 kb SE. Blue square shows the contact. b. Number of genome-wide reads (left) and number of genome-wide contacts (right) for groups of cells with and without the Nanog-SE interaction. For each box plot, whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median (with = 159 cells, without = 149 cells). No statistically significant difference between the two groups was seen based on the Kolmogorov–Smirnov two-sided test. c. Fraction of cells in each cell cycle phase from the set of single cells containing (left) or lacking (right) the contact between the Nanog locus and the SE 300 kb upstream of Nanog. d. Heatmaps between 119.24–121.28 Mb in chr5 of pooled cells either containing (top) or lacking (bottom) the contact between the Tbx3 locus and Lhx5. Blue square shows the contact. e. Number of genome-wide reads (left) and number of genome-wide contacts (right) for groups of cells with and without the Tbx3-Lhx5 interaction. For each box plot, whiskers represent the 10th and 90th percentiles, box limits represent the 25th and 75th percentiles, black line represents the median (with = 152 cells, without = 149 cells). No statistically significant difference between the two groups was seen based on the Kolmogorov–Smirnov two-sided test. f. Fraction of cells in each cell cycle phase from the set of single cells containing (left) or lacking (right) the contact between the Tbx3 locus and Lhx5.
Fig. 1 | scSPRITE: a single-cell method to map DNA structure genome-wide.
a, Schematic of scSPRITE protocol. b, Validation of in-nuclei barcoding step on mixed cell population (human–mouse cells); the number of reads for each identified cell barcode ID is plotted. Threshold of >95% single-species reads was applied to identify mouse- or human-only cells; cell barcodes >1,000 reads are plotted. c, Number of contacts (blue), reads (red) and DNA clusters (gray) plotted for the 1,500 cells. Dashed lines represent filtration steps: left of the dashed lines, cell aggregates estimated based on detected collision rate from Fig. 1b; right of the dashed lines, cells with low number of reads/contacts. d, Comparison of merged scSPRITE (upper diagonal, ‘ensemble scSPRITE’) and bulk SPRITE (lower diagonal). Chromosome territories across all chromosomes at 1-Mb resolution (left); A/B compartments on chromosome 2 at 200-kb resolution (middle); TADs within an 18-Mb region of chromosome 6 at 40-kb resolution (right). e, Schematic illustration of multiway interactions (SPRITE-derived methods) and pairwise interactions (proximity ligation methods) and examples of heat maps. f, Number of contacts (top) and number of reads (bottom) obtained from scSPRITE (blue) and scHi-C (gray). g, Genomic coverage per 1-Mb, 100-kb, 40-kb and 10-kb bins in 1,000 individual cells. h, Average number of reads per single cell in 1-Mb bins of each chromosome (n = 1,000 cells). Average (dots) and s.d. (bars) are shown; asterisk marks chromosome with detected trisomy.
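The species-mixing check in panel b can be sketched as a simple classifier over per-barcode read counts. The >95% single-species threshold comes from the caption; treating the >1,000-read cutoff as a filter (the caption only says such barcodes are plotted), along with the function name and labels, is an assumption made for illustration.

```python
def classify_barcode(mouse_reads, human_reads, min_reads=1000, purity=0.95):
    """Label a cell barcode by the species composition of its reads.

    min_reads: barcodes with this many reads or fewer are set aside (assumed filter).
    purity: fraction of reads from one species required to call a single-species cell.
    """
    total = mouse_reads + human_reads
    if total <= min_reads:
        return "low_coverage"
    if mouse_reads / total > purity:
        return "mouse"
    if human_reads / total > purity:
        return "human"
    return "mixed"


print(classify_barcode(1980, 20))    # mouse
print(classify_barcode(50, 1950))    # human
print(classify_barcode(900, 1100))   # mixed
print(classify_barcode(500, 10))     # low_coverage
```

The fraction of barcodes labelled "mixed" gives a rough estimate of how often two cells share a barcode, which is the kind of collision rate the caption for panel c refers to.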
Fig. 2 | scSPRITE accurately measures single-cell DNA interactions at different resolutions by capturing multiway interactions.
a, Illustration of chromosome territories for chr1 and chr2 (left) and ensemble scSPRITE heat map (right) of the same structures; downweighted contact map at 1-Mb resolution. b, Chromosome territory normalized detection scores for 1,000 individual cells between chr1 and chr2. Left: representation of structures with max. score (+1) and min. score (−1). Center: box plot where whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). Right: single-cell examples of chr1 and chr2 territories, plotted as number of DNA clusters at 1-Mb resolution. c, Normalized detection scores across all 1,000 cells per each pair of chromosome territories detected in ensemble scSPRITE data; score = 0 (red line). d, Normalized detection scores across all pairs of chromosome territories detected in ensemble scSPRITE data per single cell; score = 0 (red line). e, Chromosome territories (chr1–19) in ensemble scSPRITE (left) and in a single cell (right, detection score = 0.25). f, Illustration of A/B compartment in chr2:0–55 Mb (left) and ensemble scSPRITE heat map (right); downweighted contact map at 1-Mb resolution. g, A/B compartments detection scores for 1,000 individual cells. Left: representation of structures with max. score (+1) and min. score (−1). Center: box plot where whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). Right: single-cell examples of A/B compartments in chr2:0–55 Mb, plotted as number of DNA clusters at 1-Mb resolution. h, Normalized detection scores across all 1,000 cells per each compartment switch; score = 0 (red line). i, Compartment detection scores across all compartments per single cell; score = 0 (red line).
j, Examples of three different regions containing a high (Region 1), medium (Region 2) and low (Region 3) median compartment switch score. For each region’s box plot: whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). Heat maps for each region are shown in both ensemble scSPRITE (above) and single cell (below).
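The box plots used throughout these figures share one convention: whiskers at the 10th and 90th percentiles, box limits at the 25th and 75th, and a line at the median. A minimal sketch of that summary is shown below, using the Python standard library's default quantile interpolation; the function name and the toy detection scores are hypothetical, and the paper's plotting software may interpolate percentiles differently.

```python
from statistics import median, quantiles


def box_plot_summary(scores):
    """Return the five percentiles drawn in the box plots (10th, 25th, 50th, 75th, 90th)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    cuts = quantiles(scores, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return {
        "p10": cuts[1],
        "p25": cuts[4],
        "median": cuts[9],
        "p75": cuts[14],
        "p90": cuts[17],
    }


# Toy detection scores on the [-1, +1] scale described in the caption.
example_scores = [-0.8, -0.3, -0.1, 0.0, 0.1, 0.2, 0.25, 0.4, 0.5, 0.6, 0.9]
summary = box_plot_summary(example_scores)
print(summary)
```

Because the whiskers stop at the 10th and 90th percentiles, a fifth of the cells fall outside them by construction, so the highlighted single-cell examples can lie beyond the whiskers without being unusual.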
Fig. 3 | scSPRITE identifies inter-chromosomal structures genome-wide in hundreds of single mESCs.
a, Quantification of inter-chromosomal contacts from the top 1,000 cells by scHi-C (gray) and scSPRITE (blue). The dashed lines represent the mean percentage of inter-chromosomal contacts. b, Nucleolar interaction between chr18 and chr19: illustration (left) and ensemble scSPRITE heat map (right); contact map at 1-Mb resolution. c, Nucleolar interaction detection scores for 1,000 cells (middle). Box plot where whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). Representation of structures with max score (+1) and min. score (−1) (left). Single-cell examples (right), plotted as number of DNA clusters at 1-Mb resolution. d, Frequency of NOR (blue), speckle (red) and PCH (green) higher-order interactions in comparison to randomly shuffled regions of the same size (gray) in 1,000 individual cells. e, Speckle interaction between chr2 and chr4: illustration (left) and ensemble scSPRITE heat map (right); contact map at 1-Mb resolution. f, Speckle interaction detection scores for 1,000 individual cells (middle). Box plot where whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). Representation of structures with max score (+1) and min. score (−1) (left). Single-cell examples (right), plotted as number of DNA clusters at 1-Mb resolution. g, PCH interactions between chr1 and chr11: illustrations (left) and ensemble scSPRITE heat map (right); contact map at 1-Mb resolution. h, PCH region detection scores for 1,000 individual cells (middle). Box plot where whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). 
Representation of structures with max score (+1) and min. score (−1) (left). Single-cell examples (right), plotted as number of DNA clusters at 1-Mb resolution. i, Mean interaction value of inter-chromosomal PCH contacts (normalized to number of reads per region) for each pair of chromosomes. NOR-containing chromosomes are shown in bold.
Fig. 4 | TADs are heterogeneous units present in the genomes of individual mESCs.
a, TAD structure between 124.8 Mb and 126.7 Mb of chr4: illustration (left) and scSPRITE heat map (right); pairwise contact map at 40-kb resolution. b, TAD detection scores for 1,000 cells (middle). Box plot where whiskers represent the 10th and 90th percentiles; box limits represent the 25th and 75th percentiles; the black line represents the median; red dots represent single-cell examples (n = 1,000 cells). Representation of structures with max score (+1) and min. score (−1). Single-cell examples (right), plotted as number of DNA clusters at 40-kb resolution. c, Normalized detection scores across all 1,000 cells per each TAD detected in ensemble scSPRITE data; red line marks score = 0. d, TAD detection scores across all TADs detected in ensemble scSPRITE data per single cell; red line marks score = 0. e, TAD detection scores across 1,000 cells (clustered based on score similarity pattern): columns represent the strength of TAD detection scores for all TADs detected across chr4 in ensemble scSPRITE; gray bar indicates the variable region described in Extended Data Fig. 4c. f, Ensemble heat maps across the 39.4–41.4-Mb region of chr4 representing cells containing (Group 1, top) or lacking (Group 2, bottom) the contact emerging over the boundary of A/B compartment. g, Difference contact map across 39.4–41.4 Mb of chr4 made by subtracting the normalized contacts in Group 2 from Group 1 (Fig. 4f). Insulation scores for cells in Group 1 (purple) and Group 2 (green) are plotted.
Fig. 5 | Heterogeneous structural states formed by Nanog and Tbx3 loci in individual mESCs.
a, Representation of the Nanog locus and its DNA interactions with SEs: 122.2–122.8-Mb region in chr6 with corresponding ChIP-seq tracks for H3K27ac and H3K4me3; Nanog–SE interaction (black lines). b, Representation of Tbx3 locus and its DNA interactions with Lhx5: 120.0–121.0-Mb region in chr5 with the corresponding ChIP-seq tracks for H3K27me3 and H3K4me3; Tbx3–Lhx5 interaction (black line). c, Normalized contact frequency plot between Nanog locus and 122.2–122.8-Mb surrounding region in chr6. Shown are cells containing (red) or lacking (blue) the contact between the Nanog locus and SE −300 kb. Each position refers to a 40-kb bin. Asterisks denote statistical significance (P < 0.0001, unpaired two-sided t-test with Welch’s correction) between the two groups at the specified positions (n = 1,000 random bootstrap groups for each of the two groups). Error bars represent 1 s.d. d, Normalized contact frequency plot between Tbx3 locus and 120.0–121.0-Mb surrounding region in chr5. Shown are cells containing (red) or lacking (blue) the contact between the Tbx3 locus and Lhx5. Each position refers to a 40-kb bin. Asterisks denote statistical significance (P < 0.0001, unpaired two-sided t-test with Welch’s correction) between the two groups at the specified positions (n = 1,000 random bootstrap groups for each of the two groups). Error bars represent 1 s.d. e, Schematic illustrating differences in structure when a gene of interest lacks (left) or contains (right) the long-range enhancer interaction. ChIP-seq, chromatin immunoprecipitation with sequencing.
