Nat Genet. 2021 Aug;53(8):1166-1176.
doi: 10.1038/s41588-021-00900-4. Epub 2021 Jul 29.

Direct characterization of cis-regulatory elements and functional dissection of complex genetic associations using HCR-FlowFISH


Steven K Reilly et al. Nat Genet. 2021 Aug.

Erratum in

Abstract

Effective interpretation of genome function and genetic variation requires a shift from epigenetic mapping of cis-regulatory elements (CREs) to characterization of endogenous function. We developed hybridization chain reaction fluorescence in situ hybridization coupled with flow cytometry (HCR-FlowFISH), a broadly applicable approach to characterize CRISPR-perturbed CREs via accurate quantification of native transcripts, alongside CRISPR activity screen analysis (CASA), a hierarchical Bayesian model to quantify CRE activity. Across >325,000 perturbations, we provide evidence that CREs can regulate multiple genes, skip over the nearest gene and display activating and/or silencing effects. At the cholesterol-level-associated FADS locus, we combine endogenous screens with reporter assays to exhaustively characterize multiple genome-wide association signals, functionally nominate causal variants and, importantly, identify their target genes.


Conflict of interest statement

Competing Interests

The remaining authors declare no competing interests.

Figures

Extended Data Fig. 1. CRISPRi induction, sorting schema, and construction of CASA (CRISPR Activity Screen Analysis), a generative model of CRE activity.
a, Doxycycline-induced expression of BFP-linked CRISPRi shows robust activation. b, Example sorting strategy showing detection of a target transcript (GATA1) amplified with Alexa-647-conjugated hairpins and a housekeeping transcript (TBP) amplified with Alexa-488-conjugated hairpins. Cells in the top and bottom 10% of the 647:488 normalized ratio are sorted into separate bins. c, The generative process underlying CASA (CRISPR Activity Screen Analysis), described as a plate model, with its explicit statistical parameterization and variable definitions. Shaded and unshaded circles indicate observed and latent variables, respectively. The variable W corresponds to the set of windows tested, while each N_w arises from the set of gRNAs considered at the w-th window.
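As a rough illustration of what the two sorted bins provide, the sketch below computes a simple per-guide enrichment score: the log2 ratio of a gRNA's normalized read fraction in the low-expression bin versus the high-expression bin. This is not CASA, which is a hierarchical Bayesian model that pools gRNAs within windows; the function name, the pseudocount and the sign convention are illustrative assumptions.

```python
import math


def guide_scores(low_counts, high_counts, pseudocount=1.0):
    """Hypothetical per-guide score: log2(low-bin fraction / high-bin fraction).

    low_counts, high_counts: dicts mapping gRNA ID -> read count in each sorted bin.
    A positive score means the guide is enriched among cells with LOW target
    transcript abundance, i.e. the perturbed region may normally activate the gene.
    """
    guides = sorted(set(low_counts) | set(high_counts))
    # Library-size normalization, including the pseudocount added to every guide.
    low_total = sum(low_counts.values()) + pseudocount * len(guides)
    high_total = sum(high_counts.values()) + pseudocount * len(guides)
    scores = {}
    for g in guides:
        low_frac = (low_counts.get(g, 0) + pseudocount) / low_total
        high_frac = (high_counts.get(g, 0) + pseudocount) / high_total
        scores[g] = math.log2(low_frac / high_frac)
    return scores


print(guide_scores({"g1": 99, "g2": 49}, {"g1": 49, "g2": 99}))
```

The pseudocount keeps guides that drop out of one bin from producing infinite scores.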
Extended Data Fig. 2. HCR-FlowFISH screens display high similarity and increased sensitivity compared to growth screens at the GATA1 locus.
a, Overlap of the GATA1 guide library used in this study with the Fulco et al. library. b, High correlation (Pearson r = 0.84, two-sided t-test P = 3.4 × 10⁻¹⁰⁶) between individual guide scores for detected gRNAs shared between the GATA1 HCR-FlowFISH screen and the Fulco et al. growth screen (black line is the ordinary least squares regression best fit; gray shaded band is the 95% confidence interval). c, Guide-wise score comparison for all gRNAs shared between the growth and HCR-FlowFISH screens, showing that gRNA read depth drives the correlation more than off-target effects (cutting specificity) do. d, Individual gRNA guide scores plotted at the GATA1 promoter locus display the opposite-direction CREs for GATA1 and HDAC6. e, Comparison of individual guide scores for guides shared between the HCR-FlowFISH and Fulco et al. growth screens. The distributions of scores within CREs are more distinctly separated from those outside CREs when using HCR-FlowFISH. The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively. n = 906 (grey boxes) and n = 313 (green boxes) shared guides analyzed outside and inside CRE boundaries, respectively.
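Assuming the "two-sided t-test" reported for a Pearson correlation is the standard test of r ≠ 0, its statistic is t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The hypothetical helper below computes r, that t statistic, and the ordinary least squares line drawn in panel b; converting t to a P value needs a Student's t distribution (for example from SciPy), which is left out here to stay within the standard library.

```python
import math
from statistics import mean


def pearson_with_t(x, y):
    """Return (r, t, slope, intercept) for paired guide scores x and y.

    r: Pearson correlation; t: test statistic for r != 0 with n - 2 df;
    slope/intercept: ordinary least squares fit of y on x.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < 1:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    else:
        t = math.copysign(math.inf, r)  # perfect correlation: t is unbounded
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, t, slope, intercept


print(pearson_with_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```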
Extended Data Fig. 3. HCR-FlowFISH and CASA enhance selectivity of CRISPRi screens at the GATA1 locus.
a, Comparison of individual guide scores from HCR-FlowFISH and PrimeFlow-CRISPRi for shared guides. Guides are grouped by overlap with CASA-nominated CREs. We find that HCR-FlowFISH improves separability between guide scores inside and outside designated CREs compared to PrimeFlow, and that guide score variability is reduced in HCR-FlowFISH. The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively. n = 2,897 (grey boxes) and n = 88 (green boxes) shared guides analyzed outside and inside CRE boundaries, respectively. b, CASA CRE identification on simplified ABC data and comparison to HCR data. CASA considers only the highest and lowest expression bins from the first PCR replicate of each CRISPRi-FlowFISH screen replicate, yet, in contrast to the original analysis, it distinguishes CREs from non-specific scores induced by perturbing the GATA1 gene body.
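The box-plot convention repeated throughout these captions (box from the 25th to the 75th percentile, median line, whiskers to the most extreme points within 1.5 times the interquartile range, points beyond that shown as outliers) can be sketched as follows. The quartile interpolation method ("inclusive", which matches NumPy's default linear interpolation) is an assumption; plotting libraries differ slightly in how they compute percentiles.

```python
from statistics import quantiles


def box_stats(values, k=1.5):
    """Tukey-style box-plot summary: quartiles, whisker ends, and outliers."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    outliers = [v for v in data if v < lo_fence or v > hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```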
Extended Data Fig. 4. HCR-FlowFISH and CASA identify CREs for multiple loci.
a,b, Connectogram diagrams showing K562 DHS (light blue), K562 H3K27ac (dark blue), guide coverage (black), HCR-FlowFISH composite guide score tracks, and CASA CRE calls for MYC (teal), PVT1 (salmon), LMO2 (orange), CAPRIN1 (navy), and CAT (lilac). CASA-derived CRE activity scores are shown as lines connecting each CRE to its target gene, colored by effect on transcript abundance (black, decreased abundance; red, increased abundance). In a, ‘Pro’ and ‘e1–4’ denote the promoter and enhancers identified at this locus in Fulco et al.¹⁷. In b, ‘P’, ‘I’ and ‘D’ denote the proximal, intermediate and distal promoters of LMO2, respectively. c, Relative mRNA expression, compared to unperturbed cells, for CRISPRi perturbations of the distal, intermediate + distal, and proximal + distal promoters. Three technical replicates are shown; bars represent standard deviation.
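The caption does not say how relative mRNA expression was computed; a common choice for qPCR is the 2^−ΔΔCt method, sketched below under that assumption. Ct values for the target gene are normalized to a reference gene (ΔCt), then to the mean ΔCt of unperturbed cells (ΔΔCt), and the per-replicate fold changes are summarized as mean and standard deviation. Function and argument names are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Mean and SD of 2^-ddCt fold change across technical replicates.

    ct_target / ct_ref: Ct values of the target and reference gene in perturbed
    cells (one per technical replicate, at least two replicates for the SD).
    ct_target_ctrl / ct_ref_ctrl: matching Ct values in unperturbed cells.
    """
    ctrl_dct = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # baseline delta-Ct
    fold = [2 ** -((t - r) - ctrl_dct) for t, r in zip(ct_target, ct_ref)]
    return mean(fold), stdev(fold)


print(relative_expression([26, 26, 26], [20, 20, 20], [25, 25, 25], [20, 20, 20]))
```

One extra cycle to reach threshold relative to control corresponds to a halving of transcript abundance, which is why the example above gives a fold change of 0.5.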
Extended Data Fig. 5. HCR-FlowFISH and CASA reveal complex CRE sharing at the FADS locus.
a,b, Individual guide scores (points) and CASA CRE calls (bars) of HCR-FlowFISH screens for FADS1 (green), FADS2 (teal), and FADS3 (orange). K562 DHS (light blue) and H3K27ac (dark blue) peaks are also shown. Notably, these elements are shared between all three FADS genes. Surprisingly, perturbing the CRE in a results in a modest, but detectable, increase in FADS3 transcripts, in contrast to the decreases in FADS1 and FADS2 transcript abundance.
Extended Data Fig. 6. Functional characterization nominates rs174466 as a FADS3 CRE-activity altering SNP.
a, Genomic region surrounding the FADS3 promoter, highlighting tiling MPRA signal (red) and the HCR-FlowFISH composite score for FADS3 (orange). rs174466 is denoted, along with all variants in linkage disequilibrium with it (r² ≥ 0.2). Variants within an HCR-FlowFISH-identified FADS3 CRE are labeled in orange, and variants displaying allelic skew in the MPRA are outlined in red. SP2 ChIP-seq signal overlapping rs174466 is shown in grey. b, GWAS trait associations with rs174466 show multiple overlaps with metabolic targets of FADS3. c, MPRA activity for the reference and alternate alleles of rs174466 shows increased CRE activity for the alternate allele. d, SP2 motif, highlighting that the alternate allele better matches the canonical motif.
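A minimal sketch of allelic skew in an MPRA, assuming each allele's activity is log2 of pooled barcode RNA counts over DNA counts and skew is the alternate-minus-reference difference. Real MPRA analyses typically normalize for sequencing depth and model barcode-level variance; the function names and pseudocount here are hypothetical.

```python
import math


def mpra_activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2(RNA / DNA) activity for one allele, pooling all of its barcodes."""
    return math.log2((sum(rna_counts) + pseudocount) / (sum(dna_counts) + pseudocount))


def allelic_skew(ref, alt):
    """ref, alt: (rna_counts, dna_counts) tuples. Positive = alt allele more active."""
    return mpra_activity(*alt) - mpra_activity(*ref)


ref = ([99, 100], [99, 100])  # hypothetical barcode counts, reference allele
alt = ([399, 400], [99, 100])  # hypothetical barcode counts, alternate allele
print(allelic_skew(ref, alt))
```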
Figure 1 |. HCR-FlowFISH is a new generalizable method for transcript abundance readouts in non-coding CRISPRi screens.
a, Overview of the HCR-FlowFISH method, showing CRE identification using endogenous CRISPRi perturbation of the genome, quantification of transcript abundance with HCR, and flow-cytometry-assisted sorting to bin effector gRNAs. b, Timeline of the HCR-FlowFISH protocol, showing a shortened, 2-day protocol. c, Detection of 23 transcripts and background (EGFP, non-expressed) via HCR. d, Probe-number-normalized HCR signal correlates with gene expression levels in K562 cells, showcasing the utility of HCR across a broad range of genes (R² = 0.7731 on log10 probe-number-normalized HCR signal). e, Increasing probe number, probe concentration, or hours of amplification increases the HCR signal-to-background ratio. f, Detection of TUB1B and ACTB across six suspension and adherent mammalian cell lines demonstrates the wide applicability of HCR-FlowFISH. g, The HCR signal-to-background ratio does not diminish over 21 days.
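Panels c, e and g report an HCR signal-to-background ratio, with EGFP as a non-expressed background transcript. One plausible way to compute it, assumed here rather than taken from the paper, is the ratio of median per-cell fluorescence for the target probe set to that of the background probe set. The probe-number normalization in panel d is likewise illustrated with a hypothetical background-subtracted median divided by probe count.

```python
from statistics import median


def signal_to_background(target_fluor, background_fluor):
    """Ratio of median per-cell fluorescence: target probes vs. a probe set
    against a non-expressed transcript (e.g. EGFP). Assumed definition."""
    return median(target_fluor) / median(background_fluor)


def probe_normalized_signal(target_fluor, background_fluor, n_probes):
    """Hypothetical normalization: background-subtracted median signal per probe."""
    return (median(target_fluor) - median(background_fluor)) / n_probes


cells_target = [100, 200, 300]  # hypothetical per-cell fluorescence values
cells_egfp = [10, 20, 30]
print(signal_to_background(cells_target, cells_egfp))
print(probe_normalized_signal(cells_target, cells_egfp, 18))
```

Medians are used in this sketch because per-cell flow cytometry intensities are usually skewed, which makes means sensitive to a few very bright cells.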
Figure 2 |. HCR-FlowFISH CRE screens on transcript abundance recapitulate growth screens at the GATA1 locus and can be extended to the HDAC6 transcript.
a, An HCR-FlowFISH CRISPRi screen on a 920-kb region centered on GATA1. Black boxes show regions targeted by guides, K562 DHS is shown in light blue and H3K27ac in dark blue; composite guide scores for GATA1 (red) and HDAC6 (yellow) are shown. b, Zoom on a 40-kb region showing that the CREs identified by growth screens at this locus, eGATA1 and eHDAC6, are also identified by HCR-FlowFISH, as are the respective promoters of GATA1 and HDAC6. Composite score tracks show the average of overlapping guide scores. c, HCR-FlowFISH guide scores for gRNAs in eGATA1 (n = 85), eHDAC6 (n = 115), and eGLOD5 (n = 25), compared to randomly permuted gRNAs. eGATA1 and eHDAC6 are identified by CASA (two-sided t-test, *P ≤ 1 × 10⁻⁵; ns, not significant). The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively. d, Comparison of HCR-FlowFISH composite guide scores with growth scores, both binned in 10-bp regions, showing high correlation for regions in CREs identified by CASA (Spearman ρ = 0.79, two-sided t-test P = 2 × 10⁻⁵⁴; dark red line is the ordinary least squares regression best fit, red shaded band is the 95% confidence interval). e, Comparison of HCR-FlowFISH guide scores measured with probes for the GATA1 (red) or HDAC6 (yellow) transcript, for gRNAs in promoter regions 1,000 bp upstream of the TSS. Guides at the GATA1 (n = 46) and HDAC6 (n = 88) promoters were high scoring when HCR was performed with probes for that promoter's transcript and yielded significant CASA CREs, whereas guides at the promoters of the nearby genes TIMM17B (n = 79) and PIM2 (n = 89) did not (two-sided t-test, *P ≤ 1 × 10⁻⁵; ns, not significant). The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively.
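Composite score tracks are described as averages of overlapping guide scores, and panel d compares them in 10-bp bins. The sketch below assumes each CRISPRi guide influences a symmetric window around its target site (the window size is a hypothetical parameter, not a value from the paper), averages all guide scores covering each base, and then averages the per-base track into fixed-width bins.

```python
from collections import defaultdict


def composite_track(guides, half_window=100):
    """Per-base composite score: mean of all guide scores whose assumed
    effect window (target site +/- half_window bp) covers that base.

    guides: list of (position, score) tuples.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for pos, score in guides:
        for bp in range(pos - half_window, pos + half_window + 1):
            total[bp] += score
            count[bp] += 1
    return {bp: total[bp] / count[bp] for bp in total}


def bin_track(track, bin_size=10):
    """Average a per-base track into fixed-width bins keyed by bin start."""
    sums, ns = defaultdict(float), defaultdict(int)
    for bp, value in track.items():
        start = (bp // bin_size) * bin_size
        sums[start] += value
        ns[start] += 1
    return {start: sums[start] / ns[start] for start in sorted(sums)}


track = composite_track([(100, 1.0), (104, 3.0)], half_window=2)
print(bin_track(track))
```

Binned tracks from two screens can then be compared with a rank correlation such as Spearman's ρ, as in panel d.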
Figure 3 |. Application of HCR-FlowFISH unveils gene-specific CRE interactions at diverse loci.
a-d, Connectogram diagrams showing K562 DHS (light blue), K562 H3K27ac (dark blue), guide coverage (black), HCR-FlowFISH composite guide score tracks, and CASA CREs for ERP29 (yellow), CD164 (purple), NMU (green), and MEF2C (pink). CASA-derived CRE activity scores are shown as lines connecting the CRE to the target gene, and colored by effect on transcript abundance (black decreases abundance, red increases abundance). In each case, CASA identified CREs supported by K562 H3K27ac and DHS. Stars on MEF2C and CD164 guide score tracks indicate the locations of two variants of interest.
Figure 4 |. HCR-FlowFISH uncovers a complex regulatory landscape of all genes at the FADS locus.
a, ~100-kb genomic interval surrounding the FADS locus displaying ENCODE DHS (light blue) and H3K27ac (dark blue) from K562 cells, along with the guide coverage tiled by HCR-FlowFISH (black). HCR-FlowFISH composite guide scores for one replicate of FADS1, FADS2, FADS3, and FEN1 are shown. Tiling MPRA data for the same locus are included below in red. Individual gRNA binding locations for panel d are noted. b, Autocorrelation plot of adjacent guides in the FADS1 HCR-FlowFISH screen shows significant correlation of nearby guides. c, Guide-wise logit scores for guides in the gene promoters are significantly higher than those of an equal number of permuted guides (*two-sided Mann-Whitney-Wilcoxon test, P ≤ 1 × 10⁻¹⁰). d, qPCR analysis of FADS1 (n = 3 primers, each with three technical replicates) and FADS3 (n = 4 primers, each with three technical replicates) expression changes (bars represent standard deviation) after targeting with single guides and with a non-targeting guide (NT) corroborates the transcript abundance patterns in the full screen.
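Panel b measures how similar the scores of neighboring guides are. A lag-k autocorrelation can be sketched as below, under the assumption that it is the Pearson correlation between each guide's score and the score of the guide k positions further along the genome; the significance test used in the figure is not reproduced here.

```python
from statistics import mean


def adjacent_autocorrelation(guides, lag=1):
    """Correlation between scores of guides `lag` steps apart along the genome.

    guides: list of (position, score) tuples, in any order. Guides are sorted by
    position, then the paired series (s[i], s[i + lag]) are correlated.
    Raises ZeroDivisionError if either paired series has zero variance.
    """
    scores = [s for _, s in sorted(guides)]
    x, y = scores[:-lag], scores[lag:]
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores: a block of low-scoring guides followed by a CRE-like block.
guides = [(3, 0.0), (0, 0.0), (5, 1.0), (1, 0.0), (7, 1.0), (2, 0.0), (6, 1.0), (4, 1.0)]
print(adjacent_autocorrelation(guides))
```

A high lag-1 value is what one expects when a CRE spans several tiled guides, since adjacent guides then perturb the same element.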
Figure 5 |. High-resolution mapping using a CRISPR cutting HCR-FlowFISH screen identifies CREs at transcription factor resolution.
a, Clustering of high-quality single gRNA-targeted deletion sequences (purple bars) within a 3-kb CRE initially identified by the FADS3 CRISPRi screen. Deletion-bearing cells were subjected to HCR-FlowFISH and sorted based on FADS3 transcript abundance. b, Individual guide scores (orange) from the FADS3 HCR-FlowFISH screen, overlaid with the log odds ratio of the cumulative deletion frequencies per nucleotide in the low versus high FADS3 transcript abundance bins (purple). DHS (light blue) and H3K27ac (dark blue) K562 peak calls are shown, along with the CASA CRISPR cutting CRE (purple), which identifies a core 500-bp CRE. c, Zoomed view of the CRISPR cutting CRE identifies underlying ChIP peaks for multiple transcription factors (black bars) and canonical TF motifs (inlaid green bars). d, Luciferase reporter assays of CREs with scrambled REST (R, red), CTCF (C, blue), and NRF1 (N, pink) motifs, and all combinations. REST scrambles increased reporter expression (n = 6 each, *two-sided t-test P < 0.0001), except when paired with CTCF scrambles, which removed all CRE activity in all contexts. The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively.
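Panel b overlays a per-nucleotide log odds ratio of deletion frequency between the low and high FADS3 abundance bins. A minimal sketch, assuming deletions are given as half-open genomic intervals (one set per read) and a Haldane-style pseudocount of 0.5 (both assumptions): a positive value means a base is deleted more often in low-abundance cells, consistent with that base contributing to activating CRE function.

```python
import math


def deletion_log_odds(low_dels, high_dels, n_low, n_high, region, pseudocount=0.5):
    """Per-nucleotide natural-log odds ratio of deletion frequency, low vs high bin.

    low_dels / high_dels: lists of (start, end) half-open deletions from reads
    in each bin; n_low / n_high: total reads per bin;
    region: (start, end) half-open interval to score.
    """
    start, end = region

    def per_base_counts(dels):
        counts = [0] * (end - start)
        for d_start, d_end in dels:
            for bp in range(max(d_start, start), min(d_end, end)):
                counts[bp - start] += 1
        return counts

    low_c, high_c = per_base_counts(low_dels), per_base_counts(high_dels)
    scores = []
    for a, b in zip(low_c, high_c):
        odds_low = (a + pseudocount) / (n_low - a + pseudocount)
        odds_high = (b + pseudocount) / (n_high - b + pseudocount)
        scores.append(math.log(odds_low / odds_high))
    return scores


# Hypothetical: 9 of 10 low-bin reads delete bases 1-2; no high-bin deletions.
print(deletion_log_odds([(1, 3)] * 9, [], 10, 10, (0, 4)))
```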
Figure 6 |. Nominating causal genetic variants and identifying their effector transcripts at the FADS locus.
a, ~110-kb region surrounding the FADS locus showing all variants from 1000 Genomes, with notations for eQTLs and the number of FADS genes each is associated with. A connectogram of all CREs identified via HCR-FlowFISH highlights the complex regulatory landscape with extensive CRE sharing. b, Variants in the fine-mapped credible sets for eGFR and HDL levels have similar posterior inclusion probabilities (PIP) due to high LD. MPRA (red), DHS (light blue) and H3K27ac (dark blue) signals are shown, as well as composite guide scores for HCR-FlowFISH on FADS1 (green), FADS2 (teal), FADS3 (orange), and FEN1 (purple) at a CRE within an intron of FADS2. Variants within an HCR-FlowFISH-identified FADS1 CRE are labeled in green, and variants displaying allelic skew via MPRA are outlined in red; the location of the only variant identified by both is denoted by a green bar. c, Total cholesterol GWAS at the FADS locus yields 73 genome-wide significant variants. d, MPRA allelic skew results show that the alternative allele drives increased CRE activity. e, Many traits significantly associated with rs2727271 are direct metabolites of the FADS1 enzyme.
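Panel b refers to fine-mapped credible sets and posterior inclusion probabilities. The caption does not state the fine-mapping method or coverage level; the sketch below shows the usual construction of a credible set from PIPs for a single association signal, assuming 95% coverage: rank variants by PIP and add them until the cumulative PIP reaches the threshold. Variant IDs are placeholders.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked variants whose summed PIP reaches `coverage`.

    pips: dict variant ID -> posterior inclusion probability for one signal.
    Returns (variants, cumulative PIP).
    """
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        total += pip
        if total >= coverage:
            break
    return chosen, total


# Placeholder variants and PIPs, not values from the study.
print(credible_set({"v1": 0.6, "v2": 0.3, "v3": 0.08, "v4": 0.02}))
```

When high LD spreads PIP nearly evenly across many variants, as described for this locus, the resulting set is large, which is why orthogonal functional evidence (CRE overlap, MPRA allelic skew) is needed to nominate a causal variant.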

