Nat Biotechnol. 2016 Feb;34(2):167-74.
doi: 10.1038/nbt.3468. Epub 2016 Jan 25.

High-throughput mapping of regulatory DNA

Nisha Rajagopal et al. Nat Biotechnol. 2016 Feb.

Abstract

Quantifying the effects of cis-regulatory DNA on gene expression is a major challenge. Here, we present the multiplexed editing regulatory assay (MERA), a high-throughput CRISPR-Cas9-based approach that analyzes the functional impact of the regulatory genome in its native context. MERA tiles thousands of mutations across ∼40 kb of cis-regulatory genomic space and uses knock-in green fluorescent protein (GFP) reporters to read out gene activity. Using this approach, we obtain quantitative information on the contribution of cis-regulatory regions to gene expression. We identify proximal and distal regulatory elements necessary for expression of four embryonic stem cell-specific genes. We show a consistent contribution of neighboring gene promoters to gene expression and identify unmarked regulatory elements (UREs) that control gene expression but do not have typical enhancer epigenetic or chromatin features. We compare thousands of functional and nonfunctional genotypes at a genomic location and identify the base pair-resolution functional motifs of regulatory elements.

Figures

Figure 1. Multiplexed editing regulatory assay (MERA)
(a) In MERA, a genomically integrated dummy gRNA is replaced with a pooled library of gRNAs through CRISPR/Cas9-based homologous recombination such that each cell receives a single gRNA. Guide RNAs are tiled across the cis-regulatory regions of a GFP-tagged gene locus, and cells are sorted by flow cytometry according to their GFP expression levels. Deep sequencing of each population is used to identify gRNAs preferentially associated with partial or complete loss of gene expression. (b) Zfp42GFP mESCs express uniformly strong GFP. After bulk gRNA integration, a subpopulation of cells partially or completely loses GFP expression. These cells are isolated by flow cytometry for deep sequencing. (c,d) Bulk reads for gRNAs are highly correlated between replicates of (c) Tdgf1 or (d) Zfp42, indicating consistent and replicable integration rates.
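The read-out in panel (a) amounts to asking, for each gRNA, whether its share of reads in a sorted population exceeds its share in the bulk population. The Python sketch below illustrates that comparison only; the function name, the pseudocount, and the toy read counts are assumptions made for illustration and are not taken from the authors' pipeline, which may use a different statistic.

```python
import math

def enrichment_ratios(sorted_counts, bulk_counts, pseudocount=1.0):
    """Return the log2 ratio of each gRNA's read fraction in a sorted
    population (e.g. GFPneg) to its read fraction in the bulk population.

    Both arguments map gRNA identifiers to raw read counts. The
    pseudocount is an illustrative assumption that avoids division by zero.
    """
    grnas = sorted(set(sorted_counts) | set(bulk_counts))
    sorted_total = sum(sorted_counts.get(g, 0) + pseudocount for g in grnas)
    bulk_total = sum(bulk_counts.get(g, 0) + pseudocount for g in grnas)
    ratios = {}
    for g in grnas:
        sorted_fraction = (sorted_counts.get(g, 0) + pseudocount) / sorted_total
        bulk_fraction = (bulk_counts.get(g, 0) + pseudocount) / bulk_total
        ratios[g] = math.log2(sorted_fraction / bulk_fraction)
    return ratios

# Hypothetical counts: gRNA_1 is over-represented among GFPneg cells.
gfp_neg = {"gRNA_1": 30, "gRNA_2": 10}
bulk = {"gRNA_1": 10, "gRNA_2": 10}
for grna, ratio in enrichment_ratios(gfp_neg, bulk).items():
    print(grna, round(ratio, 3))
```

A positive value marks a gRNA that is more frequent in the sorted population than in the bulk; a significance test would still be needed before calling it enriched.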
Figure 2. MERA enables systematic identification of required cis-regulatory elements for Tdgf1
(a) A genomic view of the Tdgf1 proximal regulatory region showing in track order (i) the location of gRNAs that did not result in GFP loss, (ii) enriched gRNAs in GFPneg cells (red), (iii) enriched gRNAs in GFPmedium cells (green), (iv) annotated genes, (v) predicted enhancers (green=weak, red=strong), (vi) DNase-I hotspot regions, (vii) transcription factor binding density based on ChIP-seq data, (viii) H3K4me3 ChIP-seq data. Several active regulatory elements coincide with dense clusters of overlapping gRNAs. A large number of gRNAs significantly enriched in the GFPneg population are also observed in regions devoid of regulatory element features (UREs). Genomic regions of interest are shaded, annotated above the plot, and described in further detail in the text. (b) Individual validation, using the self-cloning CRISPR system, of specific gRNAs detected as enriched in the GFPneg population in the MERA assay. The proportion of cells undergoing GFP loss upon incorporation of a particular gRNA, divided by the proportion of cells undergoing GFP loss upon incorporation of a GFP-targeting positive control gRNA, is plotted against the genomic location of the gRNA. Negative controls and gRNAs showing no reads in either the GFPneg or the GFPmedium population are highlighted in red. (c) Correlation of gRNAs significantly enriched in the GFPneg population in fixed-size bins varying from 100 bp to 1 kb for biological replicates in Tdgf1. (d) Fraction of GFPneg-enriched gRNAs among the different functional genomic categories surrounding the Tdgf1 gene.
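The replicate comparison in panel (c) can be pictured as counting enriched gRNAs per fixed-size genomic bin in each replicate and then correlating the two count vectors. The sketch below assumes Pearson correlation and uses hypothetical coordinates; the caption does not state which correlation measure or coordinate system the authors used.

```python
import math

def bin_counts(positions, bin_size, region_start, region_end):
    """Count gRNA positions falling in consecutive bins of `bin_size` bp
    covering the half-open interval [region_start, region_end)."""
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // bin_size] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical positions (bp) of enriched gRNAs in two replicates.
replicate_1 = [120, 130, 480, 510, 2050]
replicate_2 = [125, 470, 505, 2040, 2060]
for bin_size in (100, 500, 1000):
    a = bin_counts(replicate_1, bin_size, 0, 3000)
    b = bin_counts(replicate_2, bin_size, 0, 3000)
    print(bin_size, round(pearson(a, b), 3))
```

Larger bins tolerate small positional differences between replicates, which is one reason to report the correlation across a range of bin sizes.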
Figure 3. MERA enables systematic identification of required cis-regulatory elements for Zfp42
(a) A genomic view of the Zfp42 proximal regulatory region showing in track order (i) the location of gRNAs that did not result in GFP loss, (ii) enriched gRNAs in GFPneg cells (red), (iii) enriched gRNAs in GFPmedium cells (green), (iv) annotated genes, (v) predicted enhancers (green=weak, red=strong), (vi) DNase-I hotspot regions, (vii) transcription factor binding density based on ChIP-seq data, (viii) H3K4me3 ChIP-seq data. Several active regulatory elements coincide with dense clusters of overlapping gRNAs. Genomic regions of interest are shaded, annotated above the plot, and described in further detail in the text. (b) Correlation of gRNAs significantly enriched in the GFPneg population in fixed-size bins varying from 100 bp to 1 kb for biological replicates in Zfp42. (c) Fraction of GFPneg-enriched gRNAs among the different functional genomic categories surrounding the Zfp42 gene.
Figure 4. Functional motif discovery analysis of region-specific mutant genotypes at enhancers reveals required regulatory motifs
(a) A schematic of the procedure for finding mutations induced by a particular gRNA. (b) Plot of the genomic regions surrounding two gRNAs at a proximal Tdgf1 enhancer region (gRNAs are shaded), showing overlap with DNase-I hotspot and predicted enhancer regions and with binding sites for the transcription factors Stat3, Tcfcp2l1 and Sox2. (c) ROC curve for fivefold classification of GFPneg and GFPpos genotypes using mutations within −20 to +20 bp of the gRNA along the left and right paired-end reads as features. (d) Motif logo for the region mutated by Tdgf_gRNA_1 and Tdgf_gRNA_2 along the left paired-end read. Base scores are computed as the log-ratio of two Hellinger distances: that between the GFPneg genotypes at a base and the reference base, and that between the GFPpos genotypes at a base and the reference base.
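The base score in panel (d) can be made concrete with the standard definition of the Hellinger distance between two discrete distributions, H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) / sqrt(2). The sketch below treats the reference as a distribution with all mass on the reference base. The five-symbol alphabet (with "-" for a deleted base), the pseudocount, and the function names are assumptions for illustration; the caption does not specify how the authors handled deletions or zero counts.

```python
import math

ALPHABET = "ACGT-"  # assumption: "-" stands for a deleted base

def hellinger(p, q):
    """Hellinger distance between two distributions over the same keys."""
    total = sum((math.sqrt(p[k]) - math.sqrt(q[k])) ** 2 for k in p)
    return math.sqrt(total) / math.sqrt(2)

def base_distribution(observed, pseudocount=0.5):
    """Empirical distribution of the symbols seen at one position.
    The pseudocount (an assumption) keeps every distance above zero."""
    counts = {s: pseudocount for s in ALPHABET}
    for symbol in observed:
        counts[symbol] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def base_score(neg_bases, pos_bases, ref_base, pseudocount=0.5):
    """Log-ratio of the GFPneg-to-reference Hellinger distance over the
    GFPpos-to-reference Hellinger distance at a single position."""
    reference = {s: 1.0 if s == ref_base else 0.0 for s in ALPHABET}
    d_neg = hellinger(base_distribution(neg_bases, pseudocount), reference)
    d_pos = hellinger(base_distribution(pos_bases, pseudocount), reference)
    return math.log(d_neg / d_pos)

# Hypothetical position with reference base "A": GFPneg genotypes are
# mostly mutated here, while GFPpos genotypes mostly keep the reference.
print(round(base_score(list("TT-TA"), list("AAAAT"), "A"), 3))
```

Under this reading, a positive score marks a base where loss-of-expression genotypes deviate from the reference more than expression-retaining genotypes do, which is what a functionally required base would look like.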
Figure 5. Functional motif discovery analysis of a URE reveals critical base positions involved in gene regulation
(a) Plot of the genomic regions surrounding two gRNAs (gRNAs are shaded), showing the absence of active histone modifications, known transcription factor binding, predicted enhancers and DNase-I hotspots. (b) Receiver-operating characteristic (ROC) curve for fivefold classification of GFPneg and GFPpos genotypes using mutations on the right paired-end read within –20 to +20 bp of Tdgf_URE_gRNA2. For calculating sensitivity and specificity, unweighted classification (blue) counts each unique genotype in the test set only once, while weighted classification (red) counts each unique genotype in the test set as many times as the number of reads assigned to it. (c) Fraction of unique genotypes in the GFPneg and GFPpos populations with mutations at bases along the right paired-end read, revealing the pattern of cleavage around Tdgf_URE_gRNA2. (d) Motif logo for the region mutated by Tdgf_URE_gRNA2 along the right paired-end read. Base scores are computed as the log-ratio of two Hellinger distances: that between the GFPneg genotypes at a base and the reference base, and that between the GFPpos genotypes at a base and the reference base.
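The difference between the unweighted and weighted curves in panel (b) comes down to how much each unique genotype contributes to the confusion matrix at a given threshold. The sketch below computes sensitivity and specificity at a single threshold both ways; the function name, the label encoding (1 for GFPneg, 0 for GFPpos), and the toy data are illustrative assumptions.

```python
def sensitivity_specificity(labels, predictions, weights=None):
    """Sensitivity and specificity for binary labels and predictions.

    With weights=None every unique genotype counts once (unweighted).
    Passing per-genotype read counts as weights counts each genotype as
    many times as it has reads (weighted).
    """
    if weights is None:
        weights = [1] * len(labels)
    tp = fn = tn = fp = 0
    for label, predicted, weight in zip(labels, predictions, weights):
        if label and predicted:
            tp += weight
        elif label and not predicted:
            fn += weight
        elif not label and predicted:
            fp += weight
        else:
            tn += weight
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical test set of four unique genotypes (1 = GFPneg, 0 = GFPpos).
labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 1]
read_counts = [9, 1, 5, 5]
print(sensitivity_specificity(labels, predictions))
print(sensitivity_specificity(labels, predictions, read_counts))
```

In this toy case the misclassified GFPneg genotype is supported by a single read, so weighting by reads raises sensitivity from 0.5 to 0.9 while specificity stays at 0.5. Sweeping the classifier threshold and repeating the calculation traces out the full ROC curve.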
Figure 6. Local genotypes at an enhancer and a URE dictate Tdgf1 expression phenotype
(a) Tdgf1 MERA screen ratio of GFPmedium/neg/bulk reads for each gRNA at an upstream enhancer (left) and a downstream URE (right) region. (b) Flow cytometric measurement of Tdgf1-GFP expression in clonal cell lines following CRISPR-induced deletion of the shaded regions from (a) shows loss of GFP (1st and 3rd plots from left). CRISPR-mediated homology-directed repair (HDR) back to the wild-type genotype induced robust GFP recovery at both loci (2nd and 4th plots from left). (c) Tdgf1 RNA expression in wild-type mESCs (left), clonal mESC lines with deletions of the enhancer and URE shaded in (a) (2nd and 3rd from left), and bulk mESC lines following HDR back to the wild-type genotype (4th and 5th from left), all normalized to the wild-type expression level.
