Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Jun;53(6):869-880.
doi: 10.1038/s41588-021-00861-8. Epub 2021 May 6.

Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression

Affiliations

Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression

Li Cheng et al. Nat Genet. 2021 Jun.

Abstract

Pinpointing functional noncoding DNA sequences and defining their contributions to health-related traits is a major challenge for modern genetics. We developed a high-throughput framework to map noncoding DNA functions with single-nucleotide resolution in four loci that control erythroid fetal hemoglobin (HbF) expression, a genetically determined trait that modifies sickle cell disease (SCD) phenotypes. Specifically, we used the adenine base editor ABEmax to introduce 10,156 separate A•T to G•C conversions in 307 predicted regulatory elements and quantified the effects on erythroid HbF expression. We identified numerous regulatory elements, defined their epigenomic structures and linked them to low-frequency variants associated with HbF expression in an SCD cohort. Targeting a newly discovered γ-globin gene repressor element in SCD donor CD34+ hematopoietic progenitors raised HbF levels in the erythroid progeny, inhibiting hypoxia-induced sickling. Our findings reveal previously unappreciated genetic complexities of HbF regulation and provide potentially therapeutic insights into SCD.

PubMed Disclaimer

Figures

Extended Data Fig. 1
Extended Data Fig. 1. Establishment of an ABEmax-based system to perturb regulatory sequences.
(a) ABEmax-Cas9 protein levels measured by Western blot analysis in wild-type HUDEP-2 cells (WT) and HUDEP-2 cells infected with different dosages of ABEmax lentivirus. β-Actin was used as a loading control. The result is representative of three independent experiments (Image was cropped from source data Fig. 2). (b) HUDEP-2 cells with different levels of ABEmax expression were transduced with the same amount of gRNA targeting the HBG promoter. The graphs show the hemoglobin (Hb) protein content, as measured by isoelectric focusing high-performance liquid chromatography (IE-HPLC) in HUDEP-2 cells after 5 additional days of induced erythroid maturation. The result is representative of three independent experiments. (c) Jitter plots showing the percentage of adenosine-to-inosine RNA modification by ABEmax in wild-type HUDEP-2 cells (WT), HUDEP-2 cells stably expressing an ABE (ABEmax), and HEK293T cells. The y-axis represents the efficiency of A-to-I RNA editing. n = total number of modified adenines observed. (d) Targeted deep-sequencing analysis of the BCL11A CRE after editing with ABEmax and BCL11A_ENH gRNA. The mutations are indicated in bold. The red arrowhead indicates the targeted nucleotide. (e) Western blot analysis with the indicated antibodies in undifferentiated (Day 0) and differentiated (Day 5) HUDEP-2 cells transduced with non-targeting control gRNA (Ctrl) or with BCL11A-ENH gRNAs. The result is representative of three independent experiments. (Image was cropped from source data Fig. 3)
Extended Data Fig. 2
Extended Data Fig. 2. High-throughput mapping of CREs regulating HbF in HUDEP-2 cells and single-gRNA validation in CD34+ HSPCs.
(a,b) Dot plots showing the correlation between two biological replicates of ABE screens for the HbFhigh (a) and HbFlow (b) cell populations. Each dot represents one gRNA; the x- and y-axes represent the normalized read counts. (c,d) Validation studies of top-hit gRNAs in normal donor CD34+ HSPC–derived erythroblasts. CD34+ cells were transfected with RNP complexes consisting of ABEmax + non-targeting control (Ctrl) gRNA or individual top-hit gRNAs and analyzed after 12 days of erythroid differentiation. (c) HbF protein levels measured by Western blot analysis. The result is representative of three independent experiments. (Image was cropped from source data Fig. 4) (d) Flow-cytometry plots showing the expression of the RBC maturation markers Band3 and CD49d after 12 days of differentiation (left) and a bar chart summarizing the results from three replicates (right). Error bars represent the mean +/− S.E.M from three independent experiments. (e) Boxplot comparing the HbF effects of gRNAs without editable adenines (n=112) and none targeting control gRNAs (n=20). Y-axis is log2 ratio of gRNA reads counts between HbFhigh and HbFlow cells. P-value was determined by unpaired two-tailed Wilcoxon test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values. (f) Scatterplot showing the F-cell fractions measured by immune-flow cytometry in HUDEP-2- ABEmax and HUDEP-2-dCas9 cells transfected with 10 gRNAs. Each dot represents one gRNA. (g) Comparison of target site mutation frequencies in HbFhigh and HbFlow cells. Cells were treated with ABEmax and 5 different gRNAs and then sorted based on HbF levels after 5 days differentiation. The frequencies are calculated based on one argeted deep-sequencing result.
Extended Data Fig. 3
Extended Data Fig. 3. ABE mutagenesis at different genomic loci.
(a) Bar plot showing the effects of NFIX CREs on the expression levels of NFIX and KLF1. Y-axis shows relative mRNA expression measured by real-time RT-qPCR in HUDEP-2 cells edited with ABEmax and the two indicated gRNAs. The expression levels were normalized by those from HUDEP-2 cells treated with ABEmax and non-targeting control gRNA (Ctrl) (n=3 independent experiments). (b) β-Like globin gene cluster–associated HbFhigh gRNAs: Chromatin interaction loops, indicated by red arcs, were determined by H3K27ac Hi-ChIP in HUDEP-2 cells. gRNA -log(FDR) represents the difference in gRNA abundance between the HbFhigh and HbFlow populations. ATAC-seq analysis reflects chromatin openness.
Extended Data Fig. 4
Extended Data Fig. 4. Empirical distribution of ABE editing efficiency and DNA sequence motifs measured by ATAC-seq.
(a) Empirical distribution of ABEmax editing activities in HUDEP-2 cells. Bar plot shows the average editing activity at different positions among 23 different ABEmax edited loci. X-axis denotes positions relative to protospacer start (position 1). Y-axis shows the A to G conversion rate. (b) A heatmap of on-target base-editing efficiencies of ABEmax as measured by targeted amplicon sequencing of 23 different edited genomic loci (row). Each cell represents one nucleotide. The cell number indicates the relative position of the nucleotide relative to the PAM sequence. The editing efficiency was measured by determining the percentage of nucleotide converted by ABEmax. (c) The footprint profiles of GATA1, ZBTB7A, and CTCF binding sites derived from deep sequencing (ATAC-seq). The heatmap represents the ATAC-seq signals within a ±100-bp window for the top 1000 binding sites for each TF. Each row represents one binding site. Aggregated signals are plotted in the top panels.
Extended Data Fig. 5
Extended Data Fig. 5. 3′ HBG1 enhancer–edited clones in HUDEP-2 cells.
(a) Genome browser screenshot of ZBTB7A occupancy profiles in HBG1 locus. gRNA track showing the location of the gRNA. Wild type HUDEP-2 ChIP-seq was downloaded from GSE103445. Two mutated clones (designated H2_mut_C1 and H2_mut_C2) were generated using ABEmax and gRNAs targeting 3’ HBG1 CRE. The position of the CRE was highlighted in blue. (b) Amplicon sequencing confirming the mutations in HUDEP-2 cells derived from single clones after treatment with the Chr11–3 gRNA. Edited adenines are marked in red box. (c–e) Validation studies of two HUDEP-2 single clones with mutations in the 3′ HBG1 enhancer. (c) The percentage of γ-globin mRNA as determined by real-time RT-qPCR. The error bars represent the ± S.E.M from three independent experiments. **** P = 4X10−7; unpaired t-test, two side. (d) The hemoglobin F fraction measured by IE-HPLC. The values represent the mean ± S.E.M from three independent experiments. ****P = 6×10−7; unpaired t-test, two side. (e) F-cell fractions measured by immuno-flow cytometry (left). The bar chart (right) shows the values from two independent experiments.
Extended Data Fig. 6
Extended Data Fig. 6. Epigenetic signals of CREs regulating HbF levels.
Box plots showing the epigenetic signal distribution among adenines with high (>30) (n=313) and low (<10) BPRSHbF (n=9268). (P-value were determined using with two-tailed Wilcoxon test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values.
Extended Data Fig. 7
Extended Data Fig. 7. Functional noncoding sequences and SNVs associated with HbF levels in patients with SCD.
(a) The ratio of the mutation burden in patients with SCD with high HbF to that in patients with SCD with normal HbF at genomic loci with high BPRSHbF (the top 200). The x-axis represents the threshold of minor allele frequency (MAF) that was used to filter variants. The y-axis represents the different window sizes centered on genomic loci with high BPRSHbF. The number in each cell represents the ratio of the normalized mutation burden (see Methods) in patients with SCD with high RBC HbF levels to that in patients with SCD with normal HbF levels. (b) The precision-recall curve representing the performance of a random forest model that predicts HbF levels by using the mutation burden within two groups of genomic loci. The green curve represents the model including only 18 common GWAS variants, and the red curve represents the model including the common GWAS variants plus 56 variants with high BPRSHbF. Dashed lines represent the precision at 75% recall rate. (c) A box plot showing a pair-wise performance comparison of the two models. n= 400 random samplings. P-value is determined using paired two-tailed t-test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values.
Extended Data Fig. 8
Extended Data Fig. 8. Targeting erythroid-specific regulatory elements to increase HbF levels in erythroid progeny derived from HSPCs from donors with SCD.
(a) A heatmap showing the distribution of chromatin accessibility, as measured by ATAC-seq, near edited adenines for 15 different blood cell types. Representative adenines with high (top) and low (bottom) erythroid-specific scores (Z-scores) were selected for plotting. The cell types for each track are shown at the bottom. (b–e) CD34+ HSPCs from two donors with SCD were transfected with RNP consisting of ABE and Chr11–1 gRNA targeting the 3′ HBG1 enhancer or a non-targeting control gRNA (Ctrl), then grown in culture under conditions that support erythroid differentiation. Hemoglobinized erythroblasts were analyzed at day 12. (b) The percentage of γ-globin mRNA as determined by real-time RT-qPCR (n=2 different SCD participants). (c) Representative flow-cytometry plots showing the expression of the RBC maturation markers Band3 and CD49d (n=2 different SCD participants). (d) May–Grünwald–Giemsa–stained erythroblasts. Scale bar, 20 μM. This is representative results of 2 SCD participants. (e) Images of sickled erythroid cells. Arrowheads mark cells with sickle-like morphology. This is representative results of 2 SCD participants. Original picture was visualized by phase-contrast microscopy using the IncuCyte S3 Live-Cell Analysis System (Sartorius) with a 20X objective; Size bars, 20 μM.
Extended Data Fig. 9
Extended Data Fig. 9. Gating strategies used for cell sorting during RBC maturation.
(a) Gating strategy to determine the percentage of RBC maturation markers Band3 and CD49d after 5 additional days of induced differentiation in WT and HUDEP-2-ABEmax cells presented on Fig. 1d. (b) Gating strategy to determine the percentage of the RBC maturation markers Band3 and CD49d after 12 days of differentiation of normal CD34+ HSPCs (transfected by 5 gRNAs, respectively.) presented on Extended data Fig. 2d. (c) Gating strategy to determine the percentage of the RBC maturation markers Band3 and CD49d after 12 days of differentiation of SCD derived CD34+ HSPCs (transfected by 2 gRNAs, respectively.) presented on Extended data Fig. 8c.
Extended Data Fig. 10
Extended Data Fig. 10. Gating strategies used for F cells sorting.
(a) Gating strategy to determine the percentage of F cells in Ctrl or BCL11A-ENH gRNA transfected HUDEP-2 cells presented on Fig. 1i. (b) Gating strategy to determine the percentage of F cells after 12 days of differentiation of SCD derived CD34+ HSPCs (transfected by 2 gRNAs, respectively.) presented on Fig. 6e. (c) Gating strategy to determine the percentage of F cells presented on Extended data Fig. 5e.
Figure 1.
Figure 1.. Establishment of an ABE-based system to perturb CREs.
(a) A schematic representation of the overall strategy: 1) Predicting CREs regulating HbF levels. 2) Combining a high-throughput ABEmax screen with computational deconvolution to identify CREs with base-pair resolution. 3) Applying high-resolution CRE mapping to understand functional variants and to identify potential novel therapeutic targets. (b) ABEmax protein level in wild-type HUDEP-2 (WT) cells and HUDEP-2–ABEmax (ABEmax) cells measured by Western blot analysis. The result is representative of two independent experiments (image was cropped from source data Fig. 1). (c) Heatmap showing the Pearson correlation of WT and ABEmax cell transcriptomes as measured by RNA-seq. The number in each cell is the correlation coefficient. (d–i) HUDEP-2–ABEmax cells were transduced with a non-targeting control gRNA (Ctrl) or with a gRNA targeting the BCL11A enhancer (BCL11A–ENH). (d) Flow-cytometry plots showing the expression of the RBC maturation markers Band3 and CD49d after 5 days of induced differentiation. The result is representative of three independent experiments. (e) On-target DNA base-editing efficiencies measured by next-generation sequencing. P = 1.8 × 10−4. (f) BCL11A mRNA measured by real-time RT-qPCR. P = 2.7 × 10−5. (g) The percentage of γ-globin mRNA (γ/γ+β) or β-globin mRNA (β/γ+β) as measured by real-time RT-qPCR. (h) The hemoglobin (Hb) protein content measured by IE-HPLC after 5 additional days of induced erythroid maturation. The result is representative of three independent experiments. (i) Flow-cytometry plots showing HbF immunostaining (F-cells). The result is representative of three independent experiments. *P values were determined using two-tailed unpaired t-test. Bar plots show mean ± S.E.M. of three independent experiments.
Figure 2.
Figure 2.. High-throughput ABEmax perturbation identifies CREs that control HbF expression.
(a) Schematic representation of the workflow showing bioinformatics-informed gRNA library construction, mutagenesis, and selection for HbFhigh and HbFlow phenotypes. (b) Differential representation of individual targeting gRNAs in HbFhigh and HbFlow populations as a volcano plot. Each dot represents a single gRNA. The x-axis shows the log fold-difference and the y-axis shows the −log10(FDR). Light green dots, CRE-targeting gRNAs; Blue crosses, non-editing control gRNAs; orange triangles, non-targeting control gRNAs. (c–e) Validation studies of top-hit gRNAs in HUDEP-2 cells. HUDEP-2–ABEmax cells were treated with a non-targeting control (Ctrl) gRNA or with randomly chosen individual candidate gRNAs that were identified as top screening hits by their over-representation in HbFhigh cells. The chromosome locations of each gRNA are shown on the x-axis. (c) F-cell fractions measured by immuno-flow cytometry. (d) The percentage of γ-globin mRNA as determined by real-time RT-qPCR. (e) The hemoglobin F fraction measured by IE-HPLC. (f–h) Validation studies of top-hit gRNAs in normal donor CD34+ HSPC–derived erythroblasts. CD34+ cells were transfected with RNP complexes consisting of ABEmax + non-targeting control (Ctrl) gRNA or individual top-hit gRNAs, grown in culture under conditions that support erythroid differentiation, and analyzed after 14 days. (f) F-cell fractions measured by immuno-flow cytometry. (g) The percentage of γ-globin mRNA as determined by real-time RT-qPCR. (h) The hemoglobin F fraction measured by IE-HPLC. ****P < 0.0001, ***P < 0.001, **P < 0.01, *P < 0.05, were determined using two-tailed unpaired t-test (Exact P values for each test are listed in source data). Bar plots show mean ± S.E.M. of three independent experiments.
Figure 3.
Figure 3.. Genome browser screenshots mapping HbF-regulating CREs identified by ABE mutagenesis.
(a) BCL11A-associated HbFhigh gRNAs. The top panel shows the gene body region including an erythroid-specific super-enhancer represented by three DNase I–hypersensitive sites at positions +62, +58, and +55. gRNA log(FDR) represents the difference in gRNA abundance between the HbFhigh and HbFlow populations. ATAC-seq analysis reflects chromatin openness. The bottom panel shows a zoom-out view including a 1.8-Mb surrounding region. Chromatin interaction loops indicated by red arcs were determined by H3K27ac Hi-ChIP in HUDEP-2 cells. The thick red line (P1) represents the location of a long deletion reported in a participant with a low erythroid BCL11A level and high HbF. (b) KLF1-associated HbFhigh gRNAs, labeled as in panel a. Note the nearby NFIX gene, which was recently identified as regulating HbF levels. (c) HBS1L-MYB–associated HbFhigh gRNAs, labeled as in panel a. SNVs (rs4895440, rs4895441) associated with HbF levels are labeled with red triangles. *Specific CRE regions analyzed in further detail are color coded.
Figure 4.
Figure 4.. Dissection of HbF-regulating CREs with base-pair resolution via ABE mutagenesis.
(a) Deconvoluting gRNA level measurements to base-pair regulatory scores. The left panel shows the algorithm (See the Methods for details) used to adjust for bystander editing, based on the empirical measurements obtained using amplicon sequencing. The right panel shows the frequency of A-based edits at different positions. (b) An accumulative curve showing the distribution of distance between the 200 highest BPRSHbF adenines and their nearest TF footprints (top panel). The lower part of the panel shows high-BPRSHbF adenines (red) and TF footprints (blue) distribution within a 200-bp window in the BCL11A. Each vertical line represents one base pair. (c) The transcription factor binding motifs most enriched in the TF footprints near adenines with high BPRSHbF. (d–g) Fine mapping of CREs at the indicated loci. The top portion of each panel shows a zoom-in view of TF motifs disrupted. Edited adenines (or thymidines representing the opposite strand) are indicated in red font. (d) The BCL11A intron 2 erythroid enhancer. See also Figure 3a, top panel. (e) An HbF CRE located approximately 1 Mb downstream of the BCL11A gene. See also the blue-shaded area in Figure 3a, bottom panel. (f) An HbF CRE located in NFIX intron. The lower track shows the GATA1 ChIP-seq signals in HUDEP-2 cells. See also the blue-shaded area in Figure 3b. (g) An HbF CRE located immediately 3′ to HBG1. ATAC-seq signals are shown for wild-type HUDEP-1 (H1), HUDEP-2 (H2) cells and two different HUDEP-2 clones (H2_mut_C1 and H2_mut_C2) after ABEmax mutagenesis of the ZBTB7A motif.
Figure 5.
Figure 5.. Nucleotide sequence and epigenetic determinants of HbF regulatory sequences.
(a) Genome browser screenshot of the β-like globin gene cluster showing the distribution of BPRSHbF in relation to various epigenetic signals. The tracks are color-coded as follows: red, 3D chromatin organization; dark red, chromatin accessibility (ATAC-seq); blue, key TF occupancy; brown, histone modifications. (b) Box plots comparing epigenetic signal distribution in ABE-mutated adenines with high (>30) (n = 313) and low (<10) BPRSHbF (n = 9,268). The y-axis represents log-transformed normalized signals. P value were determined using with two-tailed unpaired Wilcoxon test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values. (c) The significance of differences in specific functional genomic features in ABE-mutated adenines with high and low BPRSHbF, ranked by −log10 (P value). P value were determined using with two-tailed Wilcoxon test. (d) Comparison of CRE prediction models. The left panel summarizes the performance of different models for resolving high- and low-BPRSHbF adenines according to the distance between them. The y-axis represents the area under the receiver operating characteristic (AUROC) curves. The x-axis represents the threshold of minimal distance. The two panels at right show details of the prediction performance, represented by AUROC curves at different window sizes. RF refers to the random forest model that we developed. n = 400 iterations of cross-validation. Bar plot shows the mean and 95% confidence interval of the AUROC distribution.
Figure 6.
Figure 6.. Novel HbF CREs identified by ABE mutagenesis are associated with high HbF levels in an SCD cohort.
(a) Workflow for the Analysis of SNV burden in SCD participants with differing HbF levels (see Methods). (b) Comparison of normalized mutation burden (y-axis) within different window sizes (x-axis) centered on adenines with high BPRSHbF(see Methods). The cyan area indicates the 95% confidence interval for the distribution of participants with normal HbF based on 1,000 samplings. The median values plotted as the center line. (c) The ROC curve of the classification model that discriminates between subjects with high and normal HbF levels. The green curve represents the model including only known GWAS variants (n = 18), and the red curve represents the models including both known GWAS variants and variants within 20 bp of adenines with high BPRSHBF(n = 56). (d) Box plot showing a pair-wise performance comparison of the two models described in (c). n = 400 random samplings. P value is determined using paired two-tailed t-test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values. Data beyond the end of the whiskers are outlying points that are plotted individually. (e−g) CD34+ HSPCs from donors with SCD were transfected with RNP consisting of ABE and Chr11–1 gRNA targeting the 3′ HBG1 enhancer or non-targeting control gRNA (Ctrl). The results are representative of two donors. (e) F-cell fractions measured by immuno-flow cytometry. (f) The hemoglobin F fraction measured by IE-HPLC. (g) The fraction of sickled cells among reticulocytes from the edited and control populations incubated for 8 hours in 2% oxygen.

References

    1. Agrawal P, Heimbruch KE & Rao S Comprehensive Physiology. Compr Physiol 9, 439–455 (2018). - PMC - PubMed
    1. Rickels R & Shilatifard A Enhancer Logic and Mechanics in Development and Disease. Trends Cell Biol 28, 608–630 (2018). - PubMed
    1. Bolt CC & Duboule D The regulatory landscapes of developmental genes. Development 147, dev171736 (2020). - PMC - PubMed
    1. Driscoll MC, Dobkin CS & Alter BP Gamma delta beta-thalassemia due to a de novo mutation deleting the 5’ beta-globin gene activation-region hypersensitive sites. Proc National Acad Sci 86, 7470–7474 (1989). - PMC - PubMed
    1. Kioussis D, Vanin E, deLange T, Flavell RA & Grosveld FG β-Globin gene inactivation by DNA translocation in γβ-thalassaemi. Nature 306, 662–666 (1983). - PubMed

Methods-Only References

    1. Hu J et al. Isolation and functional characterization of human erythroblasts at distinct stages: implications for understanding of normal and disordered erythropoiesis in vivo. Blood 121, 3246–3253 (2013). - PMC - PubMed
    1. Bray NL, Pimentel H, Melsted P & Pachter L Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34, 525–527 (2016). - PubMed
    1. Pimentel H, Bray NL, Puente S, Melsted P & Pachter L Differential analysis of RNA-seq incorporating quantification uncertainty. Nat Methods 14, 687–690 (2017). - PubMed
    1. Corces MR et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14, 959–962 (2017). - PMC - PubMed
    1. Li Z et al. Identification of transcription factor binding sites using ATAC-seq. Genome Biol 20, 45 (2019). - PMC - PubMed

Publication types