Nat Commun. 2018 Dec 19;9(1):5380.
doi: 10.1038/s41467-018-07746-1.

High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human

Xinchen Wang et al. Nat Commun. 2018.

Abstract

Genome-wide epigenomic maps have revealed millions of putative enhancers and promoters, but experimental validation of their function and high-resolution dissection of their driver nucleotides remain limited. Here, we present HiDRA (High-resolution Dissection of Regulatory Activity), a combined experimental and computational method for high-resolution genome-wide testing and dissection of putative regulatory regions. We test ~7 million accessible DNA fragments in a single experiment, by coupling accessible chromatin extraction with self-transcribing episomal reporters (ATAC-STARR-seq). By design, fragments are highly overlapping in densely-sampled accessible regions, enabling us to pinpoint driver regulatory nucleotides by exploiting differences in activity between partially-overlapping fragments using a machine learning model (SHARPR-RE). In GM12878 lymphoblastoid cells, we find ~65,000 regions showing enhancer function, and pinpoint ~13,000 high-resolution driver elements. These are enriched for regulatory motifs, evolutionarily-conserved nucleotides, and disease-associated genetic variants from genome-wide association studies. Overall, HiDRA provides a high-throughput, high-resolution approach for dissecting regulatory regions and driver nucleotides.
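As a rough illustration of why partially overlapping fragments help localize activity, the sketch below averages the activity of every fragment that covers each position. This is a deliberately simplified stand-in and not the SHARPR-RE model; the function name, the fragment coordinates, and the activity values are all hypothetical.

```python
from collections import defaultdict


def per_position_activity(fragments):
    """Average the activity of all fragments covering each position.

    fragments: iterable of (start, end, log2_activity), half-open [start, end).
    Returns a dict mapping position -> mean activity of covering fragments.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for start, end, activity in fragments:
        for pos in range(start, end):
            totals[pos] += activity
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}


# Hypothetical fragments: two active fragments flanked by two inactive ones.
fragments = [(0, 10, 0.0), (5, 15, 2.0), (8, 18, 2.0), (12, 22, 0.0)]
profile = per_position_activity(fragments)

# Positions covered only by active fragments get the highest score.
peak = max(profile, key=profile.get)
```

In this toy input the highest-scoring positions are the ones shared by the two active fragments and by neither inactive one, which is the intuition behind comparing overlapping fragments.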

Conflict of interest statement

X.W., L.H., M.C., and M.K. have filed a patent on the HiDRA methodology (ATAC-STARR + SHARPR-RE). The remaining authors declare no competing interests.

Figures

Fig. 1
Overview of HiDRA. a Cells with the desired genotype and open chromatin patterns are selected for library construction. Tn5 transposase is used to preferentially fragment genomic DNA at regions of open chromatin. Fragments are then size-selected on an agarose gel and mtDNA contamination is removed by selective CRISPR-Cas9 degradation. The fragment library is amplified by PCR and cloned into an enhancer reporter vector. Gel image adapted from Buenrostro et al. Fragments are cloned into the STARR-seq vector backbone, introduced into target cells (which can differ from cells used to construct the library), and RNA is collected and sequenced. After data processing, the activity of partially-overlapping fragments is compared to identify driver nucleotides using the SHARPR-RE algorithm. b Size distribution of HiDRA library fragments (blue) and tiled regions (green). Bimodal shape for library fragment sizes is due to Tn5 preference to cut adjacent to nucleosomes. Fragment bin size = 20 nt, region bin size = 50 nt. c Number of ChromHMM-predicted active enhancers, active TSSs and ATAC-seq peaks covered by multiple unique HiDRA fragments. d HiDRA plasmid library recapitulates the genomic coverage of a conventional ATAC-seq experiment
Fig. 2
HiDRA identifies transcriptional regulatory elements. a Scatterplot of abundances for HiDRA fragments in input (plasmid DNA) and output (RNA) samples. Abundances calculated after merging all five replicates. Active HiDRA fragments called by DESeq2 highlighted with red dots (FDR < 0.05); blue color intensity corresponds to greater density of points. b The majority of HiDRA active regions are distal to annotated TSSs (> 2 kb). c HiDRA identifies enhancer activity within an intron in the immunoglobulin heavy chain locus. Red bar, DNA segment active in luciferase assay performed by Huang et al. Orange bar and highlight, region identified by HiDRA as having transcriptional regulatory activity. d Quantitative comparison of luciferase assay activity levels to HiDRA for 21 predicted enhancer elements. HiDRA signal corresponds to maximum activity within the region tested by luciferase, and luciferase value corresponds to median normalized activity over biological replicates. Pearson correlation calculated after log2 transformation. e Comparison of HiDRA-called active regions with luciferase assay results for 13 enhancers at the NEK6 locus. Luciferase experiments are colored in red or gray depending on whether DNA fragments drive luciferase activity in GM12878 cells as determined by Huang et al.
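The panel d comparison (Pearson correlation after log2 transformation) can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code; the function name and the paired activity values are hypothetical, and all inputs are assumed to be strictly positive so that the logarithm is defined.

```python
import math


def pearson_log2(x, y):
    """Pearson correlation of two paired samples after log2 transformation.

    Assumes every value is strictly positive and the samples have equal length.
    """
    lx = [math.log2(v) for v in x]
    ly = [math.log2(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var_x = sum((a - mean_x) ** 2 for a in lx)
    var_y = sum((b - mean_y) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired activities for four regions (not data from the paper).
hidra_signal = [1.0, 2.0, 4.0, 8.0]
luciferase_signal = [2.0, 4.0, 8.0, 16.0]
r = pearson_log2(hidra_signal, luciferase_signal)
```

Transforming first means the correlation reflects proportional (fold) agreement between the two assays rather than being dominated by a few very large raw values.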
Fig. 3
Active HiDRA fragments are enriched in endogenously active regulatory regions. a Overlap of active HiDRA fragments with different endogenous chromatin states. Heights correspond to proportion of nucleotides within active HiDRA fragments in each chromatin state. Inset: histone modification enrichments in each of 18 ChromHMM chromatin states. b HiDRA fragment regulatory activity (fold-change increase in RNA levels) across different chromatin states. Numbers correspond to chromatin state numbers in 18-state ChromHMM model. c, d Endogenously inactive genomic regions have low levels of TF binding (c) but comparable TF motif composition (d) to predicted active regions. Colored bars, regions from each chromatin state overlapping active HiDRA regions. Gray bars, regions from each chromatin state overlapping all fragments tested
Fig. 4
High-resolution mapping of transcriptional regulatory elements with SHARPR-RE. a Example region used in high-resolution mapping. Fragment activity shown on log2 scale with two fragments with highest and lowest activity removed for color scale to avoid outliers. The transparent red bar indicates the driver element identified at the regional FWER < 0.05. b Size distribution of driver elements. c Enrichment of immune-related TF motifs in driver elements compared with shuffled driver elements within tiled regions. d TF motifs enriched in driver elements cluster into groups of co-occurring motifs, suggesting diversity of TF motifs involved in transcriptional regulatory activity. e Significantly more driver elements are evolutionarily conserved compared to shuffled driver elements within tiled regions. Evolutionary conservation cut-off chosen as conservation score for top 5% of shuffled regions (P = 2.23 × 10^-73 vs. Gaussian distribution estimated by random shuffling of driver element positions in tiled regions). f SNPs within driver elements have significantly greater allelic skew by MPRA (Tewhey et al.) compared with those within tiled regions or across the genome. P value calculated by Mann-Whitney U test
Fig. 5
High-resolution driver elements are enriched for fine-mapped GWAS SNPs. a Driver elements overlap more GWAS fine-mapped SNPs associated with 21 human immune-related complex traits than randomly shuffled regions. p-value calculated empirically by random shuffling of driver element positions within tiled regions. b Example locus at rs12946510 that overlaps a high-resolution driver element. Highlighted segment indicates the driver element identified at the regional FWER < 0.05. Red bar at top corresponds to region with luciferase activity as demonstrated by Hitomi et al.
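The empirical p-value in panel a (random shuffling of driver element positions within tiled regions) can be illustrated with a small permutation sketch. This is one plausible reading of the procedure, not the authors' implementation: the function names, coordinates, and SNP positions are hypothetical, and the +1 correction in the p-value is an added assumption that keeps the estimate above zero.

```python
import random


def count_overlaps(elements, snps):
    """Number of elements [start, end) that contain at least one SNP position."""
    return sum(any(start <= s < end for s in snps) for start, end in elements)


def empirical_p(elements, regions, snps, n_shuffles=1000, seed=0):
    """Shuffle each element within its own tiled region and compare overlaps.

    elements[i] must lie inside regions[i]; both are half-open (start, end).
    Returns (observed overlap count, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = count_overlaps(elements, snps)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        shuffled = []
        for (e_start, e_end), (r_start, r_end) in zip(elements, regions):
            length = e_end - e_start
            # Pick a new start so the element stays fully inside its region.
            new_start = rng.randint(r_start, r_end - length)
            shuffled.append((new_start, new_start + length))
        if count_overlaps(shuffled, snps) >= observed:
            at_least_as_extreme += 1
    # +1 correction (assumption): avoids reporting p = 0 from finite shuffles.
    return observed, (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical tiled regions, driver elements, and fine-mapped SNP positions.
regions = [(0, 1000), (2000, 3000)]
elements = [(100, 120), (2500, 2520)]
snps = [110, 2510]
observed, p_value = empirical_p(elements, regions, snps)
```

Shuffling within the tiled region, rather than across the genome, keeps the null elements in the same accessible and densely sampled context as the real ones.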
Fig. 6
Identification of human genetic variants that alter HiDRA activity. a Overview of genotyping approach for HiDRA fragments. HiDRA fragments were originally quantified at high-depth using 37 nt paired-end reads. At this read length the allele composition of fragments is mostly unobserved. As every HiDRA fragment has a unique identifier (genomic alignment position and random 4 nt barcode), long-read re-sequencing of the HiDRA library can assign SNP genotypes to fragments that were previously quantified for activity using short reads. b q-q plot for allelic imbalance at SNPs covered by HiDRA fragments. CENTIPEDE “effect” SNPs were identified by Moyerbrailean et al. c “Effect” SNPs and SNPs within HiDRA active regions are more likely to be nominally significant for allelic imbalance. p-values from Fisher’s exact test. d The A allele of rs2382817, a SNP associated with inflammatory bowel disease, is more active in the HiDRA assay than the C allele. e Allele-specific HiDRA activity signal tracks for rs2382817
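The genotyping idea in panel a amounts to a join on the fragment identifier: activity comes from the short-read data, the allele comes from long-read re-sequencing, and the shared key is the alignment position plus barcode. The sketch below shows that join under assumed data structures; the function name, the tuple layout of the identifier, and every value are hypothetical.

```python
from statistics import mean


def activity_by_allele(activity, genotypes):
    """Group fragment activities by the allele each fragment carries.

    activity:  fragment ID -> activity from short-read quantification
    genotypes: fragment ID -> allele observed by long-read re-sequencing
    A fragment ID is assumed to be (chrom, start, end, barcode).
    Returns a dict mapping allele -> mean activity of its fragments.
    """
    grouped = {}
    for frag_id, value in activity.items():
        allele = genotypes.get(frag_id)
        if allele is not None:  # skip fragments without a genotype call
            grouped.setdefault(allele, []).append(value)
    return {allele: mean(values) for allele, values in grouped.items()}


# Hypothetical fragments; two share coordinates and differ only by barcode.
activity = {
    ("chr1", 100, 400, "ACGT"): 2.0,
    ("chr1", 100, 400, "TTAG"): 0.5,
    ("chr1", 150, 450, "GGCA"): 3.0,
    ("chr1", 900, 1200, "CATG"): 1.0,
}
genotypes = {
    ("chr1", 100, 400, "ACGT"): "A",
    ("chr1", 100, 400, "TTAG"): "C",
    ("chr1", 150, 450, "GGCA"): "A",
}
result = activity_by_allele(activity, genotypes)
```

Including the barcode in the key is what lets two fragments with identical alignment coordinates but different alleles be told apart.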
