[Preprint]. 2024 Dec 26:2024.12.26.629296.
doi: 10.1101/2024.12.26.629296.

An Expanded Registry of Candidate cis-Regulatory Elements for Studying Transcriptional Regulation

Jill E Moore et al. bioRxiv.

Abstract

Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression. Previously, the ENCODE Consortium mapped biochemical signals across many cell types and tissues and integrated these data to develop a Registry of 0.9 million human and 300 thousand mouse candidate cis-Regulatory Elements (cCREs) annotated with potential functions [1]. We have expanded the Registry to include 2.35 million human and 927 thousand mouse cCREs, leveraging new ENCODE datasets and enhanced computational methods. This expanded Registry covers hundreds of unique cell and tissue types, providing a comprehensive understanding of gene regulation. Functional characterization data from assays such as STARR-seq, MPRA, CRISPR perturbation, and transgenic mouse assays now cover over 90% of human cCREs, revealing complex regulatory functions. We identified thousands of novel silencer cCREs and demonstrated their dual enhancer/silencer roles in different cellular contexts. Integrating the Registry with other ENCODE annotations facilitates the interpretation of genetic variation and the identification of trait-associated genes, exemplified by the discovery of KLF1 as a novel causal gene for red blood cell traits. This expanded Registry is a valuable resource for studying the regulatory genome and its impact on health and disease.

Conflict of interest statement

Ethics declarations (competing interests): J.M.E. is an inventor on patents and patent applications related to CRISPR screening technologies, has received materials from 10× Genomics unrelated to this study, and has received speaking honoraria from GSK plc. B.E.B. discloses financial interests in HiFiBio, Arsenal Biosciences, Chroma Medicine, Cell Signaling Technologies and Design Pharmaceuticals. M.P.S. is a co-founder and on the advisory boards of Personalis, Qbio, January AI, SensOmics, Filtricine, Protos, Mirvie, Onza, Marble Therapeutics, Iollo, and NextThought AI. He is also on the advisory boards of Jupiter, Applied Cognition, Neuvivo, Mitrix, and Enovone. A.K. is a consulting fellow with Illumina; a member of the SABs of OpenTargets (GSK), PatchBio, and SerImmune; and a co-founder of RavelBio. Z.W. is a co-founder of Rgenta Therapeutics and serves on its scientific advisory board. The other authors declare no competing interests.

Figures

Figure 1. The updated Registry of candidate cis-Regulatory Elements.
a, Schematic of the pipeline used to make Version 4 of the Registry of cCREs. We define element anchors by generating representative DHSs (rDHSs) and transcription factor clusters. Element anchors are scored with H3K4me3, H3K27ac, and CTCF ChIP-seq and ATAC-seq signals (yellow box) and classified according to the scheme in b. This results in 2.3 million cCREs in the human genome and 927 thousand in the mouse genome. We supplement the Registry with additional ENCODE Encyclopedia annotations including transcription quantifications, 3D chromatin contacts, functional characterization measurements, sequence features, and genetic variation (blue box). The Registry of cCREs and all layered annotations are housed in our web portal SCREEN. New components of the pipeline are denoted by stars. b, Overview of our cCRE classification scheme. cCREs are classified based on their patterns of biochemical signals (chromatin accessibility in green, H3K4me3 in red, H3K27ac in yellow, CTCF in blue, transcription factor in purple) and distance from annotated TSSs. High signals are denoted by peaks. A +/− symbol indicates that the corresponding signal may or may not be present and its presence does not impact classification. New categories of elements are denoted by stars. c, Bar graphs depicting the number of cCREs annotated in each class for human (left) and mouse (right). The gray hatched bar indicates an upper bound for the number of CA cCREs in mouse that would be classified as enhancers if H3K27ac data were available.
Figure 2. Functional characterization of the Registry of cCREs.
a, Summary of cCREs tested by ENCODE4 functional characterization assays. b, Schematic of the CAPRA quantification method which utilizes solo fragments (overlapping single cCREs in their entirety, blue) and double fragments (overlapping two cCREs in their entirety, purple). c, Density plot showing the distributions of CAPRA quantifications in K562 cells for K562 promoter (red), distal enhancer (yellow) or low chromatin accessibility (gray) cCREs. d, Scatterplot of CAPRA quantifications for distal enhancer cCREs in K562 (x-axis) and HepG2 (y-axis). Color of points indicates cCREs with enriched activity (STARR+) in K562 (pink) or HepG2 (green). e, Barplots of motif enrichment for HepG2 (green) or K562 (pink) STARR+ distal enhancers (as defined in d). Top five motifs are shown for each group of cCREs along with their corresponding logo. f, Genome browser view of three distal enhancer cCREs (denoted by 1–3) in the MTNR1A intron with DNase (green) and H3K27ac (yellow) signals in K562. A STARR-seq peak call is shown in black. g, CAPRA quantifications for the three enhancers shown in f: EH38E3620077 (1), EH38E3620078 (2) and EH38E3620079 (3) using solo fragments (top) and double fragments (bottom). High quantifications are denoted in purple (p = 0.03). h, Overlap of common K562 transcription factor motifs at the three enhancers in f and g. Representative motif logos for EH38E3620078 and EH38E3620079 are shown.
Figure 3. Identification of distinct functional categories of REST-bound cCREs.
a, Computational pipeline for identifying REST-bound cCREs (REST+ cCREs). We overlapped cCREs with REST ChIP-seq peaks and selected all cCREs that overlap at least five peak summits and an annotated REST motif instance, resulting in 5,850 REST+ cCREs. b, Barplots depicting the number of REST+ cCREs stratified by cCRE class. c, Barplots depicting the enrichment for cCRE classes of REST+ cCREs compared to the entire Registry. d, Workflow for functionally characterizing REST+ cCREs. e, Representative result from the mouse transgenic enhancer assay showing activity of REST+ enhancer cCRE EH38E1910506 in mouse brain regions. f, Bar graphs denoting the percentage of regions tested in transgenic mouse enhancer assays with positive activity. Regions are stratified into four groups based on cCRE classification (distal enhancer in yellow, CA-TF in purple) and REST binding (+ and dark bars indicate REST+, − and light bars indicate REST−). ** denotes a Fisher Exact Test p-value less than 0.01. g, Bar graphs showing the percentage of distal enhancer cCREs with transgenic mouse enhancer assay activity in specific tissues stratified by REST binding (as defined in f). * denotes a Fisher Exact Test p-value less than 0.05. h, Density plot of the distributions of STARR scores calculated by CAPRA for all cCREs (black), REST+ distal enhancer cCREs (yellow), and REST+ CA-TF cCREs (purple). Both groups of REST+ cCREs have median STARR scores less than zero, suggesting silencer activity for both groups. P-value is calculated using a Wilcoxon test.
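The selection rule in panel a (at least five REST peak summits plus an annotated REST motif instance) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `select_rest_bound`, the tuple layouts, the half-open coordinate convention, and the toy identifiers are all assumptions made for illustration, and the caption does not say whether the five summits must come from distinct experiments.

```python
from bisect import bisect_left
from collections import defaultdict


def select_rest_bound(ccres, summits, motifs, min_summits=5):
    """Return IDs of cCREs with >= min_summits summits and >= 1 motif overlap.

    Assumed (hypothetical) input layouts, half-open coordinates:
      ccres:   iterable of (ccre_id, chrom, start, end)
      summits: iterable of (chrom, position)
      motifs:  iterable of (chrom, start, end)
    """
    # Sort summit positions per chromosome so they can be counted by bisection.
    summit_pos = defaultdict(list)
    for chrom, pos in summits:
        summit_pos[chrom].append(pos)
    for positions in summit_pos.values():
        positions.sort()

    motifs_by_chrom = defaultdict(list)
    for chrom, start, end in motifs:
        motifs_by_chrom[chrom].append((start, end))

    selected = []
    for ccre_id, chrom, start, end in ccres:
        positions = summit_pos.get(chrom, [])
        # Number of summits p with start <= p < end.
        n_summits = bisect_left(positions, end) - bisect_left(positions, start)
        # Any motif interval sharing at least one base with the cCRE.
        has_motif = any(
            m_start < end and m_end > start
            for m_start, m_end in motifs_by_chrom.get(chrom, [])
        )
        if n_summits >= min_summits and has_motif:
            selected.append(ccre_id)
    return selected
```

On real data, a genome-interval tool would replace the linear motif scan; the sketch only shows the logic of combining the two criteria.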
Figure 4. Using the Registry of cCREs to study transcriptional regulation.
a, Genome browser view of three SNPs associated with red blood cell traits, rs7255045, rs11672387 and rs7255045, and nearby genes. Biochemical signals from K562 cells are shown: DNase in light green, H3K4me3 in red, H3K27ac in yellow, CTCF in blue, and RNA-seq in dark green, along with cCRE classifications in K562. b, Circle-barplots denoting the enrichment of variants associated with red blood cell traits in cCREs with high H3K27ac signals in specific cell and tissue types. Length of line denotes the log2 fold enrichment over control variants and size of the terminal circle indicates statistical significance of enrichment. Color of line denotes tissue/organ of origin of biosample. A representative set of biosamples is shown with one cell line and one tissue sample for each non-blood tissue/organ. c, Heatmap depicting the expression level of genes linked and/or proximal to variants associated with red blood cell traits across the same biosamples as b.
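As a rough illustration of the quantities plotted in panel b, the Python sketch below computes a log2 fold enrichment of trait-associated variants over control variants, together with a one-sided Fisher's exact test p-value. The function names and argument layout are hypothetical, and the choice of Fisher's exact test is an assumption: the caption does not state which test determines the size of the terminal circle.

```python
import math


def log2_fold_enrichment(trait_in, trait_total, control_in, control_total):
    """log2 of (fraction of trait variants in cCREs / fraction of control variants in cCREs)."""
    if min(trait_in, control_in) <= 0 or min(trait_total, control_total) <= 0:
        raise ValueError("counts must be positive to take a log ratio")
    trait_fraction = trait_in / trait_total
    control_fraction = control_in / control_total
    return math.log2(trait_fraction / control_fraction)


def fisher_exact_greater(trait_in, trait_total, control_in, control_total):
    """One-sided Fisher's exact test p-value (assumed test, not from the source).

    Probability, under the hypergeometric null, of seeing at least `trait_in`
    trait variants inside cCREs given the table margins.
    """
    in_ccre = trait_in + control_in          # all variants inside cCREs
    n = trait_total + control_total          # all variants
    denom = math.comb(n, trait_total)
    upper = min(trait_total, in_ccre)
    tail = sum(
        math.comb(in_ccre, k) * math.comb(n - in_ccre, trait_total - k)
        for k in range(trait_in, upper + 1)
    )
    return tail / denom
```

For example, if 30 of 100 trait variants and 150 of 1,000 control variants fall in the cCRE set, the trait fraction is twice the control fraction, so the log2 fold enrichment is 1.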

References

    1. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
    2. Kim S. & Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol. Cell 83, 373–392 (2023).
    3. Fan K., Pfister E. & Weng Z. Toward a comprehensive catalog of regulatory elements. Hum. Genet. 142, 1091–1111 (2023).
    4. Lee T. I. & Young R. A. Transcriptional regulation and its misregulation in disease. Cell 152, 1237–1251 (2013).
    5. Levine M., Cattoglio C. & Tjian R. Looping back to leap forward: transcription enters a new era. Cell 157, 13–25 (2014).
