Perspectives on ENCODE

ENCODE Project Consortium et al. Nature. 2020 Jul.

Erratum in

  • Author Correction: Perspectives on ENCODE.
    ENCODE Project Consortium; Snyder MP, Gingeras TR, Moore JE, Weng Z, Gerstein MB, Ren B, Hardison RC, Stamatoyannopoulos JA, Graveley BR, Feingold EA, Pazin MJ, Pagan M, Gilchrist DA, Hitz BC, Cherry JM, Bernstein BE, Mendenhall EM, Zerbino DR, Frankish A, Flicek P, Myers RM. Nature. 2022 May;605(7909):E4. doi: 10.1038/s41586-021-04213-8. PMID: 35474002.

Abstract

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression (ref. 1). The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million cCRE annotations have been generated for human and more than 300,000 for mouse, and these have provided a valuable resource for the scientific community.


Conflict of interest statement

B.E.B. declares outside interests in Fulcrum Therapeutics, 1CellBio, HiFiBio, Arsenal Biosciences, Cell Signaling Technologies, BioMillenia, and Nohla Therapeutics. P.F. is a member of the Scientific Advisory Boards of Fabric Genomics, Inc. and Eagle Genomics, Ltd. M.P.S. is cofounder and scientific advisory board member of Personalis, SensOmics, Mirvie, Qbio, January, Filtricine, and Genome Heart. He serves on the scientific advisory board of these companies and Genapsys and Jupiter. Z.W. is a cofounder of Rgenta Therapeutics and she serves on its scientific advisory board. R.M.M. is an advisor to DNAnexus and Decheng Capital, and has outside interests in IMIDomics, Accuragen and ReadCoor, Inc. The authors declare no other competing financial interests.

Figures

Fig. 1. ENCODE assays by year.
Accumulations of assays over the three phases of ENCODE. 3D chromatin structure includes ChIA-PET (62 experiments), Hi-C (31), and chromatin conformation capture carbon copy (5C, 13). Chromatin accessibility includes DNase-seq (524), assay for transposase-accessible chromatin using sequencing (ATAC-seq, 129), transcription activator-like effector nuclease (TALEN)-modified DNase-seq (40), formaldehyde-assisted isolation of regulatory elements with sequencing (FAIRE-seq, 37) and micrococcal nuclease digestion with deep sequencing (MNase-seq, 2). DNA methylation includes DNAme arrays (259), WGBS (124), reduced-representation bisulfite sequencing (RRBS, 103), methylation-sensitive restriction enzyme sequencing (MRE-seq, 24) and methylated DNA immunoprecipitation coupled with next-generation sequencing (MeDIP-seq, 4). Histone modification includes ChIP–seq (1,605) on histone and modified histone targets. Knockdown transcription includes RNA-seq preceded by small interfering RNA (siRNA, 54), short hairpin RNA (shRNA, 531), clustered regularly interspaced short palindromic repeats (CRISPR, 50) or CRISPR interference (CRISPRi, 77). RNA binding includes enhanced cross-linking immunoprecipitation (eCLIP, 349), RNA bind-n-seq (158), RNA immunoprecipitation sequencing (RIP-seq, 158), RNA-binding protein immunoprecipitation-microarray profiling (RIP-chip, 32), individual nucleotide-resolution CLIP (iCLIP, 6) and Switchgear (2). Transcription includes RNA annotation and mapping of promoters for the analysis of gene expression (RAMPAGE, 155), cap analysis gene expression (CAGE, 78), RNA paired-end tag (RNA-PET, 31), microRNA-seq (114), microRNA counts (114), more classical RNA-seq (900) and RNA-microarray (170), including 112 experiments at single-cell resolution. Transcription factor (TF) binding is ChIP–seq on non-histone targets (2,443).
Other assays include genotyping array (123), nascent DNA replication strand sequencing (Repli-seq, 104), replication strand arrays (Repli-chip, 63), tandem mass spectrometry (MS/MS, 14), genotyping by high-throughput sequencing (genotyping HTS, 12) and DNA-PET (6). All assays can be explored in detail at https://www.encodeproject.org.
Fig. 2. Progress in annotating the human genome.
Link to high-resolution PDF file: https://www.dropbox.com/s/rjdrcqygz15p034/perspective.pdf?dl=0. a, Improvement of gene annotations in the past 15 years by GENCODE, an international gene annotation group that uses ENCODE data. b, ENCODE annotations in 2012 with phase II data. Bars show the percentages of the mappable human genome (3.1 billion nucleotides; hg19) that were annotated as open chromatin by DNase-seq data, enriched in four types of active histone mark according to ChIP–seq data, and annotated as transcription factor binding sites (TFBSs) according to ChIP–seq data. Also shown are percentages of the genome assigned as transcription start sites (TSSs), enhancers and the insulator-binding protein (CTCF) by combining ChromHMM and Segway genome segmentations. c, ENCODE annotations in 2019 with ENCODE 2, Roadmap, and ENCODE 3 data. The registry of cCREs developed during phase III defines 0.3%, 1.1%, 5.8%, 0.2% and 0.4% of the human genome as cCREs with promoter-like signatures (PLS), proximal enhancer-like signatures (pELS), distal enhancer-like signatures (dELS), with high DNase, high H3K4me3 and low H3K27ac signals (DNase-H3K4me3), and bound by CTCF, respectively. d, A UCSC genome browser view of GENCODE genes (V7) coloured by transcript annotation (blue for coding, green for noncoding, and red for problematic) and combined genome segmentation (TSSs in red, enhancers in orange, weak enhancers in yellow, transcription in green, repressed in grey) at the CTCF locus on the hg19 human genome. e, The UCSC genome browser view of GENCODE genes (V28, coloured as in d) and cCREs at the CTCF locus on the hg38 human genome. Promoter-like, enhancer-like, and CTCF-only cCREs annotated in B cells are in red, yellow, and blue, respectively. The last four tracks show the DNase, H3K4me3, H3K27ac, and CTCF signals in B cells.
Fig. 3. Publications using ENCODE data.
The National Human Genome Research Institute (NHGRI) has identified a list of publications that used ENCODE data. This list is publicly shared to provide examples illustrating how the resource has been used (https://www.encodeproject.org/publications/). a, Publications over time. Community publications appear to use ENCODE data and do not report ENCODE grant support in PubMed; consortium publications report ENCODE grant support in PubMed. In brief, community publications are identified using two steps; first, candidates are identified through automated searches for citation of ENCODE accession numbers, ENCODE flagship papers, or resources such as HaploReg and RegulomeDB; second, candidates are manually evaluated to determine whether ENCODE data were actually used. Consortium papers are identified through automated searches of PubMed for publications that were supported at least in part by ENCODE awards, and are not further evaluated or annotated. b, Human disease example publications. The subset of community publications that were annotated as ‘human disease’ (other categories are basic biology, software tool, fly/worm data) were further manually categorized by disease aetiology.
Fig. 4. An overview of the mouse ENCODE Project in the current phase.
a, Schematic representation of ENCODE 3 mouse developmental data series. The chromatin graphic is adapted from an image by Darryl Leja (NHGRI), Ian Dunham (EBI), and M.J.P. (NHGRI). The embryo image second from the right in was adapted from ref. , an Open Access article distributed under the terms of the Creative Commons Attribution License 2.0. b, Three major axes of the data series: assays, tissues, and developmental stages. The region shown is chr11:98,307,637–98,344,383, mm10. c, A schematic diagram of the transgenic assays used to validate and characterize the function of cCREs in E11.5 and E12.5 mouse embryos. The cCREs were selected on the basis of ChIP–seq data and cloned into a reporter vector that was then introduced into fertilized mouse eggs. The activities of the cCREs were validated by tissue-specific expression patterns of the reporter gene. d, Results from recent transgenic assays to validate about 400 cCREs are summarized in a bar chart, with the bars indicating the proportion of cCREs in each rank tier that showed reproducible reporter staining in the expected tissue (grey) or any tissue (pink).
Extended Data Fig. 1. ENCODE timeline.
Pilot phase: September 2003–September 2007; ENCODE 2: September 2007–September 2012; ENCODE 3: September 2012–January 2017; ENCODE 4: February 2017–present; modENCODE: April 2007–April 2012; mouse ENCODE: 2009–2012.

References

    1. Kellis M, et al. Defining functional DNA elements in the human genome. Proc. Natl Acad. Sci. USA. 2014;111:6131–6138. doi: 10.1073/pnas.1318948111.
    2. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306:636–640. doi: 10.1126/science.1105136.
    3. Lindblad-Toh K, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478:476–482. doi: 10.1038/nature10530.
    4. Waterston RH, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
    5. ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9:e1001046. doi: 10.1371/journal.pbio.1001046.
