Nat Commun. 2025 Oct 30;16(1):9592.
doi: 10.1038/s41467-025-64343-9.

Data navigation on the ENCODE portal

Meenakshi S Kagda et al. Nat Commun.

Abstract

Spanning two decades, the collaborative ENCODE project aims to identify all the functional elements within the human and mouse genomes. To best serve the scientific community, the comprehensive ENCODE data, including results from 23,000+ functional genomics experiments, 800+ functional element characterisation experiments and 60,000+ results from integrative computational analyses, are available on an open-access data portal (https://www.encodeproject.org/). The final phase of the project includes data from several novel assays aimed at the characterisation and validation of genomic elements. In addition to developing and maintaining the data portal, the Data Coordination Center (DCC) implemented and utilised uniform processing pipelines to generate uniformly processed data. Here we report recent updates to the data portal, including a redesigned home page, an improved search interface, new custom-designed pages highlighting biologically related datasets, and an enhanced cart interface for data visualisation plus user-friendly data download options. A summary of data generated using uniform processing pipelines is also provided.

Conflict of interest statement

Competing interests: The authors have no competing interests.

Figures

Fig. 1. Overview of different ENCODE functional genomics experiments.
The diversity and number of ENCODE functional genomics experiments are shown, classified by (A) assay type, (B) biosample organism, and (C) biosample classification and award. As seen in Fig. 1A, DNA binding assays account for the largest number of ENCODE datasets, followed by transcription assays. Fig. 1A and 1B further show that most assays in the ENCODE corpus use samples derived from Homo sapiens, followed by Mus musculus. Fig. 1C classifies the functional genomics experiments by ENCODE phase, biosample classification and organism. Within the Homo sapiens category, most ENCODE4 functional genomics experiments are cell line assays, followed by tissue samples. In contrast, most Mus musculus assays in the ENCODE3 and ENCODE4 phases were performed on tissue samples.
Fig. 2. ENCODE functional genomics experiments classified by assay types.
The figure provides a complete breakdown of all ENCODE functional genomics experiments by assay type. The greatest diversity is found among the Transcription assays, which include 23 different assay types. The other assay categories are 3D chromatin structure, DNA accessibility, DNA binding, DNA methylation, DNA sequencing, Genotyping, Proteomics, Replication timing, RNA binding, RNA structure, Single cell (general), Single cell (DNA accessibility) and Single cell (Transcription).
Fig. 3. Overview of the ENCODE functional characterisation experiments.
The figure shows the diversity of ENCODE functional characterisation experiments classified by organism, biosample classification, element selection method and assay. Most functional characterisation experiments available on the ENCODE portal were performed on Homo sapiens derived cell lines and assayed using CRISPR screens. Several reporter assays are also available for cell line samples originating from Homo sapiens, and some reporter assays were performed on human organoids and primary cells. A handful of Mus musculus functional characterisation experiments are also available from primary cells and cell lines.
Fig. 4. ENCODE computational and integrative products (Annotations).
The figure shows the total number of available computational and integrative products (annotation datasets) classified by organism, biosample classification and annotation type. The largest number of annotations falls within the Imputation category and is derived from Homo sapiens samples. The candidate Cis-Regulatory elements (cCREs) form the basis of the various ENCYCLOPAEDIA versions.
Fig. 5. ENCODE collections.
The figure provides a breakdown of the number of functional genomics experiments represented within the various ENCODE collections, classified by assay type and organism. The largest number of assays is found within the Human Donor matrix, followed by the Human reference epigenomes matrix. Each collection serves its own specific purpose; more details can be found in the Supplemental information. Note: the numbers in this diagram exclude control ChIP-seq datasets.
Fig. 6. ENCODE home page.
The figure shows the features of the ENCODE home page. The home page displays several clickable cards that provide shortcut links to different ENCODE datasets and collections, as well as links to help pages. A prominent search box lets users type in keywords and search within the ENCODE datasets as well as the SCREEN registry. The home page also gives access to other functionality, such as the Cart features and the Sign-in options.
Fig. 7. ENCODE Search box usage example.
This figure shows what a user might see when searching for “H3K4me3”. When the user presses Enter after typing the keyword, the search box highlights the home page cards that contain data relevant to the search term. In this case, 8 out of 12 cards are highlighted in yellow, indicating the presence of H3K4me3-relevant data within those cards. The cards that are not highlighted have no metadata related to H3K4me3. In addition to the yellow highlights below the cards, a black box below the search box lists the different object types, each with the number of objects matching the keyword. In this example, 40619 Files and 2389 Annotations have data relevant to H3K4me3.
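The keyword search in Fig. 7 can also be approximated programmatically. The following is a minimal sketch, not the portal's documented method: it only builds a search URL against the portal address given in the abstract. The `/search/` path and the `searchTerm`, `type`, `limit` and `format` query parameters are assumptions about the portal's REST interface that the text above does not confirm, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Portal address as given in the abstract.
ENCODE_BASE = "https://www.encodeproject.org"


def build_search_url(term, object_type=None, limit=None):
    """Build a keyword search URL for the portal.

    Assumption: the portal exposes a /search/ endpoint that accepts
    searchTerm, type, limit and format=json query parameters.
    """
    params = [("searchTerm", term)]
    if object_type is not None:
        # Restrict results to one object type, e.g. "File" or "Annotation".
        params.append(("type", object_type))
    if limit is not None:
        params.append(("limit", str(limit)))
    # Ask for a machine-readable response instead of the HTML page.
    params.append(("format", "json"))
    return f"{ENCODE_BASE}/search/?{urlencode(params)}"


if __name__ == "__main__":
    # Mirrors the Fig. 7 example: search Files for the keyword "H3K4me3".
    print(build_search_url("H3K4me3", object_type="File"))
```

The sketch stops at URL construction so that it runs without network access. Fetching the URL and reading the per-type counts would depend on the response layout, which is not described here.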
