Review
2015 Mar 16:2015:bav010.
doi: 10.1093/database/bav010. Print 2015.

Ontology application and use at the ENCODE DCC

Venkat S Malladi et al. Database (Oxford).

Abstract

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC's use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.

Figures

Figure 1.
Experimental metadata annotated with appropriate ontology terms. This example, showing a subset of the full breadth of metadata annotated for an ENCODE experiment, emphasizes the annotation of three experimental metadata categories (treatment, biosample and assay) in two experiments. Treatments have been annotated to ChEBI (e.g. 20-hydroxyecdysone—CHEBI:16587 and estradiol—CHEBI:23965). Biosamples have been annotated to one of three ontologies: Uberon, CL and EFO (e.g. hepatic stellate cell—CL:0000632 and Hep-G2—EFO:0001187). Assays have been annotated to OBI (e.g. RRBS—OBI:0001862 and MeDIP-seq—OBI:0000693). The terms in the middle are parent terms found in the ontology that provide a more general context and can be used to find these experiments (i.e. biological role for treatments, organ for biosamples, assay category for experimental assays). Each annotated term in the experiment maps, through relationships in the ontology, to the middle terms.
Figure 2.
Graph view of integration of Uberon, CL and EFO. The graph view shows some of the relationship types and paths that can be traversed from child to parent terms. These relationships are either explicit or inferred. Explicit relationships are connections that are defined between two terms in the ontology. The integration of the three ontologies uses three relationships: is_a, part_of and derives_from. The is_a relationship indicates that one entity is a subtype of another entity (e.g. Hep-G2—EFO:0001187 is a type of hepatoma cell line—EFO:0005216). The part_of relationship indicates a part-whole relationship, such that a child term is fully and always contained within the parent term (e.g. all hepatocytes—CL:0000182 are found in the liver—UBERON:0002107). The derives_from relationship indicates that the child term succeeds the parent term over some temporal divide, such that at least a significant biological portion is inherited (e.g. hepatoma cell lines—EFO:0005216 are cancerous hepatocyte cells—CL:0000182). Inferred relationships are connections between two terms that are transitively reasoned via the explicit relationships. Transitive relationships remain true across multiple links of the relationships. For example, as the Hep-G2 cell line is a type of hepatoma cell line that is derived from hepatocytes, an inferred relationship can be made that the Hep-G2 cell line is also derived from hepatocytes.
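The inference described in this caption can be illustrated with a small Python sketch. It is not the ENCODE DCC implementation: the function name `ancestors` and the edge list layout are assumptions made for illustration, and the walk ignores the relationship type, so it only reports that a term is reachable, which is coarser than what an ontology reasoner would conclude. The term identifiers and the three edges come from the examples in the caption.

```python
from collections import defaultdict

# Explicit relationships from the Figure 2 caption: (child, relation, parent).
EXPLICIT = [
    ("EFO:0001187", "is_a", "EFO:0005216"),         # Hep-G2 is_a hepatoma cell line
    ("EFO:0005216", "derives_from", "CL:0000182"),  # hepatoma cell line derives_from hepatocyte
    ("CL:0000182", "part_of", "UBERON:0002107"),    # hepatocyte part_of liver
]

def ancestors(term, edges=EXPLICIT):
    """Return every term reachable from `term` by following child -> parent edges.

    Simplification (assumption): all relation types are treated alike.
    """
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)

    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hep-G2 reaches hepatoma cell line, hepatocyte and liver through inferred links.
print(sorted(ancestors("EFO:0001187")))
```

Under these assumptions, a search for liver (UBERON:0002107) could match an experiment annotated only to Hep-G2, because liver appears among the inferred ancestors of the annotated term.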
Figure 3.
Search at the ENCODE portal (https://www.encodeproject.org/). In this example, a free text search is done for ‘breast’. The user selects ‘Experiment’ for the ‘Data Type’ facet. The interface returns a list of various experiments (right column) that have been conducted on biosamples that match the search term. The search uses the annotated ontological term for the biosample, synonyms found in the ontology or inferred relationships to the ontological term breast—UBERON:0000310.
Figure 4.
Filtering search results using facets. (A) A subset of facets for experimental assays is highlighted in the left column of the interface (https://www.encodeproject.org/search/?type=experiment). The ‘Assay’ facet displays the term name for annotations of an experiment to an OBI term ID. ‘Experiment status’ indicates the state of the experiment record in the database. The ‘Organ’ facet represents the biosample slim described in the text, which describes the anatomical structure. The ‘Biosample treatment’ facet displays the term names for treatments, some of which are annotated to ChEBI. Lastly, the ‘Available data’ facet describes the data file types that are available for download from the ENCODE portal. (B) In this example, the user has expanded the ‘Organ’ facet and selected ‘brain’. To the right are all the available experiments on biological samples annotated to a specific term in Uberon, CL or EFO that slims to the parent term brain—UBERON:0000955.
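Facet filtering of this kind can be sketched in a few lines of Python. The records, field names and the helpers `apply_facets` and `facet_counts` below are hypothetical and do not reflect the portal's schema or code; the sketch assumes each experiment already carries its slimmed organ term.

```python
from collections import Counter

# Hypothetical records; identifiers and field names are made up for illustration.
EXPERIMENTS = [
    {"id": "exp-1", "assay": "RRBS", "organ": "liver"},
    {"id": "exp-2", "assay": "MeDIP-seq", "organ": "brain"},
    {"id": "exp-3", "assay": "RRBS", "organ": "brain"},
]

def apply_facets(experiments, **selected):
    """Keep experiments whose fields match every selected facet value."""
    return [
        e for e in experiments
        if all(e.get(field) == value for field, value in selected.items())
    ]

def facet_counts(experiments, field):
    """Count how many experiments fall under each value of one facet."""
    return Counter(e[field] for e in experiments)

# Selecting 'brain' in the 'Organ' facet, then recounting the 'Assay' facet.
brain = apply_facets(EXPERIMENTS, organ="brain")
print([e["id"] for e in brain])
print(dict(facet_counts(brain, "assay")))
```

Recounting the remaining facets after each selection is what lets an interface show only the values that still have matching experiments.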
Figure 5.
Metadata integrity using facets in the curation interface. This view highlights two additional facets: ‘Assay category’ and ‘Metadata integrity checks’ found on the curator interface for experimental assays. The selected term filters the ‘Assay’ facet, based on the assay slim described in the text, to only display a list of assays that can be categorized as immunoprecipitation assays. For these experiments, the ‘Metadata integrity checks’ facet can be used to filter for the experiments that are missing antibody information.
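The integrity check in this caption amounts to a two-step filter: restrict to assays in the immunoprecipitation category, then flag the records with no antibody. A minimal Python sketch follows, with hypothetical records and field names (`assay_category`, `antibody`) and a hypothetical helper `missing_antibody`; the curator interface itself may work differently.

```python
IP_CATEGORY = "immunoprecipitation assay"

# Hypothetical records; identifiers, field names and values are illustrative only.
RECORDS = [
    {"id": "exp-1", "assay": "ChIP-seq", "assay_category": IP_CATEGORY, "antibody": "ab-1"},
    {"id": "exp-2", "assay": "MeDIP-seq", "assay_category": IP_CATEGORY, "antibody": None},
    {"id": "exp-3", "assay": "RRBS", "assay_category": "other", "antibody": None},
]

def missing_antibody(records):
    """Return ids of immunoprecipitation experiments that lack antibody metadata."""
    return [
        r["id"] for r in records
        if r["assay_category"] == IP_CATEGORY and not r.get("antibody")
    ]

print(missing_antibody(RECORDS))
```

The RRBS record is not flagged even though it has no antibody, since the check only applies to assays where an antibody is expected.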
