Cell Syst. 2016 Nov 23;3(5):491-495.e5.
doi: 10.1016/j.cels.2016.10.021. Epub 2016 Nov 15.

The BLUEPRINT Data Analysis Portal

José María Fernández et al. Cell Syst. 2016.

Abstract

The impact of large and complex epigenomic datasets on biological insights or clinical applications is limited by the lack of accessibility by easy, intuitive, and fast tools. Here, we describe an epigenomics comparative cyber-infrastructure (EPICO), an open-access reference set of libraries to develop comparative epigenomic data portals. Using EPICO, large epigenome projects can make available their rich datasets to the community without requiring specific technical skills. As a first instance of EPICO, we implemented the BLUEPRINT Data Analysis Portal (BDAP). BDAP provides a desktop for the comparative analysis of epigenomes of hematopoietic cell types based on results, such as the position of epigenetic features, from basic analysis pipelines. The BDAP interface facilitates interactive exploration of genomic regions, genes, and pathways in the context of differentiation of hematopoietic lineages. This work represents initial steps toward broadly accessible integrative analysis of epigenomic data across international consortia. EPICO can be accessed at https://github.com/inab, and BDAP can be accessed at http://blueprint-data.bsc.es.

Keywords: BLUEPRINT Data Analysis Portal; BLUEPRINT epigenomes; International Human Epigenomes Consortium; bioinformatics; cyber-infrastructure; epigenomic data mining tools; epigenomic data visualization; epigenomics; hematopoiesis; leukemia.

Figures

Figure 1. EPICO infrastructure flowchart
A. Each epigenomic data set usually has its own file formats and conventions, so this step is custom. B. EPICO data model concepts, ontologies and restrictions are common. Only details such as the project name or the versions of the reference Ensembl, GENCODE, GRCh and other primary database resources have to be adjusted. C. Because genomic definitions and annotations are published at common sites, and their data formats are stable from release to release, this step is done by EPICO. D. The metadata and data insertion (which should follow the EPICO data model at this point) consists of several steps, all of them generic: data validation and normalization (1) using the EPICO libraries (2), which then translate the data into the target database model (3) (relational, MongoDB and Elasticsearch are currently supported). In the case of BLUEPRINT we have used Elasticsearch. E. The data are inserted in bulk into the database, which already contains the database definitions mapped from the EPICO data model, as well as the ontologies and the genomic coordinates of the known features, such as genes, transcripts, direct complexes, reactions and pathways. F. Prior to version 1.0, the BLUEPRINT Data Analysis Portal issued its queries to the read-only instance of Elasticsearch, which contained all the BLUEPRINT metadata plus primary analysis data. G. BDAP 1.0 issues its queries to the EPICO REST API, which manages the different databases and implements the queries to Elasticsearch. The EPICO Data Analysis Portal is going to be a superset of BDAP, able to work with one or more project data sets at once. Data from different epigenomic projects usually cannot be mixed in comparisons, because of differing experimental, normalization and analysis protocols.
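
The generic insertion steps in panels D and E (validate and normalize, then translate into the target database model for bulk loading) can be illustrated with a minimal Python sketch. It is not the EPICO implementation: the field names in `REQUIRED_FIELDS`, the chromosome normalization rule and the index name are all hypothetical, and the output only follows the newline-delimited action/document layout used by recent versions of the Elasticsearch bulk API.

```python
import json

# Hypothetical required fields and types; the real EPICO data model is not reproduced here.
REQUIRED_FIELDS = {"sample_id": str, "chromosome": str, "start": int, "end": int}


def validate_and_normalize(record):
    """Check required fields and types, then normalize the chromosome name."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected_type):
            raise ValueError(f"field {name} must be {expected_type.__name__}")
    if record["start"] > record["end"]:
        raise ValueError("start must not exceed end")
    normalized = dict(record)
    chrom = normalized["chromosome"]
    # Assumed convention for this sketch: store "1" rather than "chr1".
    if chrom.lower().startswith("chr"):
        normalized["chromosome"] = chrom[3:]
    return normalized


def to_bulk_lines(records, index_name):
    """Translate validated records into newline-delimited JSON action/document pairs."""
    lines = []
    for record in records:
        doc = validate_and_normalize(record)
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc, sort_keys=True))
    # A bulk payload must end with a newline.
    return "\n".join(lines) + "\n"
```

Calling `to_bulk_lines(records, "epico-demo")` returns a string that could be sent as the body of a bulk request. Keeping validation separate from translation mirrors the legend's point that only the translation step depends on the chosen database.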

