Nucleic Acids Res. 2020 Jan 8;48(D1):D17-D23.
doi: 10.1093/nar/gkz1033.

The European Bioinformatics Institute in 2020: building a global infrastructure of interconnected data resources for the life sciences

Charles E Cook et al. Nucleic Acids Res. 2020.

Abstract

Data resources at the European Bioinformatics Institute (EMBL-EBI, https://www.ebi.ac.uk/) archive, organize and provide added-value analysis of research data produced around the world. This year's update for EMBL-EBI focuses on data exchange among resources, both within the institute and with the wider global infrastructure. Within EMBL-EBI, data resources exchange data through a rich network of data flows mediated by automated systems. This network ensures that users receive as much information as possible from any search and from any starting point within EMBL-EBI's websites. EMBL-EBI data resources also exchange data with hundreds of other data resources worldwide and, collectively, are a key component of a global infrastructure of interconnected life sciences data resources. We describe the BioImage Archive, a deposition database for raw images derived from primary research, which will supply data to future knowledgebases that add value by curating primary image data. We also report a new release of the PRIDE database with an improved technical infrastructure, a new API, a new webpage and improved data exchange with UniProt and Expression Atlas. Training is a core mission of EMBL-EBI, and in 2018 our training team served more users than ever before, both in person and through web-based programmes.

Figures

Figure 1.
Propagation of open data through the life sciences data infrastructure. An annotated sequence from a newly isolated species automatically triggers a flow of protein-coding genes into UniProtKB (15), which in turn propagates data used to build sequence family models in Pfam (16); these models are used in InterPro (17), providing open tools for the functional exploration of further sequences. This example shows only EMBL-EBI resources, but similar data flows occur throughout the global infrastructure, as illustrated by the data exchange pathways in Figure 3.
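The cascade in Figure 1 can be read as reachability in a directed graph of data flows: once data enters one resource, every resource downstream of it can eventually receive derived data. The sketch below is a minimal illustration under stated assumptions. The graph is hand-written and holds only the four steps named in the caption, and `DATA_FLOWS` and `downstream` are hypothetical names; this is not how EMBL-EBI's automated exchange systems are implemented.

```python
from collections import deque

# Hypothetical flow graph holding only the steps named in the Figure 1 caption.
# Keys and values are illustrative labels, not identifiers used by EMBL-EBI systems.
DATA_FLOWS = {
    "annotated sequence": ["UniProtKB"],
    "UniProtKB": ["Pfam"],
    "Pfam": ["InterPro"],
    "InterPro": [],
}


def downstream(graph, source):
    """Return every node reachable from `source`, in breadth-first order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:  # visit each resource once, even if flows form cycles
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream(DATA_FLOWS, "annotated sequence"))
# → ['UniProtKB', 'Pfam', 'InterPro']
```

In this toy model, adding one edge to `DATA_FLOWS` is enough for a new resource to appear in the result, which loosely mirrors how a single deposition can reach several resources without manual intervention.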
Figure 2.
Data exchange between data resources at EMBL-EBI. The dataset contains 911 separate data connections among 41 of EMBL-EBI's resources. Resources are placed around the circumference of the circle and linked by internal arcs; the width of each arc is proportional to the number of distinct data connections between the two resources and does not represent the volume of data exchanged. Resources are grouped around the circle by functional cluster and distinguished by colour. Internal arc colours identify each cluster and do not indicate the direction of data exchange. The graphic was generated using the D3 JavaScript library (http://d3js.org), and the dataset was gathered as part of an external review in July 2018.
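The encoding described for Figure 2 (arc width counts connections, ignores data volume and ignores direction) amounts to collapsing a list of connections into a symmetric square matrix, the kind of input D3's chord layout consumes. The sketch below assumes a hypothetical connection list: the resource names come from the abstract and Figure 1, but the connections and their counts are invented for illustration, and `CONNECTIONS` and `chord_matrix` are illustrative names, not part of the published dataset or figure code.

```python
from itertools import chain

# Hypothetical connection list: one tuple per data connection between two resources.
# Names come from the abstract and Figure 1; the connections themselves are invented
# for illustration and are not the 911 connections in the real dataset.
CONNECTIONS = [
    ("UniProtKB", "Pfam"),
    ("Pfam", "InterPro"),
    ("PRIDE", "UniProtKB"),
    ("UniProtKB", "PRIDE"),
    ("PRIDE", "Expression Atlas"),
]


def chord_matrix(connections):
    """Build a symmetric square matrix of connection counts.

    Direction is ignored and every connection counts once, so each value is the
    number of connections between two resources, not the volume of data exchanged.
    """
    names = sorted(set(chain.from_iterable(connections)))
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in connections:
        i, j = index[a], index[b]
        matrix[i][j] += 1
        if i != j:  # a self-connection is counted once, not twice
            matrix[j][i] += 1
    return names, matrix


names, matrix = chord_matrix(CONNECTIONS)
p, u = names.index("PRIDE"), names.index("UniProtKB")
print(matrix[p][u])
# → 2
```

Because PRIDE to UniProtKB and UniProtKB to PRIDE collapse into the same cell, the pair gets a wider arc than a one-way link, while a single very large transfer would still count as just one connection.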
Figure 3.
Data exchange between EMBL-EBI resources and external data resources. This Sankey chart shows 468 separate external data resources linked to 39 EMBL-EBI resources by 1001 separate data connections. The graphic was generated using Tableau (www.tableau.com) from data gathered as part of an external review in July 2018. The full dataset, including all 468 external resources, is available in the supplementary material.
Figure 4.
Data accumulation at EMBL-EBI by data resource over time. The y-axis shows the total size in bytes of a single copy of each data resource. Resources shown are the BioImage Archive, PRoteomics IDEntifications (PRIDE) (8), European Genome-Phenome Archive (EGA) (14), ArrayExpress (18), European Nucleotide Archive (ENA) (19), Protein Data Bank in Europe (20) and MetaboLights (21). The y-axis of both charts is logarithmic, so the traces show that not only are most data types growing, but their rate of growth is also increasing. Growth rates are predicted to keep increasing for all data resources shown here. The dataset used to generate the figure is available in the supplementary material.
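The inference from a logarithmic axis can be made explicit with one line of calculus. Writing $S(t)$ for the stored bytes of one resource at time $t$ and $r(t)$ for its relative growth rate (our notation, not the paper's), the slope of a trace on a log-scaled axis equals that relative rate:

```latex
\frac{d}{dt}\ln S(t) \;=\; \frac{S'(t)}{S(t)} \;=\; r(t)
\qquad\Longrightarrow\qquad
S'(t) \;=\; r(t)\,S(t)
```

A straight trace means $r(t) = r$ is constant, so $S(t) = S_0 e^{rt}$ and the absolute growth rate $S'(t) = r\,S(t)$ already rises as the archive grows; a trace that curves upward means $r(t)$ itself is increasing. Plotting with $\log_{10}$ instead of $\ln$ only rescales the slope by the constant factor $1/\ln 10$, so the reading is the same.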
