Nucleic Acids Res. 2019 Jan 8;47(D1):D15-D22.
doi: 10.1093/nar/gky1124.

The European Bioinformatics Institute in 2018: tools, infrastructure and training

Charles E Cook et al. Nucleic Acids Res. 2019.

Abstract

The European Bioinformatics Institute (https://www.ebi.ac.uk/) archives, curates and analyses life sciences data produced by researchers throughout the world, and makes these data available for re-use globally. Data volumes continue to grow exponentially: total raw storage capacity now exceeds 160 petabytes, and we manage these increasing data flows while maintaining the quality of our services. This year we have improved the efficiency of our computational infrastructure and doubled the bandwidth of our connection to the internet. We report two new data resources: the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/), a component of the Expression Atlas, and the PDBe-Knowledgebase (https://www.ebi.ac.uk/pdbe/pdbe-kb), which collates functional annotations and predictions for structure data in the Protein Data Bank. Additionally, Europe PMC (http://europepmc.org/) has added preprint abstracts to its search results, supplementing results from peer-reviewed publications. EMBL-EBI maintains over 150 analytical bioinformatics tools that complement our data resources. We make these tools available through a web interface and programmatically through application programming interfaces (APIs), whilst ensuring that users have access to the latest versions. Our training team, with support from all of our staff, continued to provide on-site, off-site and web-based training opportunities for thousands of researchers worldwide this year.
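As a rough illustration of programmatic access, the sketch below builds a query URL for the EBI Search REST service (ref. 5). The base URL, the `uniprot` domain name and the `query`, `size` and `format` parameters are assumptions taken from the public EBI Search documentation, not from this article, and should be checked against the current service documentation. The helper `ebi_search_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the EBI Search RESTful service (verify against current docs).
EBI_SEARCH_BASE = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def ebi_search_url(domain: str, query: str, size: int = 10, fmt: str = "json") -> str:
    """Build a search URL for one EBI Search domain (hypothetical helper).

    `domain` selects the data resource (e.g. "uniprot"); `size` caps the
    number of hits per page. Parameter names are assumptions.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    # urlencode percent-encodes values and turns spaces into '+'.
    params = urlencode({"query": query, "size": size, "format": fmt})
    return f"{EBI_SEARCH_BASE}/{quote(domain)}?{params}"


if __name__ == "__main__":
    # Only prints the URL; fetching it (e.g. with urllib.request.urlopen)
    # needs network access.
    print(ebi_search_url("uniprot", "BRCA1 AND human"))
```

Building the URL separately from fetching it keeps the query logic testable offline; the response can then be parsed with any JSON library.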


Figures

Figure 1.
EMBL-EBI data resources. To the left are deposition resources that store primary data submitted by research scientists as well as ontologies and literature resources that span the entire research effort. To the right of the arrow are added-value knowledgebases ordered according to biological scale, from genes to proteins, structures, chemistry and systems. Updates in this NAR Database issue include ArrayExpress (26), BioSamples (27), ChEMBL (28), Complex Portal (29), ENA (30), Ensembl (9), GenCode (31), GWAS Catalogue (32), HGNC (33), InterPro (8), Open Targets (34), PDXfinder (35), Pfam (7), PRIDE (36), RNACentral (37), SIFTS (6) and UniProt (38). Not all EMBL-EBI resources are shown on the figure. For a complete list see https://www.ebi.ac.uk/services.
Figure 2.
(A) Data accumulation at EMBL-EBI by data type: nucleotide sequences, mass spectrometry and microarray. (B) Data accumulation by archive: PRoteomics IDEntifications (PRIDE) (36), European Genome-Phenome Archive (EGA) (39), ArrayExpress (AE) (26), European Nucleotide Archive (ENA) (30), Protein Data Bank in Europe (PDBe) (40) and MetaboLights (16). The y-axis for both charts is logarithmic, so most data types are not just growing but are growing at an increasing rate. In all data resources shown here, growth rates are predicted to continue increasing, with sustained exponential growth particularly notable in PRIDE and MetaboLights.
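On a logarithmic y-axis, exponential growth plots as a straight line: if a volume follows $V(t) = V_0 \cdot 2^{t/T_d}$, then $\log_2 V$ is linear in $t$ with slope $1/T_d$, where $T_d$ is the doubling time. The sketch below estimates $T_d$ by an ordinary least-squares fit of $\log_2$(volume) against year. The yearly volumes in the example are invented placeholders, not EMBL-EBI figures, and the function name `doubling_time` is hypothetical.

```python
import math


def doubling_time(years: list[float], volumes: list[float]) -> float:
    """Estimate the doubling time (in years) of a series assumed to grow exponentially.

    Fits log2(volume) = a + b * year by ordinary least squares; for exponential
    growth the points lie on a line and the doubling time is 1 / b.
    """
    if len(years) != len(volumes) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    if any(v <= 0 for v in volumes):
        raise ValueError("volumes must be positive to take logarithms")
    logs = [math.log2(v) for v in volumes]
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, logs))
    var = sum((t - mean_t) ** 2 for t in years)
    if var == 0:
        raise ValueError("years must not all be equal")
    slope = cov / var  # doublings per year
    if slope <= 0:
        raise ValueError("series is not growing")
    return 1.0 / slope


# Hypothetical volumes (arbitrary units) that double every 2 years.
years = [2012, 2014, 2016, 2018]
volumes = [1.0, 2.0, 4.0, 8.0]
print(f"doubling time = {doubling_time(years, volumes):.2f} years")  # 2.00
```

A series whose points bend upward on the log plot, rather than lying on a line, is growing faster than a single exponential, and a single fitted doubling time would then understate recent growth.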
Figure 3.
Installed raw storage at EMBL-EBI. The chart shows total installed data storage at EMBL-EBI, including multiple backups for all data resources as well as unused space to accommodate submissions in the immediate future. The total volume of a single copy of all data resources is roughly 20–25% of the installed storage capacity. Data points (not shown) correspond to the end of each calendar year, so the x-axis spans 31 December 2012 to 31 December 2018. Data for the end of 2018 are projections based on planned procurement. In 2017, we procured a large volume of disk space at good value, which increased capacity substantially and reduced the procurement needed in 2018. This approach allowed us to use our infrastructure budget efficiently.
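As a worked example, combining the 20–25% ratio above with the raw storage capacity of more than 160 petabytes reported in the abstract (assuming both figures refer to roughly the same point in time) gives a single copy of all data of about 32–40 PB, and installed capacity of roughly 4–5 times one copy. The helper `single_copy_pb` is hypothetical and only performs this arithmetic.

```python
def single_copy_pb(installed_pb: float, fraction: float) -> float:
    """Size (PB) of one copy of all data, given installed raw capacity and the
    fraction of that capacity a single copy occupies."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return installed_pb * fraction


INSTALLED_PB = 160  # "exceeds 160 petabytes" (abstract), so a lower bound
for frac in (0.20, 0.25):
    copy_pb = single_copy_pb(INSTALLED_PB, frac)
    # Installed capacity divided by one copy gives the overhead factor
    # (backups plus headroom for incoming submissions).
    print(f"{frac:.0%}: one copy is about {copy_pb:.0f} PB; "
          f"installed capacity is {1 / frac:.0f}x a single copy")
```

Because 160 PB is a lower bound, the single-copy estimate is likewise a lower bound under the stated ratio.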

References

    1. Chojnacki S., Cowley A., Lee J., Foix A., Lopez R. Programmatic access to bioinformatics tools from EMBL-EBI update: 2017. Nucleic Acids Res. 2017; 45:W550–W553.
    2. Cook C.E., Bergman M.T., Cochrane G., Apweiler R., Birney E. The European Bioinformatics Institute in 2017: data coordination and integration. Nucleic Acids Res. 2018; 46:D21–D29.
    3. Anderson W.P., Global Life Science Data Resources Working Group. Data management: a global coalition to sustain core data. Nature. 2017; 543:179.
    4. Cook C.E., Bergman M.T., Finn R.D., Cochrane G., Birney E., Apweiler R. The European Bioinformatics Institute in 2016: Data growth and integration. Nucleic Acids Res. 2016; 44:D20–D26.
    5. Park Y.M., Squizzato S., Buso N., Gur T., Lopez R. The EBI search engine: EBI search as a service-making biological data accessible for all. Nucleic Acids Res. 2017; 45:W545–W549.
