Nucleic Acids Res. 2024 Jan 5;52(D1):D92-D97.
doi: 10.1093/nar/gkad1067.

The European Nucleotide Archive in 2023

David Yuan et al. Nucleic Acids Res.

Abstract

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.

Figures

Graphical Abstract
Figure 1. Monthly annualised growth (%) of studies, samples, genomes and total data volume between 2019 and 2023, plotted on a logarithmic scale. The peak in genome growth coincides with a surge in SARS-CoV-2 sequencing.
Figure 2. Interaction between the SARS-CoV-2 pipelines and the ENA takes place via public APIs only. The ENA public APIs met the demand of the COVID-19 pandemic, showing that they are prepared to support external analysis systems at very large scale and high throughput in future pandemics.
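To illustrate what access "via public APIs only" might look like for an external pipeline, here is a minimal Python sketch that builds a search request and parses a tab-separated response. It is not taken from the article: the endpoint URL, the parameter names (`result`, `query`, `fields`, `format`, `limit`), the `tax_tree(2697049)` query and the field names are assumptions based on the ENA Portal API and should be checked against its current documentation. The helper names `build_search_url` and `parse_tsv` are hypothetical, and the sample response is a made-up placeholder, not real archive data.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed search endpoint; verify against the ENA Portal API documentation.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(result, query, fields, limit=10):
    """Build a search URL (hypothetical helper; parameter names are assumptions)."""
    params = {
        "result": result,            # record type to search, e.g. raw read runs
        "query": query,              # filter expression
        "fields": ",".join(fields),  # columns requested in the response
        "format": "tsv",             # tab-separated output
        "limit": limit,              # cap on the number of returned rows
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


def parse_tsv(text):
    """Parse a tab-separated response body into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Assumed query: records under the SARS-CoV-2 taxon (NCBI taxonomy ID 2697049).
url = build_search_url(
    result="read_run",
    query="tax_tree(2697049)",
    fields=["run_accession", "collection_date", "country"],
)

# Placeholder response body used only to demonstrate parsing; not real data.
sample_tsv = (
    "run_accession\tcollection_date\tcountry\n"
    "EXAMPLE_RUN_1\t2023-01-05\tUnited Kingdom\n"
)
rows = parse_tsv(sample_tsv)
```

In a real pipeline the URL would be fetched, for example with `urllib.request.urlopen(url)`, and the decoded body passed to `parse_tsv`. The network call is left out here so the sketch runs offline.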
