Nucleic Acids Res. 2023 Jan 6;51(D1):D753-D759.
doi: 10.1093/nar/gkac1080.

MGnify: the microbiome sequence data analysis resource in 2023

Lorna Richardson et al. Nucleic Acids Res.

Abstract

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets derived from a wide range of environments. Over the past 3 years, MGnify has grown not only in the number of datasets it contains but also in the breadth of analyses it provides, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database, a major improvement that makes it possible to understand the genomic context of a protein by navigating back to the source assembly and sample metadata. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and a coupled Jupyter Lab environment now enables downstream analysis of MGnify data.
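As a rough illustration of the programmatic access mentioned above, the following Python sketch builds a study URL and flattens a JSON:API-style response. The base path `/api/v1`, the `/studies/{accession}` route, and the attribute names in the sample payload are assumptions made for illustration and should be checked against the live API documentation; the accession shown is hypothetical. The helper names (`study_url`, `summarize_resource`, `fetch_study`) are not part of MGnify.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the MGnify REST API; verify against the API docs.
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def study_url(accession: str) -> str:
    """Build the URL for a single study resource (assumed route)."""
    return f"{API_BASE}/studies/{accession}"


def summarize_resource(payload: dict) -> dict:
    """Flatten a JSON:API-style document into one flat dictionary."""
    data = payload["data"]
    summary = {"id": data["id"], "type": data["type"]}
    summary.update(data.get("attributes", {}))
    return summary


def fetch_study(accession: str, timeout: float = 30.0) -> dict:
    """Fetch one study over HTTPS and summarize it (needs network access)."""
    with urlopen(study_url(accession), timeout=timeout) as response:
        return summarize_resource(json.load(response))


# Offline example: a made-up payload with a hypothetical accession and
# hypothetical attribute names, shaped like a JSON:API resource.
EXAMPLE_PAYLOAD = {
    "data": {
        "type": "studies",
        "id": "MGYS00000000",
        "attributes": {"study-name": "Example study", "samples-count": 12},
    }
}

EXAMPLE_SUMMARY = summarize_resource(EXAMPLE_PAYLOAD)
print(EXAMPLE_SUMMARY)
```

Parsing is kept separate from fetching so that the flattening step can be exercised without network access; `fetch_study` is only a thin wrapper for use against the real service.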

Figures

Graphical Abstract
Overview of MGnify resources: the assembly and annotation of microbiome-derived sequences from a broad range of environments has given rise to new insights into microbial diversity and the functional repertoire they encode.
Figure 1.
The number of assembled metagenomics datasets in the ENA and MGnify over time. MGnify launched assembly and analysis of assemblies in 2017; however, counts of primary assemblies submitted to the ENA are only available from 2018 due to a change in recording. As of 1 August 2022, the MGnify team had generated and submitted an assembly for 88% of all primary assembled metagenomic datasets in the ENA.
Figure 2.
A sample in MGnify that lacks structured geolocation information in the ENA. However, a Contextual Data Clearing House curation is available, listing the country of origin as Japan.
Figure 3.
Schematic of the protein database. Proteins are predicted on each contig (MGYC) using Prodigal (18) and FragGeneScan (19). The sequence and metadata of unique proteins (MGYP) are stored in a MySQL database. Annotations from Pfam (23) and ProtENN2 (Bileschi et al., in prep., (24)) for each protein are also stored.
Figure 4.
The access options for users of MGnify's web resources: website, API, and notebooks server. The redesigned website now includes links to programmatically access datasets (in this example, a study) using the API. A conceptual flow for launching an R Notebook is shown: following a deep link from the website into the notebook server, and using one of the example code notebooks. In this comparative metagenomics example available on the server, taxonomic diversity is being compared at different water depths using multidimensional scaling (MDS) and a variety of distance metrics.
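A comparison like the one described for Figure 4 starts from pairwise distances between taxonomic profiles, which an ordination method such as MDS then projects into a few dimensions. The Python sketch below computes Bray-Curtis dissimilarity, one commonly used metric; the source does not say which metrics the example notebook uses, so this choice is an assumption. The sample names and counts are invented for illustration.

```python
from itertools import combinations


def bray_curtis(a: dict, b: dict) -> float:
    """Bray-Curtis dissimilarity between two taxon -> count profiles."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - 2.0 * shared / total


def distance_matrix(profiles: dict) -> dict:
    """Symmetric nested dict of pairwise dissimilarities between samples."""
    names = sorted(profiles)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = bray_curtis(profiles[x], profiles[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Invented counts for three hypothetical water-depth samples.
PROFILES = {
    "depth_005m": {"Cyanobacteria": 60, "Proteobacteria": 30, "Bacteroidetes": 10},
    "depth_050m": {"Cyanobacteria": 20, "Proteobacteria": 50, "Bacteroidetes": 30},
    "depth_500m": {"Proteobacteria": 40, "Thaumarchaeota": 60},
}

DISTANCES = distance_matrix(PROFILES)
for name, row in DISTANCES.items():
    print(name, {other: round(d, 2) for other, d in row.items()})
```

The resulting matrix is the input an MDS routine would take; the ordination step itself is left out here to keep the sketch dependency-free.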

References

    1. Lobanov V., Gobet A., Joyce A. Ecosystem-specific microbiota and microbiome databases in the era of big data. Environ. Microbiome. 2022; 17:37.
    2. Mitchell A.L., Almeida A., Beracochea M., Boland M., Burgin J., Cochrane G., Crusoe M.R., Kale V., Potter S.C., Richardson L.J. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 2020; 48:D570–D578.
    3. Crusoe M.R., Abeln S., Iosup A., Amstutz P., Chilton J., Tijanić N., Ménager H., Soiland-Reyes S., Gavrilović B., Goble C. et al. Methods included: standardizing computational reuse and portability with the common workflow language. Commun. ACM. 2022; 65:54–63.
    4. Goble C., Soiland-Reyes S., Bacall F., Owen S., Williams A., Eguinoa I., Droesbeke B., Leo S., Pireddu L., Rodríguez-Navas L. et al. Implementing FAIR Digital Objects in the EOSC-Life Workflow Collaboratory. Zenodo. 2021; doi:10.5281/zenodo.4605654.
    5. Tyson G.W., Chapman J., Hugenholtz P., Allen E.E., Ram R.J., Richardson P.M., Solovyev V.V., Rubin E.M., Rokhsar D.S., Banfield J.F. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004; 428:37–43.
