MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes

Muhammad Kabiru Nata'ala et al. Environ Microbiome. 2022 Nov 18;17(1):57. doi: 10.1186/s40793-022-00449-7.

Abstract

Background: Metagenomics is an expanding field within microbial ecology, microbiology, and related disciplines. The number of metagenomes deposited in major public repositories such as the Sequence Read Archive (SRA) and Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) is rising exponentially. However, data mining and interpretation can be challenging because of mis-annotated and misleading metadata entries. In this study, we describe the Marine Metagenome Metadata Database (MarineMetagenomeDB), which helps researchers identify marine metagenomes of interest for re-analysis and meta-analysis. To this end, we manually curated the metadata associated with several thousand microbial metagenomes currently deposited in SRA and MG-RAST.

Results: In total, 125 terms were curated into 17 classes (e.g., biome, material, oceanic zone, geographic feature and oceanographic phenomena). Other standardized features include sample attributes (e.g., salinity, depth), sample location (e.g., latitude, longitude), and sequencing features (e.g., sequencing platform, sequence count). MarineMetagenomeDB version 1.0 contains 11,449 marine metagenomes from SRA and MG-RAST distributed across all oceans and several seas. Most samples (84.33%) were sequenced using Illumina technology. More than 55% of the samples were collected from the Pacific and Atlantic Oceans, and about 40% had 'ocean' assigned as their biome. In the web app, the 'Quick Search' and 'Advanced Search' tabs let users apply different filters to select samples of interest dynamically, and the interactive map shows samples at their locations on a world map. The web app also provides a novel download tool, available for both Windows and Linux, that simplifies downloading the raw sequence data of selected samples from their respective repositories. As a use case, we demonstrated how to use the MarineMetagenomeDB web app to select estuarine metagenomes for potential large-scale microbial biogeography studies.
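The search tabs described above select samples by combining attribute criteria. As an illustration only, the Python sketch below shows that kind of AND-combined filtering over curated records; the record layout, the field names (`biome`, `platform`, `source_db`), the toy values, and the `select_samples` helper are hypothetical and are not taken from the web app's implementation.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical toy records; field names and values are illustrative only
# and are not taken from MarineMetagenomeDB.
SAMPLES: List[Dict[str, Any]] = [
    {"sample_id": "S1", "source_db": "SRA", "biome": "estuary", "platform": "Illumina"},
    {"sample_id": "S2", "source_db": "MG-RAST", "biome": "ocean", "platform": "Illumina"},
    {"sample_id": "S3", "source_db": "SRA", "biome": "estuary", "platform": "Ion Torrent"},
]


def _matches(value: Any, wanted: Any) -> bool:
    """A criterion is either one accepted value or a set of accepted values."""
    if isinstance(wanted, (set, frozenset)):
        return value in wanted
    return value == wanted


def select_samples(records: Iterable[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Keep records that satisfy every criterion (logical AND across fields)."""
    return [
        rec for rec in records
        if all(_matches(rec.get(field), wanted) for field, wanted in criteria.items())
    ]


estuarine = select_samples(SAMPLES, biome="estuary", platform="Illumina")
print([rec["sample_id"] for rec in estuarine])  # ['S1']
```

Passing a set, e.g. `platform={"Illumina", "Ion Torrent"}`, accepts any of several values for a single field, while calling the helper with no criteria returns every record.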

Conclusion: MarineMetagenomeDB is a powerful resource that allows non-bioinformaticians to find marine metagenome samples with curated metadata and that can stimulate meta-studies involving marine microbiomes. Our user-friendly web app is publicly available at https://webapp.ufz.de/marmdb/.

Keywords: Database; Marine microbiomes; Metadata; Metagenomics; Microbial ecology.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the MarineMetagenomeDB construction workflow. A Metadata for SRA and MG-RAST samples were retrieved using separate pipelines, as explained in the text, followed by removal of non-WGS and non-marine samples. B Standardization of attributes. Sample attributes including ‘Dates’, ‘Location’ and ‘Depth’ were standardized. C Identification of marine terms. Marine terms were adapted from the Marine Biome, Environmental Material, and Geographic Feature classes of the Environment Ontology (ENVO). An example of a collection of terms grouped as ‘constructed structures’ (labeled with a double asterisk) is given in Additional file 3: Table S3 (‘MarMDB_constructed_structures’). D Merging of the SRA and MG-RAST datasets. The MG-RAST attributes were adapted to the SRA metadata standard. E MarineMetagenomeDB is made available online as a Shiny web application. Adapted from [17, 18].
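Panel B mentions standardizing the ‘Location’ attribute. As an illustration only, the Python sketch below converts a `lat_lon` string in the NCBI BioSample style (e.g. `"38.98 N 77.11 W"`) into signed decimal degrees; the accepted pattern and the `parse_lat_lon` helper are assumptions, and real SRA and MG-RAST metadata contain many more formats than this sketch handles.

```python
import re
from typing import Optional, Tuple

# Assumed input format: "<lat> N|S <lon> E|W", e.g. "38.98 N 77.11 W".
# Other variants (degrees/minutes, signed decimals, free text) are not handled.
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$",
    re.IGNORECASE,
)


def parse_lat_lon(text: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in signed decimal degrees, or None if unparseable."""
    match = _LAT_LON.match(text)
    if match is None:
        return None
    lat = float(match.group(1))
    lon = float(match.group(3))
    # Southern latitudes and western longitudes are negative.
    if match.group(2).upper() == "S":
        lat = -lat
    if match.group(4).upper() == "W":
        lon = -lon
    # Reject values outside the valid coordinate ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


print(parse_lat_lon("38.98 N 77.11 W"))  # (38.98, -77.11)
print(parse_lat_lon("not collected"))    # None
```

Returning `None` instead of raising keeps unparseable entries visible as unassigned values, which can then be reviewed manually.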
Fig. 2
Descriptive statistics of the MarineMetagenomeDB content. A Bar plot of the distribution of sequencing technologies (Sequencing platform) per database of origin (Source database). B Bar plot of the top ten countries of origin of the metagenomic samples (Sample location). C Bar plot of the top ten water bodies (oceans/seas) where the metagenomic samples were collected. D Bar plot of the top ten biomes where the metagenomic samples were collected.
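Panels B, C and D are frequency rankings. A minimal Python sketch of such a top-ten count using `collections.Counter` follows; the field names, toy records, and `top_n` helper are hypothetical, and excluding `None`, empty strings, and `"NA"` is an assumption, since the caption does not state how unassigned values were handled in these plots.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

MISSING = {None, "", "NA"}  # assumption: encodings of unassigned values


def top_n(records: Iterable[Dict[str, Any]], field: str, n: int = 10) -> List[Tuple[Any, int]]:
    """Return the n most frequent non-missing values of `field` with their counts."""
    counts = Counter(rec.get(field) for rec in records)
    for missing in MISSING:
        counts.pop(missing, None)  # drop unassigned values before ranking
    return counts.most_common(n)


# Hypothetical toy records, not real MarineMetagenomeDB content.
RECORDS = [
    {"water_body": "Pacific Ocean"},
    {"water_body": "Atlantic Ocean"},
    {"water_body": "Pacific Ocean"},
    {"water_body": "NA"},
    {},
]
print(top_n(RECORDS, "water_body"))  # [('Pacific Ocean', 2), ('Atlantic Ocean', 1)]
```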
Fig. 3
Co-occurrence of the MarineMetagenomeDB attributes. A Network representation of the co-occurrence frequencies of biomes and water bodies (ocean/sea). B Network representation of the co-occurrence frequencies of marine attributes. In all network graphs, unassigned (NA) values were omitted.
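The networks in Fig. 3 summarize how often attribute values occur together in the same sample. The Python sketch below shows one plausible way to derive such weighted edges from per-sample metadata; the field names, toy records, the `co_occurrence` helper, and the encoding of unassigned values are assumptions rather than the authors' code, and drawing the network itself is not shown.

```python
from collections import Counter
from typing import Any, Dict, Iterable

MISSING = {None, "", "NA"}  # assumption: encodings of unassigned values


def co_occurrence(records: Iterable[Dict[str, Any]], field_a: str, field_b: str) -> Counter:
    """Count samples for each (field_a value, field_b value) pair.

    Each pair can be read as a weighted edge between two network nodes.
    Samples where either value is unassigned are skipped, mirroring the
    caption's note that NA values were omitted.
    """
    edges: Counter = Counter()
    for rec in records:
        a, b = rec.get(field_a), rec.get(field_b)
        if a in MISSING or b in MISSING:
            continue
        edges[(a, b)] += 1
    return edges


# Hypothetical toy records, not real MarineMetagenomeDB content.
RECORDS = [
    {"biome": "ocean", "water_body": "Pacific Ocean"},
    {"biome": "ocean", "water_body": "Pacific Ocean"},
    {"biome": "estuary", "water_body": "Baltic Sea"},
    {"biome": "NA", "water_body": "Atlantic Ocean"},
]
edges = co_occurrence(RECORDS, "biome", "water_body")
print(dict(edges))  # {('ocean', 'Pacific Ocean'): 2, ('estuary', 'Baltic Sea'): 1}
```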
Fig. 4
MarineMetagenomeDB user-interface overview. A The ‘Interactive Map’ lets users select samples by geographic location using a selection tool on the map. B The ‘Advanced search’ tab lets users apply any number of filters; the corresponding metadata are displayed below the filtering options.
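Panel A describes selecting samples on the map with a selection tool. Assuming the selection is a latitude/longitude rectangle (the shapes actually supported by the web app are not stated here), a Python sketch of the underlying test could look like this; `in_bbox`, `select_in_bbox`, and the `lat`/`lon` field names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def in_bbox(lat: float, lon: float, south: float, west: float, north: float, east: float) -> bool:
    """True if (lat, lon) lies inside the box; longitudes are assumed in [-180, 180]."""
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    # West edge greater than east edge: the box crosses the antimeridian.
    return lon >= west or lon <= east


def select_in_bbox(records: Iterable[Dict[str, Any]], south: float, west: float,
                   north: float, east: float) -> List[Dict[str, Any]]:
    """Keep records whose coordinates fall inside the box; records without coordinates are skipped."""
    return [
        rec for rec in records
        if rec.get("lat") is not None and rec.get("lon") is not None
        and in_bbox(rec["lat"], rec["lon"], south, west, north, east)
    ]


# Hypothetical toy records, not real MarineMetagenomeDB content.
points = [{"lat": 5.0, "lon": 5.0}, {"lat": None, "lon": 1.0}, {"lat": 50.0, "lon": 5.0}]
print(select_in_bbox(points, south=0, west=0, north=10, east=10))  # [{'lat': 5.0, 'lon': 5.0}]
```

Handling the antimeridian case matters for Pacific samples, where a rectangle drawn across 180° longitude would otherwise select nothing.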

References

1. Johnson J, Jain K, Madamwar D. Functional Metagenomics. Curr Dev Biotechnol Bioeng [Internet]. Elsevier; 2017 [cited 2021 Jun 21]. p. 27–43. Available from: https://linkinghub.elsevier.com/retrieve/pii/B978044463667600002X
2. Qiang-long Z, Shi L, Peng G, Fei-shi L. High-throughput sequencing technology and its application. J Northeast Agric Univ Engl Ed. 2014;21:84–96.
3. Kodama Y, Shumway M, Leinonen R; on behalf of the International Nucleotide Sequence Database Collaboration. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res. 2012;40:D54–D56. doi: 10.1093/nar/gkr854.
4. Karsch-Mizrachi I, Takagi T, Cochrane G; on behalf of the International Nucleotide Sequence Database Collaboration. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2018;46:D48–D51. doi: 10.1093/nar/gkx1097.
5. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2012;41:D8–D20. doi: 10.1093/nar/gks1189.
