Nucleic Acids Res. 2020 Jan 8;48(D1):D626-D632.
doi: 10.1093/nar/gkz994.

TerrestrialMetagenomeDB: a public repository of curated and standardized metadata for terrestrial metagenomes


Felipe Borim Corrêa et al. Nucleic Acids Res. 2020.

Abstract

Microbiome studies focused on the genetic potential of microbial communities (metagenomics) have become standard within microbial ecology. MG-RAST and the Sequence Read Archive (SRA), the two main metagenome repositories, contain over 202 858 publicly available metagenomes, and this number has increased exponentially. However, mining these databases can be challenging due to misannotated, misleading and decentralized data. The main goal of TerrestrialMetagenomeDB is to make it easier for scientists to find terrestrial metagenomes of interest that could be compared with novel datasets in meta-analyses. We defined terrestrial metagenomes as those that do not belong to marine environments. Further, we curated the database using text mining to assign potential descriptive keywords that better contextualize environmental aspects of terrestrial metagenomes, such as biomes and materials. TerrestrialMetagenomeDB release 1.0 includes 15 022 terrestrial metagenomes from SRA and MG-RAST. Together, the downloadable data amount to 68 Tbp. In total, 199 terrestrial terms were divided into 14 categories. These metagenomes span 83 countries, 30 biomes and 7 main source materials. TerrestrialMetagenomeDB is publicly available at https://webapp.ufz.de/tmdb.
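As an illustration only, the keyword assignment and the "not marine" definition described in the abstract could be sketched roughly as follows. The vocabulary, the marine hint list and the function names are hypothetical placeholders, not the 199 terms or the text-mining procedure actually used for the database.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary (category -> terms); the real term list is not given here.
VOCAB = {
    "biome": ["forest", "grassland", "desert", "tundra"],
    "material": ["soil", "sediment", "rhizosphere"],
}
# Hypothetical hints used to flag marine samples.
MARINE_HINTS = ["marine", "ocean", "seawater"]


def _has_term(text, term):
    """Case-insensitive whole-word match of term inside text."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


def assign_keywords(description):
    """Return {category: [matched terms]} for a free-text sample description."""
    found = defaultdict(list)
    for category, terms in VOCAB.items():
        for term in terms:
            if _has_term(description, term):
                found[category].append(term)
    return dict(found)


def is_terrestrial(description):
    """Treat a sample as terrestrial when no marine hint appears (an assumption)."""
    return not any(_has_term(description, hint) for hint in MARINE_HINTS)
```

A whole-word match keeps "forest" from matching inside longer words such as "afforestation". A real curation step would likely also need manual review of ambiguous hits.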


Figures

Figure 1.
Overview of the TerrestrialMetagenomeDB (TMDB) construction and availability. The construction of TMDB comprises: (A) Metadata retrieval for metagenomes present in SRA and MG-RAST; (B) Standardization of attributes; (C) Identification of terrestrial metagenomes; and (D) merging of SRA and MG-RAST metadata. (E) The TMDB was made available through a user-friendly Shiny web application.
Figure 2.
Descriptive statistics of the TerrestrialMetagenomeDB content. (A) Network representation of the frequencies of ‘biome’-related terms in the database (polygon shape). The frequencies of pairs of ‘biome’ terms found in the database are represented by colored arrows. (B) Network representation of the frequencies of ‘material’-related terms in the database (ellipse shape). The frequencies of pairs of ‘material’ terms found in the database are represented by colored arrows. (C) Bar plot of the distribution of sequencing technologies (Sequencing platform) per database of origin (Source database). (D) Bar plot showing the distribution of the country of origin of the metagenomic samples (Sample location). Not-assigned values (NAs) were omitted from this plot.
Figure 3.
Overview of the TerrestrialMetagenomeDB user interface. (A) Metagenomes can be selected in the ‘Interactive Map’ using a selection tool. (B) Metadata related to the selected entries are shown in the data table and can be further filtered and exported. For illustrative purposes, only the set of ‘Quick filters’ is depicted.
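
Steps (B) and (D) of Figure 1, standardizing attributes and merging the SRA and MG-RAST metadata, might look something like the minimal sketch below. The alias table, the placeholder strings treated as missing and the function names are assumptions made for illustration. They do not reproduce the mapping the authors used.

```python
# Hypothetical alias table: raw attribute name -> standardized attribute name.
ALIASES = {
    "lat": "latitude", "geo_lat": "latitude",
    "lon": "longitude", "geo_lon": "longitude",
    "country_name": "country", "geo_loc_name": "country",
}
# Placeholder strings treated as missing values (an assumption).
MISSING = ("", "NA", "missing")


def standardize(record, source):
    """Rename attributes to a shared vocabulary and normalize missing values."""
    out = {"source_database": source}
    for key, value in record.items():
        name = key.strip().lower()
        name = ALIASES.get(name, name)
        if isinstance(value, str):
            value = value.strip()
        out[name] = None if value in MISSING else value
    return out


def merge_sources(sra_records, mgrast_records):
    """Combine both sources into rows that share one set of columns."""
    merged = [standardize(r, "SRA") for r in sra_records]
    merged += [standardize(r, "MG-RAST") for r in mgrast_records]
    columns = sorted({k for row in merged for k in row})
    # Fill attributes absent from a record with None so every row is aligned.
    return [{c: row.get(c) for c in columns} for row in merged]
```

Recording the database of origin in each row keeps the merged table traceable, which is what allows summaries such as sequencing platform per source database (Figure 2C).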
