BMC Biol. 2020 Apr 7;18(1):37.
doi: 10.1186/s12915-020-0756-z.

Improving the usability and comprehensiveness of microbial databases

Caitlin Loeffler et al. BMC Biol. 2020.

Erratum in

Abstract

Metagenomics studies leverage genomic reference databases to generate discoveries in basic science and translational research. However, current microbial studies use disparate reference databases that lack consistent standards of specimen inclusion, data preparation, taxon labelling and accessibility, hindering their quality and comprehensiveness, and calling for the establishment of recommendations for reference genome database assembly. Here, we analyze existing fungal and bacterial databases and discuss guidelines for the development of a master reference database that promises to improve the quality and quantity of omics research.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1.
Consensus of fungal and bacterial genome representation across multiple reference databases.
a Fungal species: 1405 unique species are represented across the four fungal databases. Of these, 48 are represented in all four databases, 175 in exactly three, and 189 in exactly two. The remaining 993 are found in only one database.
b Bacterial species: 42,337 unique species are represented across the three bacterial databases. Of these, 6543 are represented in all three databases and 17,506 in exactly two. The remaining 18,288 are found in only one database.
c Fungal genera: 786 unique genera are represented across the four fungal databases. Of these, 29 are represented in all four databases, 109 in exactly three, and 142 in exactly two. The remaining 506 are found in only one database.
d Bacterial genera: 2214 unique genera are represented across the three bacterial databases. Of these, 76 are represented in all three databases and 1149 in exactly two. The remaining 989 are found in only one database.
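The overlap categories in Fig. 1 amount to counting, for each taxon, how many databases contain it. The following is a minimal Python sketch of that bookkeeping, not the authors' code. The database names and species labels in `toy` are hypothetical placeholders, and the function name `overlap_profile` is an assumption made for illustration. Only the fungal species counts at the end are taken from the caption, where they serve as a consistency check.

```python
from collections import Counter


def overlap_profile(databases):
    """Return {k: number of taxa found in exactly k databases}.

    `databases` maps a database name to an iterable of taxon labels.
    """
    membership = Counter()
    for taxa in databases.values():
        # set() guards against a taxon listed twice within one database
        for taxon in set(taxa):
            membership[taxon] += 1
    profile = Counter(membership.values())
    return {k: profile.get(k, 0) for k in range(1, len(databases) + 1)}


# Hypothetical toy input; these are not real database contents.
toy = {
    "db_a": {"sp1", "sp2", "sp3"},
    "db_b": {"sp2", "sp3", "sp4"},
    "db_c": {"sp3", "sp5"},
}
profile = overlap_profile(toy)
total = sum(profile.values())  # number of unique taxa overall

# Consistency check on the fungal species counts reported in Fig. 1a:
# key = number of databases containing the species, value = species count.
fungal_species = {4: 48, 3: 175, 2: 189, 1: 993}
fungal_total = sum(fungal_species.values())
```

Because every taxon falls into exactly one category, the category counts must sum to the number of unique taxa, which is why the four fungal species counts add up to 1405.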
Fig. 2.
Fungal and bacterial genome composition across multiple reference databases.
a Percentage of references per fungal database available as complete genomes (yellow), fragmented genomes (i.e., set of contigs) (blue), and a mixture of full chromosomes and contigs (red).
b Percentage of species per bacterial database available as complete genomes (yellow), fragmented genomes (i.e., set of contigs) (blue), and a mixture of full chromosomes and contigs (red).
c Length distribution of the fungal genomes (contigs) across the databases. The contig mean lengths for each fungal database are 322 thousand bp (Ensembl), 513 thousand bp (RefSeq), 426 thousand bp (JGI 1K), and 548 thousand bp (FungiDB).
d Length distribution of the bacterial genomes (contigs) across the databases. The contig mean lengths for each bacterial database are 149 thousand bp (Ensembl), 119 thousand bp (RefSeq), and 107 thousand bp (PATRIC).
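A mean contig length like those in Fig. 2c and 2d can be computed from the sequence lengths in a FASTA file. The sketch below shows one plausible way to do that in Python and is not the authors' pipeline. The function name `contig_lengths` and the toy sequences are hypothetical, and the input is assumed to be plain FASTA text where each `>` header starts a new contig.

```python
from statistics import mean


def contig_lengths(fasta_text):
    """Return the length in bp of each record in FASTA-formatted text."""
    lengths = []
    current = None  # running length of the record being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


# Hypothetical toy input; real references are far longer.
toy_fasta = """>contig_1
ACGTACGT
ACGT
>contig_2
ACGTAC
"""
lengths = contig_lengths(toy_fasta)
mean_length = mean(lengths)
```

For real reference files, a streaming parser that reads line by line from disk would be preferable to holding the whole text in memory.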

