Nat Biotechnol. 2024 May;42(5):715-718.
doi: 10.1038/s41587-023-01845-1. Epub 2023 Jul 27.

Greengenes2 unifies microbial data in a single reference tree

Daniel McDonald et al. Nat Biotechnol. 2024 May.

Erratum in

  • Author Correction: Greengenes2 unifies microbial data in a single reference tree.
    McDonald D, Jiang Y, Balaban M, Cantrell K, Zhu Q, Gonzalez A, Morton JT, Nicolaou G, Parks DH, Karst SM, Albertsen M, Hugenholtz P, DeSantis T, Song SJ, Bartko A, Havulinna AS, Jousilahti P, Cheng S, Inouye M, Niiranen T, Jain M, Salomaa V, Lahti L, Mirarab S, Knight R. Nat Biotechnol. 2024 May;42(5):813. doi: 10.1038/s41587-023-02026-w. PMID: 37853258.

Abstract

Studies using 16S rRNA and shotgun metagenomics typically yield different results, usually attributed to PCR amplification biases. We introduce Greengenes2, a reference tree that unifies genomic and 16S rRNA databases in a consistent, integrated resource. By inserting sequences into a whole-genome phylogeny, we show that 16S rRNA and shotgun metagenomic data generated from the same samples agree in principal coordinates space, taxonomy and phenotype effect size when analyzed with the same tree.
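The claim that paired 16S rRNA and shotgun profiles "agree in principal coordinates space" rests on two steps: computing a beta-diversity distance between every pair of samples, then ordinating that distance matrix with principal coordinates analysis (PCoA). The following minimal NumPy sketch shows both steps with Bray–Curtis (the distance used in Fig. 1d,e) and classical PCoA. It is an illustration, not the paper's pipeline: the six-sample table is hypothetical, and it assumes both data types have already been collapsed onto a shared feature space (for example, genera) so that paired samples can be compared feature by feature.

```python
import numpy as np


def bray_curtis(x: np.ndarray, y: np.ndarray) -> float:
    """Bray–Curtis dissimilarity between two non-negative abundance vectors."""
    return float(np.abs(x - y).sum() / (x + y).sum())


def distance_matrix(table: np.ndarray) -> np.ndarray:
    """Symmetric matrix of pairwise Bray–Curtis distances between rows (samples)."""
    n = table.shape[0]
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dm[i, j] = dm[j, i] = bray_curtis(table[i], table[j])
    return dm


def pcoa(dm: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical PCoA: double-center -0.5 * D**2 and keep the top-k eigenvectors."""
    n = dm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = centering @ (-0.5 * dm ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues (possible for non-Euclidean distances) are clipped to 0.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Hypothetical relative abundances over four shared genera.
# Rows alternate 16S / shotgun for subjects 1, 2 and 3.
table = np.array([
    [0.7, 0.2, 0.1, 0.0],  # subject 1, 16S
    [0.6, 0.3, 0.1, 0.0],  # subject 1, shotgun
    [0.1, 0.1, 0.7, 0.1],  # subject 2, 16S
    [0.1, 0.2, 0.6, 0.1],  # subject 2, shotgun
    [0.0, 0.1, 0.1, 0.8],  # subject 3, 16S
    [0.1, 0.1, 0.1, 0.7],  # subject 3, shotgun
])
dm = distance_matrix(table)
coords = pcoa(dm, k=2)  # one (PC1, PC2) point per sample
```

In this toy table each subject's 16S and shotgun rows are 0.1 apart, while rows from different subjects are at least 0.6 apart, so paired samples cluster together in the ordination. Production analyses would use an optimized distance and PCoA implementation rather than this double loop.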

Conflict of interest statement

D.M. is a consultant for BiomeSense, Inc., has equity and receives income. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. R.K. is a scientific advisory board member of and consultant for BiomeSense, Inc., has equity and receives income. He is a scientific advisory board member of, and has equity in, GenCirq. He is a consultant and scientific advisory board member for DayTwo and receives income. He has equity in and acts as a consultant for Cybele. He is a co-founder of Biota, Inc., and has equity. He is a co-founder of Micronoma, has equity and is a scientific advisory board member. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. The remaining authors declare no competing interests.

Figures

Fig. 1. Greengenes2 overview and harmonization of 16S rRNA ASVs with shotgun metagenomic data.
a, The Greengenes2 phylogeny rendered using Empress, with ASV multifurcations collapsed; tip color indicates representation in the American Gut Project (AGP), the EMP, both or neither, and the outer bar depicts the top 20 represented phyla. b, The same collapsed phylogeny colored by the presence or absence of the best BLAST hit from SILVA 138. The bar uses the same coloring as the tips. c, EMP samples and the amount of novel branch length (normalized by the total backbone branch length) added to the tree through ASV fragment placement. Note that sample counts are uneven across EMPO3 categories. d, Bray–Curtis applied to paired 16S V4 rRNA ASVs and whole-genome shotgun samples from the THDMI subset of The Microsetta Initiative; PC, principal coordinate. e, Same data as d, but with Bray–Curtis computed on data collapsed to genus. f, Same data as d and e, but using weighted UniFrac at the ASV and genome identifier levels.
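Weighted UniFrac (Fig. 1f) differs from Bray–Curtis in that it gives partial credit to features that sit near each other on the tree, which is why analyzing both data types against the same tree matters. The following pure-Python sketch implements the standard normalized weighted UniFrac formula: the sum over branches of b_i |p_A,i - p_B,i|, divided by the sum over tips of d_j (p_A,j + p_B,j), where b_i is a branch length, p is the fraction of a sample's counts below a branch or at a tip, and d_j is the root-to-tip distance. The three-tip tree and its dictionary encoding are hypothetical, chosen for brevity; this is not the implementation used in the paper, and real analyses would run an optimized implementation over the full Newick tree.

```python
from collections import defaultdict

# Hypothetical toy tree encoded as child -> (parent, branch length).
# Tips A and B are sisters under internal node n1; tip C hangs off the root.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 3.0),
}


def root_distance(tree, tip):
    """Sum of branch lengths from a tip up to the root."""
    distance, node = 0.0, tip
    while node in tree:
        parent, length = tree[node]
        distance += length
        node = parent
    return distance


def clade_proportions(tree, counts):
    """Fraction of a sample's total count found below each node (tips included)."""
    total = sum(counts.values())
    props = defaultdict(float)
    for tip, count in counts.items():
        node = tip
        props[node] += count / total
        while node in tree:  # walk up, crediting every ancestor
            node = tree[node][0]
            props[node] += count / total
    return props


def weighted_normalized_unifrac(tree, counts_a, counts_b):
    """Normalized weighted UniFrac between two samples given as {tip: count}."""
    pa = clade_proportions(tree, counts_a)
    pb = clade_proportions(tree, counts_b)
    # Numerator: every branch weighted by the abundance difference beneath it.
    numerator = sum(length * abs(pa[node] - pb[node])
                    for node, (_, length) in tree.items())
    # Denominator: rescales the distance into [0, 1].
    tips = set(counts_a) | set(counts_b)
    denominator = sum(root_distance(tree, t) * (pa[t] + pb[t]) for t in tips)
    return numerator / denominator


sisters = weighted_normalized_unifrac(TREE, {"A": 5}, {"B": 5})  # 1/3
distant = weighted_normalized_unifrac(TREE, {"A": 5}, {"C": 5})  # 1.0
```

Bray–Curtis would score both pairs as maximally different (1.0), because neither pair shares a feature; weighted UniFrac instead places the sister tips much closer (1/3) than the distant ones (1.0).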
Fig. 2. Taxonomic and effect size consistency between 16S rRNA ASVs and shotgun metagenomic data.
a–c, Per-sample taxonomy comparisons between 16S and whole-genome shotgun profiles from THDMI. The solid bar depicts the 50th percentile, and the dashed lines depict the 25th and 75th percentiles. a, Assessment of 16S taxonomy with SILVA 138 using the default q2-feature-classifier naive Bayes model (note that SILVA does not annotate at the species level); GG2, Greengenes2. b, Assessment of 16S taxonomy with Greengenes 13_8 (GG13_8) using the default q2-feature-classifier naive Bayes model. c, Assessment of 16S taxonomy performed by reading the lineages directly from the phylogeny or through naive Bayes trained on the V4 regions of the Greengenes2 backbone. d,e, Effect size calculations performed with Evident on paired 16S and whole-genome shotgun samples from THDMI. Calculations were performed at maximal resolution using ASVs for 16S and genome identifiers for shotgun samples. The data represented here are human gut microbiome samples. The stars denote variables that are called out specifically in the plot (for example, population) and were arbitrarily selected as comparison points to help highlight differences between d and e. Bray–Curtis distances (d) and weighted normalized UniFrac (e) are shown.
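Panels a–c summarize a per-sample agreement score between paired 16S and shotgun taxonomic profiles by its 25th, 50th and 75th percentiles. The caption does not say which score was computed, so this sketch assumes one purely for illustration: the Pearson correlation of genus-level relative abundances over the union of genera detected by either method, with undetected genera set to zero. The function names and the toy profiles are hypothetical.

```python
import numpy as np


def per_sample_agreement(profile_16s: dict, profile_wgs: dict) -> float:
    """Pearson correlation of genus relative abundances (hypothetical score).

    Genera missing from one profile are treated as zero abundance.
    """
    genera = sorted(set(profile_16s) | set(profile_wgs))
    x = np.array([profile_16s.get(g, 0.0) for g in genera])
    y = np.array([profile_wgs.get(g, 0.0) for g in genera])
    return float(np.corrcoef(x, y)[0, 1])


def quartile_summary(scores) -> dict:
    """The 25th, 50th and 75th percentiles of the per-sample scores."""
    p25, p50, p75 = np.percentile(scores, [25, 50, 75])
    return {"p25": float(p25), "median": float(p50), "p75": float(p75)}


# Hypothetical paired profiles for three samples (genus -> relative abundance).
pairs = [
    ({"g__Bacteroides": 0.6, "g__Prevotella": 0.3, "g__Blautia": 0.1},
     {"g__Bacteroides": 0.5, "g__Prevotella": 0.4, "g__Blautia": 0.1}),
    ({"g__Bacteroides": 0.2, "g__Faecalibacterium": 0.7, "g__Blautia": 0.1},
     {"g__Bacteroides": 0.3, "g__Faecalibacterium": 0.5, "g__Blautia": 0.2}),
    ({"g__Prevotella": 0.8, "g__Blautia": 0.2},
     {"g__Prevotella": 0.6, "g__Blautia": 0.1, "g__Akkermansia": 0.3}),
]
scores = [per_sample_agreement(a, b) for a, b in pairs]
summary = quartile_summary(scores)
```

Reporting quartiles rather than a mean keeps a few poorly matched samples from dominating the summary. Note that `np.corrcoef` returns NaN when a profile has identical abundances for every genus, so a real comparison would need to handle that case.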

