Microbiome. 2022 Jul 8;10(1):104.
doi: 10.1186/s40168-022-01295-y.

Phylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies

Hayley B Hassler et al. Microbiome. 2022.

Abstract

Background: The 16S rRNA gene is used extensively in bacterial phylogenetics, in species delineation, and now widely in microbiome studies. However, the gene suffers from intragenomic heterogeneity, and reports of recombination and an unreliable phylogenetic signal are accumulating. Here, we compare core gene phylogenies to phylogenies constructed using core gene concatenations to estimate the strength of signal for the 16S rRNA gene, its hypervariable regions, and all core genes at the intra- and inter-genus levels. Specifically, we perform four intra-genus analyses (Clostridium, n = 65; Legionella, n = 47; Staphylococcus, n = 36; and Campylobacter, n = 17) and one inter-genus analysis [41 core genera of the human gut microbiome (31 families, 17 orders, and 12 classes), n = 82].

Results: At both taxonomic levels, the 16S rRNA gene was recombinant and subject to horizontal gene transfer. At the intra-genus level, the gene showed one of the lowest levels of concordance with the core genome phylogeny (50.7% average). Concordance for hypervariable regions was lower still, with entropy masking providing little to no benefit. A major factor influencing concordance was SNP count, which showed a positive logarithmic association. Using this relationship, we determined that 690 ± 110 SNPs were required for 80% concordance (average 16S rRNA gene SNP count was 254). We also found a wide range in 16S-23S-5S rRNA operon copy number among genomes (1-27). At the inter-genus level, concordance for the whole 16S rRNA gene was markedly higher (73.8% - 10th out of 49 loci); however, the most concordant hypervariable regions (V4, V3-V4, and V1-V2) ranked in the third quartile (62.5 to 60.0%).
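The logarithmic relationship between SNP count and concordance can be inverted to estimate how many SNPs a locus needs to reach a target concordance. The sketch below is only an illustration of that arithmetic, not the authors' code: it assumes a model of the form concordance = a + b · ln(SNP count), and the coefficients and data points are synthetic values invented for the example (they are not taken from the paper). The function names `fit_log_model` and `snps_for_concordance` are hypothetical.

```python
import math

def fit_log_model(snp_counts, concordance):
    """Ordinary least-squares fit of concordance = a + b * ln(snp_count)."""
    xs = [math.log(s) for s in snp_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(concordance) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, concordance))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def snps_for_concordance(a, b, target):
    """Invert the fitted model: solve target = a + b * ln(x) for x."""
    return math.exp((target - a) / b)

# Synthetic, noise-free data generated from made-up coefficients
# (ASSUMPTION: illustrative only, not values reported in the study).
true_a, true_b = -20.0, 15.0
snps = [50, 100, 250, 500, 1000, 2000]
conc = [true_a + true_b * math.log(s) for s in snps]

a, b = fit_log_model(snps, conc)
needed = snps_for_concordance(a, b, 80.0)
print(f"fitted a={a:.2f}, b={b:.2f}; SNPs for 80% concordance ~ {needed:.0f}")
```

With real per-gene data, one fit per genus would give one estimate each, and the spread across genera would correspond to the kind of uncertainty reported as ± 110.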

Conclusions: The ramifications of poor phylogenetic performance for the 16S rRNA gene are far-reaching. In addition to incorrect species/strain delineation and phylogenetic inference, it has the potential to confound community diversity metrics that incorporate phylogenetic information, such as the popular Faith's phylogenetic diversity and UniFrac. Our results highlight the problematic nature of these approaches, and their use (along with entropy masking) is discouraged. Lastly, the wide range in 16S rRNA gene copy number among genomes also has a strong potential to confound diversity metrics.
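To see why copy number variation can confound diversity metrics, consider a toy community of two taxa with equal cell counts but operon copy numbers at the two ends of the reported range (1 and 27). The sketch below is a simplification under stated assumptions: read counts are taken to be exactly proportional to cells × copy number (ignoring PCR and extraction bias), the taxon names and cell counts are hypothetical, and Shannon diversity stands in for diversity metrics in general.

```python
import math

def proportions(counts):
    """Convert raw counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]

def shannon(props):
    """Shannon diversity index (natural log) of a list of proportions."""
    return -sum(p * math.log(p) for p in props if p > 0)

# Hypothetical community: equal numbers of cells per taxon.
cells = {"taxon_A": 100, "taxon_B": 100}
# Copy numbers at the extremes of the range reported in the abstract (1-27).
copies = {"taxon_A": 1, "taxon_B": 27}

# ASSUMPTION: 16S reads scale with cells * copy number.
reads = {t: cells[t] * copies[t] for t in cells}

observed = proportions([reads[t] for t in cells])
corrected = proportions([reads[t] / copies[t] for t in cells])

h_observed = shannon(observed)    # computed from raw 16S read counts
h_corrected = shannon(corrected)  # after dividing by copy number

print(f"observed proportions:  {observed}")
print(f"corrected proportions: {corrected}")
print(f"Shannon (observed) = {h_observed:.3f}, (corrected) = {h_corrected:.3f}")
```

In this toy case the uncorrected reads make an even community look heavily dominated by one taxon, which lowers the apparent diversity. Correcting in practice requires knowing each taxon's copy number, which is often unavailable.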

Keywords: 16S rRNA gene; Comparative phylogenomics; Diversity metrics; Entropy masking; Horizontal gene transfer; Microbiome; Recombination; Ribosome; Species phylogeny.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Levels of concordance with the species phylogeny for core genes sorted from lowest (left) to highest (right) for the 16S rRNA gene hypervariable regions (16S HVR, purple), rRNA genes (rRNA, pink), coding ribosomal genes (CR, light blue), non-ribosomal genes (NR, dark blue), and rpo genes (rpo, green). Colored dots correspond to levels of concordance with the species phylogeny for genes (and hypervariable regions). Degree of shading represents the number of loci observed for each level of concordance. For 16S HVR, SC, and NR, a single dot for each genus contains a number that shows the maximum number of loci observed across all levels of concordance for that genus. This dot will have the darkest level of shading. In all cases, there are multiple dots with this maximum level of shading and maximum loci count. For rRNA and rpo, there are only three loci (5S, 16S, 23S and rpoA, rpoB, rpoC) each with a frequency of one and the same degree of shading. The following designations indicate average concordance for each genus: Staphylococcus (S), Legionella (L), Clostridium (Cl), and Campylobacter (Ca). Hypervariable regions frequently used in microbiome research (V3-V4) are highlighted. Individual rRNA and rpo genes are designated as follows: 16S rRNA gene (16S), 23S rRNA gene (23S), 5S rRNA gene (5S), rpoA gene (A), rpoB gene (B), and rpoC gene (C)
Fig. 2
Levels of concordance with the species phylogeny for core genes plotted against each gene’s SNP count. For each genus, the logarithmic model is shown. Gene and gene category labels and coloring follow Fig. 1
Fig. 3
Dot plot showing the number of SNPs required for 80% concordance with the species phylogeny for seven SNP categories (see text). Non-ribosomal SNPs (NR, dark blue), coding ribosomal SNPs (CR, light blue), core gene SNPs (Core, grey), 3rd and 1st/2nd nucleotide positions from non-ribosomal genes (NR 3, NR 1–2; dark blue) and coding ribosomal genes (CR 3, CR 1–2; light blue). Genus labels follow Fig. 1. The average number of SNPs necessary for 80% concordance for each SNP category is indicated by larger dots
Fig. 4
For each genus and SNP category, a dot plot showing levels of concordance predicted using the best fit logarithmic equation where the y-value was the average SNP count for rRNA alignments (266 nt) (see text for rationale). Large dots show average concordance for each genus. Non-ribosomal SNPs (NR) are shown in dark blue, coding ribosomal SNPs (CR) in light blue, core SNPs in grey, and rRNA SNPs in pink. Genus labels follow Fig. 1
Fig. 5
(A) ML phylogeny showing relationship among 82 species that represent 41 core genera of the human gut microbiome. Taxonomic nomenclature and classification follow NCBI, and for each phylum, updated names are shown with longstanding informal names shown in parentheses. Levels of bootstrap support lower than 90% are shown (500 replicates). For each genus, two representative species are included (names not shown; see Table S11 and Fig. S8A for details). (B) Dot plot showing levels of concordance for core genes and 16S hypervariable regions. Black dots show concordance with the species phylogeny and grey dots show concordance with the phylogeny representing a consensus of the topologies of each single-copy core gene phylogeny. The 440-bp section of the rpoB gene referred to in the text is shown with an asterisk. Gene and gene category labels and coloring follow Fig. 1
