Review
Nat Rev Microbiol. 2020 Sep;18(9):491-506.
doi: 10.1038/s41579-020-0368-1. Epub 2020 Jun 4.

Diversity within species: interpreting strains in microbiomes

Thea Van Rossum et al. Nat Rev Microbiol. 2020 Sep.

Abstract

Studying within-species variation has traditionally been limited to culturable bacterial isolates and low-resolution microbial community fingerprinting. Metagenomic sequencing and technical advances have enabled culture-free, high-resolution strain and subspecies analyses at high throughput and in complex environments. This holds great scientific promise but has also led to an overwhelming number of methods and terms to describe infraspecific variation. This Review aims to clarify these advances by focusing on the diversity within bacterial and archaeal species in the context of microbiomics. We cover foundational microevolutionary concepts relevant to population genetics and summarize how within-species variation can be studied and stratified directly within microbial communities with a focus on metagenomics. Finally, we describe how common applications of within-species variation can be achieved using metagenomic data. We aim to guide the selection of appropriate terms and analytical approaches to facilitate researchers in benefiting from the increasing availability of large, high-resolution microbiome genetic sequencing data.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1
Drivers of variability within bacterial species. Within-species variability is introduced by mutations, which usually increase the amount of variation within a species (up arrow), and by gene flow mechanisms, which can increase or decrease it. This variability is shaped by genetic drift and selective pressure, both of which can also increase or decrease the amount of variation. Selective pressures are in turn shaped by many biotic and abiotic factors, some of which are known to drive adaptation in particular habitats more than others.
Figure 2
Within-species stratifications. (a) Different operational definitions of ‘strain’, based on the field of investigation: a cultured isolate in classic microbiology, a leaf node in a phylogenetic tree, and a metagenome-assembled genome (MAG) in metagenomics. (b) Each point is a pairwise comparison of one isolate genome versus all other conspecific isolate genomes. The data are from 155 bacterial species, each with at least 10 sequenced isolate genomes. The opacity of the red topographical overlay indicates the density of points. The plot shows the relationship between the similarity of the core genome, measured by average nucleotide identity (ANI), and the similarity of gene content, measured by the Jaccard index. Genomes with higher similarity between their core gene sequences tend to have more genes in common (Spearman correlation R = 0.57, p < 2.2e-16). However, high ANI does not necessarily imply highly similar gene content: many genome pairs with over 99% core genome ANI have less than 70% of genes in common. Most within-species ANI values are greater than 97% (83% of data points); the few data points below 95% ANI (4% of data points) are not shown. The data are adapted from Ref. (c) Spatial distribution of key terminology used to stratify variation within bacterial species, ranging from a single nucleotide variation in the whole genome to the species-level threshold (97% ANI). The coloured portions of the bars reflect the recommended scope of use for each term, and the grey portions indicate the common, often unspecific, scope of use. Broadly speaking, conspecific genomes have identical nucleotides at homologous positions across 97% of their genome (97% ANI), which corresponds to a difference on the order of 116,000 SNVs for an average bacterial genome (3.87 Mb).
The bottom panel illustrates the hierarchy of these terms, with a species potentially containing multiple subspecies, a subspecies containing multiple strains, and a strain containing multiple (non-identical) genomes. These genomes can be sequenced from cultured isolates or obtained by assembling a metagenomic sample, creating a MAG, which represents the consensus genome of a population of cells.
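The two similarity measures in panel (b) and the SNV arithmetic in panel (c) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy gene sets, and the simplification that every position of the genome is aligned and compared are assumptions made for illustration only. In practice ANI is computed over aligned regions, so the SNV figure is an order-of-magnitude estimate.

```python
def jaccard_index(genes_a, genes_b):
    """Gene-content similarity: shared genes divided by all genes in either genome."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty gene sets are treated as identical
    return len(a & b) / len(union)


def approx_snvs(genome_length_bp, ani):
    """Approximate number of differing positions implied by an ANI value.

    Assumes the whole genome is aligned (a simplification, see lead-in).
    """
    return round(genome_length_bp * (1.0 - ani))


# Hypothetical toy genomes: three shared genes, one unique gene each.
genome_1 = {"gyrA", "recA", "rpoB", "tetM"}
genome_2 = {"gyrA", "recA", "rpoB", "blaTEM"}

shared_fraction = jaccard_index(genome_1, genome_2)  # 3 shared / 5 total
snvs_at_97 = approx_snvs(3_870_000, 0.97)            # 3.87 Mb genome at 97% ANI

print(f"Jaccard index: {shared_fraction:.2f}")
print(f"Approximate SNVs at 97% ANI: {snvs_at_97:,}")
```

For a 3.87 Mb genome, 3% of positions is about 116,100, which matches the caption's figure of roughly 116,000 SNVs.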
Figure 3
Applications of within-species variation. Five major areas of investigation for within-species-oriented metagenomic data analysis are illustrated, paired with the corresponding appropriate terminology. Trees depict the genetic similarity and ancestry of potentially coexisting populations, with nodes representing populations and edges representing genetic differences accumulating from top to bottom. (a) Source tracking is concerned with identifying an unbranched path through a tree of ancestors and descendants (a ‘lineage’, pink edges and nodes). (b) Phylogeny reconstruction aims to build a tree that reflects the history of within-species variants based on their genetic similarity. A phylogeny might be cut into complete sub-trees (‘clades’), which may be called ‘phylotypes’. (c) Metagenomic typing detects the presence of a previously identified signature of interest within a species. For example, the presence of a gene associated with pathogenicity could be the criterion for detecting a ‘pathotype’. This gene may have been transferred between clades via horizontal gene transfer (HGT), so it may be at odds with the within-species phylogeny. (d, e) The genetic population structure of a species can be described from the distribution of the genetic similarities across observed variants. (d) A ‘clustered’ structure occurs when there is a discontinuity across genetic similarities, enabling clades to be grouped into distinct clusters. Such a non-uniform structure is created by unobserved (extinct or unsampled) intermediate populations. A hypothetical within-species history with unobserved populations (white nodes) can be simplified (=), showing how unobserved populations can lead to a clustered genetic distribution, which may include distinct population subspecies. As SNVs (black dots) accumulate through this history, some might be specific to a particular set of populations (coloured dots).
(e) When unobserved intermediate populations are rare or when they are spread widely through a species, the genetic distribution appears uniform or smeared and distinct groups of populations are not seen. (f) Ecological niche inference combines population observational data with phenotypic and/or habitat data to identify populations that have adapted to particular niches (‘ecotypes’). Adaptive traits might be identified by comparing populations, but potential geographic confounders must also be considered.

References

    1. Moore WEC, et al. Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. Int J Syst Evol Microbiol. 1987;37:463–464.
    2. Leimbach A, Hacker J, Dobrindt U. E. coli as an all-rounder: The thin line between commensalism and pathogenicity. Curr Top Microbiol Immunol. 2013;358:3–32.
    3. Pierce JV, Bernstein HD. Genomic Diversity of Enterotoxigenic Strains of Bacteroides fragilis. PLoS One. 2016;11:e0158171.
    4. Maier L, et al. Extensive impact of non-antibiotic drugs on human gut bacteria. Nature. 2018;555:623–628.
    5. Neuenschwander SM, Ghai R, Pernthaler J, Salcher MM. Microdiversification in genome-streamlined ubiquitous freshwater Actinobacteria. ISME J. 2018;12:185–198.

Publication types