Cell. 2015 Feb 12;160(4):583-594.
doi: 10.1016/j.cell.2014.12.038. Epub 2015 Jan 29.

Extensive strain-level copy-number variation across human gut microbiome species


Sharon Greenblum et al. Cell. 2015.

Abstract

Within each bacterial species, different strains may vary in the set of genes they encode or in the copy number of these genes. Yet, taxonomic characterization of the human microbiota is often limited to the species level or to previously sequenced strains, and accordingly, the prevalence of intra-species variation, its functional role, and its relation to host health remain unclear. Here, we present a comprehensive large-scale analysis of intra-species copy-number variation in the gut microbiome, introducing a rigorous computational pipeline for detecting such variation directly from shotgun metagenomic data. We uncover a large set of variable genes in numerous species and demonstrate that this variation has significant functional and clinically relevant implications. We additionally infer intra-species compositional profiles, identifying population structure shifts and the presence of yet uncharacterized variants. Our results highlight the complex relationship between microbiome composition and functional capacity, linking metagenome-level compositional shifts to strain-level variation.


Figures

Figure 1. Schematic of analysis pipeline
Reads from metagenomic samples were mapped to KEGG-annotated reference genomes, which were grouped into species-level genome clusters. The total coverage of each KO (KEGG orthology group), k, in each genome cluster, c, in each sample, s, was normalized by cluster abundance to calculate the gene copy number V_kcs. KCs (specific KOs in specific genome clusters) whose copy number varied significantly across samples were detected, as were those whose copy number was associated with host state (obesity, IBD). See also Figure S3 and Table S3.
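The normalization step in this caption can be illustrated with a minimal sketch. It assumes that cluster abundance is the average coverage of the cluster's single-copy marker genes (as described for Figure 2B) and that copy number is the plain ratio of KO coverage to that abundance. The function names, the marker names, and the coverage values are hypothetical and are not taken from the paper's pipeline.

```python
from statistics import mean

def cluster_abundance(marker_coverage):
    """Average coverage of a cluster's single-copy marker genes in one sample."""
    return mean(marker_coverage.values())

def copy_number(ko_coverage, marker_coverage):
    """Estimate the copy number of every KO k for one cluster c in one sample s.

    Assumption: copy number = KO coverage / cluster abundance.
    """
    abundance = cluster_abundance(marker_coverage)
    if abundance <= 0:
        raise ValueError("cluster is not detectable in this sample")
    return {ko: cov / abundance for ko, cov in ko_coverage.items()}

# Hypothetical coverages for one genome cluster in one sample.
markers = {"marker_a": 10.0, "marker_b": 12.0, "marker_c": 8.0}
kos = {"K03671": 20.0, "K08217": 5.0}

v = copy_number(kos, markers)
print(v)
```

With these made-up numbers the cluster abundance is 10, so a KO covered at 20 is estimated at two copies per genome and a KO covered at 5 at half a copy, which would suggest it is present in only part of the strain population.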
Figure 2. Genome cluster statistics
The mappability, abundance, and prevalence of each genome cluster (representing a species-level group) are shown in three vertically aligned plots. Clusters are sorted by their prevalence across samples. (A) Cluster mappability, as determined by a large-scale simulation assay measuring the accuracy of mapping reads extracted from the cluster’s genomes to a database from which the genome of origin was removed. In this simulation, reads from clusters represented in the reference database by a single genome (marked with a dot above the column) are expected to remain unmapped. (B) The distribution of each cluster’s abundance (rather than that of single genomes) across samples, as determined by the average coverage of 13 single-copy marker genes. (C) Cluster prevalence (the number of samples in which the cluster was ‘detectable’) within each host group, shown as a stacked bar plot. See also Figures S1–S2 and Tables S1–S2.
Figure 3. A map of variable KCs
A matrix map representing the status of variable KOs (x-axis) in each genome cluster (y-axis). Colored bars represent variable KCs (highly variable KCs vary widely in copy number across all samples whereas set-specific variable KCs are increased and/or decreased in copy number in only a small subset of the samples), while light gray bars indicate KCs with consistent copy number across samples, and KOs not present in a genome cluster are left white. Genome clusters are ordered by phylogeny and KOs are ordered by hierarchical clustering. The bar chart to the right of the map represents the fraction of KOs in each cluster identified as variable. Above the map, certain groups of functionally-related KOs are highlighted. The 314 KOs uniquely variable in the E. coli cluster (the majority of which have only been annotated in E. coli-like genomes) were excluded due to space constraints. See also Figure S4 and Tables S4–S6.
Figure 4. Comparison of highly variable KCs to known variation among reference genomes
(A) In each Venn diagram, the gray circle represents the set of all KCs in a given genome cluster, the pink circle represents the fraction of those KCs exhibiting copy number variation across the cluster’s reference genomes, and the red circle represents the set of KCs detected as highly variable. Overlap of the pink and red circles indicates correspondence between known and detected variation. Each diagram is labeled with the cluster ID, representative species name, and number of reference genomes. (B–C) Additional variation in reference genomes that were not used as mapping targets is represented by either an orange circle (additional reference genomes from IMG) or a yellow circle (additional reference genomes from NCBI), compared to variation in included reference genomes (pink) and detected highly variable KCs (red). See also Figure S5.
Figure 5. Copy number of highly variable transport KCs in Bacteroides ovatus.
The size and color of each square represent the copy number of each highly variable KC within each sample. Samples are grouped by host state (I: IBD, h: healthy, o: obese). The copy numbers of the 13 marker KCs in this genome cluster are shown for comparison. See also Figure S6.
Figure 6. Copy number variation of host state-associated KCs
Two KCs whose copy number was significantly increased in samples from a specific host state are shown. The size and color of each square represent the copy number of the KC within each sample. (A) The copy number of thioredoxin 1 (K03671) in Clostridium sp. is significantly increased in samples from obese subjects. (B) The copy number of an MFS transporter gene (K08217) in the Roseburia inulinivorans genome cluster is significantly increased in samples from IBD subjects. See also Table S7.
Figure 7. Predicted strain-level population structure within Clostridium sp.
(A) A linear regression analysis was used to model the copy number profile obtained for cluster 110 (Clostridium sp.) in each sample as a combination of known reference genomes, with prediction weights shown as stacked colored bars. Prediction accuracy (R²) is indicated above each bar. Samples with low or negative R² values potentially contain variation that cannot be explained by any combination of known reference genomes. (B) A principal coordinate analysis depicting the differences between the copy number profiles obtained for this genome cluster in the various samples (open circles), as well as the copy number profiles of reference genomes (filled circles). See also Figure S7.
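The regression idea in panel (A) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses ordinary least squares without an intercept and without any constraint on the weights, whereas the caption does not say which variant was used. All names (`solve`, `fit_mixture`) and all profiles are hypothetical. R² is computed against the sample mean, so a poor fit can yield a low or even negative value.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference profiles are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_mixture(sample, references):
    """Least-squares weights for sample ~ sum_j w_j * references[j], plus R^2."""
    k = len(references)
    # Normal equations: (R R^T) w = R y, with each reference profile as a row of R.
    gram = [[sum(x * y for x, y in zip(references[i], references[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(references[i], sample)) for i in range(k)]
    weights = solve(gram, rhs)
    predicted = [sum(w * ref[g] for w, ref in zip(weights, references))
                 for g in range(len(sample))]
    mean_y = sum(sample) / len(sample)
    ss_res = sum((y - p) ** 2 for y, p in zip(sample, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in sample)
    if ss_tot == 0:
        raise ValueError("sample profile is constant; R^2 is undefined")
    return weights, 1.0 - ss_res / ss_tot

# Hypothetical copy number profiles over four genes.
ref_a = [1.0, 0.0, 2.0, 1.0]
ref_b = [0.0, 1.0, 1.0, 3.0]
mixed = [0.75 * a + 0.25 * b for a, b in zip(ref_a, ref_b)]  # 3:1 mix of the references
novel = [3.0, 0.0, 0.0, 0.0]                                 # resembles neither reference

weights, r2 = fit_mixture(mixed, [ref_a, ref_b])
novel_weights, novel_r2 = fit_mixture(novel, [ref_a, ref_b])
print(weights, r2)
print(novel_weights, novel_r2)
```

The mixed profile is recovered with weights close to 0.75 and 0.25 and an R² near 1, while the novel profile fits poorly, which mirrors the caption's reading that low R² points to variation not explained by known reference genomes. A constrained variant (for example, non-negative weights) would be a natural refinement if the weights are meant to be read as strain proportions.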
