Genome Biol. 2015 Mar 25;16(1):51.
doi: 10.1186/s13059-015-0611-7.

Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome

Stephen Nayfach et al. Genome Biol. 2015.

Abstract

Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.

Figures

Figure 1
Flowchart for estimating AGS from a shotgun metagenome. 1) MicrobeCensus takes the first n reads of at least i base pairs from the shotgun metagenome and trims these reads down to i base pairs. 2) These reads are aligned against the database of essential genes using RAPsearch2. 3) A read is mapped to an essential gene family, j, if its top scoring alignment satisfies the mapping parameters, which are optimized for gene j and read length i. 4) Based on these mapped reads, the relative abundance of each essential gene family, R_j, is computed. 5) Next, we use R_j to obtain an estimate of AGS for each gene. 6) Outlier predictions are removed and 7) MicrobeCensus takes a weighted average over the remaining estimates to produce a robust estimate of AGS for the shotgun metagenome. QC, quality control.
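Steps 4 to 7 of the flowchart can be sketched in Python. This is only an illustration, not the MicrobeCensus implementation: the caption does not say how R_j is converted to an AGS estimate, how outliers are detected, or how the weights are chosen. The sketch assumes an inverse relationship with a hypothetical per-gene constant (`coefficients`), a median-absolute-deviation outlier rule (`max_mad`), and caller-supplied `weights`; all of these names are invented for the example.

```python
from statistics import median


def estimate_ags(mapped_reads, total_reads, coefficients, weights, max_mad=3.0):
    """Combine per-gene AGS estimates into a single value (illustrative only).

    mapped_reads: reads mapped to each essential gene family j
    total_reads:  number of reads sampled from the metagenome
    coefficients: hypothetical per-gene constants (an assumption of this sketch)
    weights:      hypothetical per-gene weights for the final average
    """
    estimates = {}
    for gene, n_mapped in mapped_reads.items():
        if n_mapped == 0:
            continue  # no mapped reads, so this gene yields no estimate
        rel_abundance = n_mapped / total_reads  # step 4: R_j
        # Step 5 (assumed form): AGS is taken to be inversely proportional to R_j.
        estimates[gene] = coefficients[gene] / rel_abundance

    if not estimates:
        raise ValueError("no essential gene family received any reads")

    # Step 6 (assumed rule): drop estimates far from the median, measured in MADs.
    values = list(estimates.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    kept = {g: v for g, v in estimates.items() if abs(v - center) <= max_mad * mad}

    # Step 7: weighted average over the remaining per-gene estimates.
    total_weight = sum(weights[g] for g in kept)
    return sum(weights[g] * v for g, v in kept.items()) / total_weight
```

Using the median and MAD keeps a single badly behaved gene family from dominating the result, which is the general purpose of the outlier step in the flowchart; the actual criterion used by the tool may differ.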
Figure 2
Comparison of MicrobeCensus to existing methods. (A,B) Performance of MicrobeCensus was compared with that of existing methods using 20 simulated metagenomes. Unsigned error is defined as: |AĜS - AGS|/AGS. (A) MicrobeCensus versus GAAS at different levels of taxonomic exclusion. To simulate the presence of novel taxa, we held back reference sequences belonging to organisms from the same taxonomic group as organisms in the metagenome, which is indicated on the x-axis. 'None' indicates that no reference sequences were held back. Metagenomes were composed of 100-bp reads. (B) Estimation error for MicrobeCensus versus the method described by Raes et al. [9] for metagenomes of various read lengths. 'NA' indicates that AGS could not be estimated. (C) Speed (reads/second) of MicrobeCensus compared with existing methods on a simulated 150-bp library.
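The unsigned error defined in this caption is a relative error. A minimal sketch, with a hypothetical function name and made-up numbers chosen only to show the arithmetic:

```python
def unsigned_error(estimated_ags, true_ags):
    """Unsigned relative error |AGS_hat - AGS| / AGS, as defined in the caption."""
    return abs(estimated_ags - true_ags) / true_ags


# Made-up example: an estimate of 3.3 Mbp against a true AGS of 3.0 Mbp.
error = unsigned_error(3_300_000, 3_000_000)
print(f"{error:.2%}")  # prints 10.00%
```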
Figure 3
The effect of sequencing error on estimation accuracy. Unsigned error is defined as: |AĜS - AGS|/AGS. (A) MicrobeCensus was used to estimate the AGS of 20 metagenomes that were simulated with up to a 5% sequencing error rate. Metagenomes were composed of 100-bp reads from prokaryotes. (B) MicrobeCensus was used to estimate the AGS of 10 metagenomes that were composed of real Illumina reads pooled from 10 randomly chosen isolate sequencing projects.
Figure 4
Estimation accuracy in the presence of microbial eukaryotes and viruses. (A) MicrobeCensus was used to estimate the AGS of 20 simulated 100-bp metagenomes that contained up to 50% of Fungi, representing up to 94% of total reads. Note that axes are plotted on a log scale. (B) MicrobeCensus was used to estimate the AGS of 20 simulated 100-bp metagenomes that contained up to 50% of reads from viruses. Signed error is defined as: (AĜS - AGS)/AGS. (C) AGS estimates from (B) were used to estimate the total coverage of microbial genomes present in the simulated metagenomes. Estimated coverage of microbial genomes was obtained by dividing the number of total base pairs in a metagenome by the estimated AGS for that metagenome.
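The two quantities defined in this caption, signed error and estimated coverage, are simple ratios. The sketch below uses hypothetical function names and made-up inputs; it follows the caption's definitions and nothing more.

```python
def signed_error(estimated_ags, true_ags):
    """Signed relative error (AGS_hat - AGS) / AGS; negative means underestimation."""
    return (estimated_ags - true_ags) / true_ags


def estimated_coverage(total_base_pairs, estimated_ags):
    """Total base pairs in the metagenome divided by the estimated AGS."""
    return total_base_pairs / estimated_ags


# Made-up example: 1 Gbp of sequence and an estimated AGS of 4 Mbp.
print(estimated_coverage(1_000_000_000, 4_000_000))  # prints 250.0
```

Because coverage is inversely proportional to the AGS estimate, an AGS that is overestimated by some factor leads to coverage that is underestimated by the same factor.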
Figure 5
Average genome size varies systematically in human microbiome data. (A) Distribution of estimated AGS for 736 samples from the Human Microbiome Project. (B) Distribution of estimated AGS for 725 stool samples obtained from subjects originating from five different countries. (C) Estimated relative abundance of Bacteroides for the same samples shown in (B). (D) Country-specific correlations between AGS and Bacteroides relative abundance for stool samples.
Figure 6
MicrobeCensus enables accurate quantification of gene family abundance. Each plot shows the median abundance of essential single-copy genes across 84 stool samples from the HMP. (A) Gene family abundance was computed using relative abundance, which is scaled so the abundance of all genes sums to 1.0 for each sample. (B) Gene family abundance was computed using RPKG (reads per kilobase per genome equivalent). RPKG leverages estimates of AGS made by MicrobeCensus to normalize gene family abundance values.
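The two normalizations compared in this figure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function and parameter names are hypothetical, and the number of genome equivalents is assumed to be the total sequenced base pairs divided by the estimated AGS.

```python
def relative_abundance(read_counts):
    """Scale per-gene read counts so they sum to 1.0 within a sample."""
    total = sum(read_counts.values())
    return {gene: n / total for gene, n in read_counts.items()}


def rpkg(read_counts, gene_lengths_bp, total_base_pairs, ags_bp):
    """Reads per kilobase per genome equivalent.

    Assumption of this sketch: genome equivalents = total base pairs / AGS.
    """
    genome_equivalents = total_base_pairs / ags_bp
    return {
        gene: n / (gene_lengths_bp[gene] / 1000) / genome_equivalents
        for gene, n in read_counts.items()
    }


# Made-up sample: 400 Mbp of sequence with an estimated AGS of 4 Mbp.
counts = {"geneA": 1000, "geneB": 3000}
lengths = {"geneA": 1000, "geneB": 1500}
print(relative_abundance(counts))
print(rpkg(counts, lengths, total_base_pairs=400_000_000, ags_bp=4_000_000))
```

Relative abundance depends on every other gene in the sample, so it shifts when AGS changes even if a gene's per-genome copy number does not. Dividing by genome equivalents removes that dependence, which is the motivation the caption gives for RPKG.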
Figure 7
Average genome size reflects diverse modes of functional adaptation in the gut microbiome. (A) Barplot of AGS for stool metagenomes. (B) Log2 fold change of KOs across stool metagenomes. KOs were grouped according to the BRITE functional hierarchy. Only KOs that were significantly correlated with AGS (q < 1e-3) are displayed. (C) Log2 fold change of essential single-copy KOs across stool metagenomes.

References

    1. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464:59–65. doi: 10.1038/nature08821.
    2. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–14. doi: 10.1038/nature11234.
    3. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, et al. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007;5:e16. doi: 10.1371/journal.pbio.0050016.
    4. Fierer N, Leff JW, Adams BJ, Nielsen UN, Bates ST, Lauber CL, et al. Cross-biome metagenomic analyses of soil microbial communities and their functional attributes. Proc Natl Acad Sci U S A. 2012;109:21390–5. doi: 10.1073/pnas.1215210110.
    5. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature. 2012;490:55–60. doi: 10.1038/nature11450.
