PLoS One. 2011;6(12):e27992.
doi: 10.1371/journal.pone.0027992. Epub 2011 Dec 6.

Accurate genome relative abundance estimation based on shotgun metagenomic reads

Li C Xia et al. PLoS One. 2011.

Abstract

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods mainly summarize alignment results directly, or use variants of that approach, and often yield biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) that explicitly models read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance of Bacteroides species in human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation as read sets grow in size and the database of reference genomes expands.
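To make the mixture-model idea concrete, the following is a minimal sketch of how a maximum likelihood estimate could be computed with an expectation-maximization (EM) loop. It is an illustration under stated assumptions, not the GRAMMy implementation: the function names, the `read_probs` input (one row per read, holding the probability of that read under each reference genome as supplied by some read assignment tool) and the final length normalization are all hypothetical choices made for this example.

```python
def estimate_mixing(read_probs, n_iter=200, tol=1e-9):
    """EM estimate of mixing proportions for a finite mixture of genomes.

    read_probs[i][j] is an assumed, pre-computed probability of read i
    given genome j (0.0 when the read does not match that genome).
    """
    n_genomes = len(read_probs[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from a uniform mixture
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        used = 0
        for probs in read_probs:
            # E-step: share each read among genomes in proportion to pi * p
            weighted = [p * q for p, q in zip(pi, probs)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read matches no reference genome; skip it
            used += 1
            for j, w in enumerate(weighted):
                counts[j] += w / total
        if used == 0:
            raise ValueError("no read matches any reference genome")
        # M-step: new mixing proportions are the expected read fractions
        new_pi = [c / used for c in counts]
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    return pi


def genome_relative_abundance(pi, genome_lengths):
    """Correct read-level proportions for genome size (longer genomes
    contribute more reads per cell), then renormalize to sum to 1."""
    per_base = [p / length for p, length in zip(pi, genome_lengths)]
    total = sum(per_base)
    return [x / total for x in per_base]


# Toy data: 60 reads match only genome A, 30 only genome B, 10 match both.
reads = [[1.0, 0.0]] * 60 + [[0.0, 1.0]] * 30 + [[1.0, 1.0]] * 10
pi = estimate_mixing(reads)
gra = genome_relative_abundance(pi, genome_lengths=[2.0, 1.0])
```

In this toy case genome A attracts about twice as many reads as genome B, but it is also assumed to be twice as long, so the size-corrected abundances come out roughly equal. Ambiguous reads are split according to the current estimate rather than counted fully for every genome they hit, which is the kind of ambiguity handling the abstract describes.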

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The GRAMMy model.
A schematic diagram of the finite mixture model that underlies the GRAMMy framework for shotgun metagenomics. In the figure, ‘iid’ stands for “independent identically distributed”.
Figure 2. The GRAMMy flowchart.
A typical flowchart of the GRAMMy analysis pipeline, employing ‘map’ and ‘k-mer’ assignment.
Figure 3. Frequent species for human gut metagenomes.
The 99 species occurring in at least 50% of the 33 human gut samples with a minimum relative abundance of 0.05% were selected. ‘gut_HGS_90’ indicates that the human gut (‘gut’) read sets were mapped to the reference genome set (‘HGS’) with an identity rate cut-off at 90% (‘90’).
Figure 4. Heatmap biclustering of human gut metagenomes.
‘gut_HGS_90’ indicates that the human gut (‘gut’) read sets were mapped to the reference genome set (‘HGS’) with an identity rate cut-off at 90% (‘90’). The bottom labels indicate human gut samples. The top right legend shows the color-coding for columns, indicating the sample age category and dataset origin. The bottom right legend shows the color-coding for rows, indicating the 4 most abundant phyla in the human gut. The relative abundance for each sample is normalized by a rank transformation.
Figure 5. GRAMMy estimates of GRAs for the acid mine drainage data.
Estimated relative abundance for each strain is shown as a percentage. The first two strains dominate the sample.
Figure 6. Running time comparison.
GRAMMy has the shortest processing time in all cases compared with MEGAN and GAAS. The BLAT mapping time is excluded for all compared tools.
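
The frequent-species criterion used in the abstract and in Figure 3 (a minimum relative abundance reached in at least 50% of the samples) can be sketched as a simple filter. This is only an illustration: the function name, the dictionary layout and the species labels are hypothetical, and the abundance threshold is left as a required parameter rather than hard-coded.

```python
def frequent_species(abundances, min_abundance, min_fraction=0.5):
    """Return species whose relative abundance is at least min_abundance
    in at least min_fraction of the samples.

    abundances maps a species name to a list of relative abundances,
    one value per sample (fractions between 0 and 1).
    """
    selected = []
    for species, values in abundances.items():
        if not values:
            continue  # no samples recorded for this species
        hits = sum(1 for v in values if v >= min_abundance)
        if hits / len(values) >= min_fraction:
            selected.append(species)
    return sorted(selected)


# Hypothetical abundances across four samples.
example = {
    "species_a": [0.10, 0.02, 0.00, 0.07],
    "species_b": [0.00, 0.00, 0.01, 0.00],
    "species_c": [0.004, 0.006, 0.009, 0.001],
}
common = frequent_species(example, min_abundance=0.005)
```

Sorting the result keeps the output stable regardless of the order in which species were inserted.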
