Accurate genome relative abundance estimation based on shotgun metagenomic reads

Li C Xia¹, Jacob A Cram, Ting Chen, Jed A Fuhrman, Fengzhu Sun

Affiliations

Affiliation

¹ Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America.

PMID: 22162995
PMCID: PMC3232206
DOI: 10.1371/journal.pone.0027992

Accurate genome relative abundance estimation based on shotgun metagenomic reads

Li C Xia et al. PLoS One. 2011.

. 2011;6(12):e27992.

doi: 10.1371/journal.pone.0027992. Epub 2011 Dec 6.

Authors

Li C Xia¹, Jacob A Cram, Ting Chen, Jed A Fuhrman, Fengzhu Sun

Affiliation

¹ Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America.

PMID: 22162995
PMCID: PMC3232206
DOI: 10.1371/journal.pone.0027992

Abstract

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or its variants; often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. Maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.

PubMed Disclaimer

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. The GRAMMy model.**
A schematic diagram of the finite mixture model underlies the GRAMMy framework for shotgun metagenomics. In the figure, ‘iid’ stands for “independent identically distributed”.

**Figure 2. The GRAMMy flowchart.**
A typical flowchart of GRAMMy analysis pipeline employs ‘map’ and ‘k-mer’ assignment.

**Figure 3. Frequent species for human gut metagenomes.**
The 99 species occurring in at least 50% of the 33 human gut samples with a minimum relative abundance of 0.05% were selected. ‘gut_HGS_90’ indicates that the human gut (‘gut’) read sets were mapped to the reference genome set (‘HGS’) with an identity rate cut-off at 90% (‘90’).

**Figure 4. Heatmap biclustering of human gut metagenomes.**
‘gut_HGS_90’ indicates that the human gut (‘gut’) read sets were mapped to the reference genome set (‘HGS’) with an identity rate cut-off at 90% (‘90’). The bottom labels indicate human gut samples. The top right legend shows the color-coding for columns indicating the sample age category and dataset origin. The bottom right legend shows color-coding for rows indicating the top 4 most abundant phyla in human gut. The relative abundance for each sample is normalized by a rank transformation.

**Figure 5. GRAMMy estimates of GRAs for the acid mine drainage data.**
Estimated relative abundance for each strain is shown as a percentage. The first two strains dominate the sample.

**Figure 6. Running time comparison.**
GRAMMy is the fastest in all cases as compared to MEGAN and GAAS in processing time. The BLAT mapping time is excluded for all compared tools.

See this image and copyright information in PMC

References

1. Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, et al. The human microbiome project. Nature. 2007;449:804–810. - PMC - PubMed
1. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. - PubMed
1. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004;428:37–43. - PubMed
1. Morgan JL, Darling AE, Eisen JA. Metagenomic sequencing of an in vitro-simulated microbial community. PLoS ONE. 2010;5:e10209. - PMC - PubMed
1. Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10:56. - PMC - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Accurate genome relative abundance estimation based on shotgun metagenomic reads

Affiliation

Accurate genome relative abundance estimation based on shotgun metagenomic reads

Authors

Affiliation

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

LinkOut - more resources

Full Text Sources

Other Literature Sources