PLoS Comput Biol. 2012;8(6):e1002358.
doi: 10.1371/journal.pcbi.1002358. Epub 2012 Jun 13.

Metabolic reconstruction for metagenomic data and its application to the human microbiome

Sahar Abubucker et al. PLoS Comput Biol. 2012.

Abstract

Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. 
This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of the HUMAnN method for metabolic and functional reconstruction from metagenomic data.
The HMP Unified Metabolic Analysis Network (HUMAnN) software recovers the presence, absence, and abundance of microbial gene families and pathways from metagenomic data. Cleaned short DNA reads are aligned to the KEGG Orthology (or any other characterized sequence database) using accelerated translated BLAST. Gene family abundances are calculated as weighted sums of the alignments from each read, normalized by gene length and alignment quality. Pathway reconstruction is performed using a maximum parsimony approach followed by taxonomic limitation (to remove false positive pathway identifications) and gap filling (to account for rare genes in abundant pathways). The resulting output is a set of matrices of pathway coverages (presence/absence) and abundances, as analyzed here for the seven primary body sites of the Human Microbiome Project.
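The gene family abundance step in the caption above can be illustrated with a minimal sketch. This is not the HUMAnN implementation: the caption states only that abundances are weighted sums of alignments, normalized by gene length and alignment quality. The choice of weight here (each read's unit of evidence split across its hits in proportion to an alignment score), the input layout, and the function name `gene_family_abundances` are all assumptions made for illustration.

```python
from collections import defaultdict


def gene_family_abundances(hits, gene_lengths):
    """Estimate relative gene family abundances from read alignments.

    hits: list of (read_id, gene_family, score) tuples with score > 0.
    gene_lengths: dict mapping gene_family to its length.
    Both the input layout and the score-proportional weighting are
    illustrative assumptions, not the published method.
    """
    # Sum the scores of each read's hits so that every read
    # contributes exactly one unit of evidence in total.
    total_per_read = defaultdict(float)
    for read_id, _, score in hits:
        total_per_read[read_id] += score

    abundances = defaultdict(float)
    for read_id, family, score in hits:
        weight = score / total_per_read[read_id]  # alignment-quality weight
        abundances[family] += weight / gene_lengths[family]  # length normalization

    # Rescale so the abundances sum to 1 (relative abundance).
    total = sum(abundances.values())
    return {family: value / total for family, value in abundances.items()}


# Hypothetical example: read r1 hits two families equally well,
# read r2 hits only K00001.
example_hits = [("r1", "K00001", 50.0), ("r1", "K00002", 50.0), ("r2", "K00001", 100.0)]
example_lengths = {"K00001": 1000, "K00002": 500}
example_result = gene_family_abundances(example_hits, example_lengths)
```

Dividing by gene length keeps long genes, which attract more reads simply by being longer, from appearing more abundant than short ones.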
Figure 2. Accuracy of inferred module abundances and coverages using four synthetic metagenomes.
An evaluation of HUMAnN's performance on a high-complexity mock community with a randomized log-normal distribution of 100 organisms as compared to an approach using the single best BLAST hit for each gene family and direct assignment to metabolic modules. Both A) correlation of inferred abundances (arcsine square root transformed for variance stabilization) and B) partial AUC at 0.1 false positive rate are high, outperforming single best BLAST hit functional reconstruction of microbial communities.
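The variance-stabilizing transform named in panel A can be sketched as follows. The caption does not say which correlation coefficient was used, so the Pearson correlation here is an assumption, as are the function names and the example abundance vectors.

```python
import math


def asin_sqrt(p):
    """Arcsine square root transform of a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("expected a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def transformed_correlation(expected, inferred):
    """Correlate two relative abundance vectors after the transform."""
    return pearson([asin_sqrt(p) for p in expected],
                   [asin_sqrt(p) for p in inferred])


# Hypothetical module abundances: known (synthetic) versus inferred.
known = [0.10, 0.20, 0.70]
inferred = [0.12, 0.18, 0.70]
r = transformed_correlation(known, inferred)
```

The transform spreads out values near 0 and 1, where the variance of a proportion is smallest, so that low-abundance modules are not drowned out by a few dominant ones.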
Figure 3. Metabolic modules differentially present or abundant in at least one body habitat of the human microbiome.
Metabolic modules and pathways from the KEGG BRITE hierarchy found to be differentially abundant (inner cladogram) or differentially covered (outer ring, presence/absence) in the human microbiome. The former were determined using LEfSe and the latter by presence in at least 90% of samples with ≥0.9 coverage or absence in at least 90% with ≤0.1 coverage. Differentially abundant modules are colored by their most abundant body habitat. 168 significantly enriched module abundances were detected, in contrast to only 24 differentially covered.
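The presence/absence thresholds in the caption above can be written as a small rule. This sketch only classifies one module within one group of samples; how the per-habitat calls are combined into "differentially covered" is not spelled out in the caption and is left out here. The function name and return labels are hypothetical.

```python
def coverage_call(coverages, min_fraction=0.9, present_at=0.9, absent_at=0.1):
    """Classify a module within one group of samples.

    coverages: per-sample coverage values in [0, 1] for a single module.
    Returns "present" if at least min_fraction of samples have coverage
    >= present_at, "absent" if at least min_fraction have coverage
    <= absent_at, and "indeterminate" otherwise.
    """
    n = len(coverages)
    if n == 0:
        return "indeterminate"
    n_high = sum(1 for c in coverages if c >= present_at)
    n_low = sum(1 for c in coverages if c <= absent_at)
    if n_high >= min_fraction * n:
        return "present"
    if n_low >= min_fraction * n:
        return "absent"
    return "indeterminate"


# Hypothetical coverage values for one module in three habitats.
gut = [1.0] * 19 + [0.5]      # 95% of samples at full coverage
skin = [0.0] * 19 + [0.5]     # 95% of samples at zero coverage
oral = [0.5] * 10             # intermediate coverage everywhere
calls = {"gut": coverage_call(gut),
         "skin": coverage_call(skin),
         "oral": coverage_call(oral)}
```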
Figure 4. Patterns of abundance of functional modules in 649 metagenomic samples covering seven body habitats.
A heatmap of the first five principal components (>95% variance) of module abundances averaged and normalized over each of the seven body sites. Cell color indicates positive (yellow) or negative (blue) variation, with the adjacent Scree plot showing the total variance in each component. The three most positively and negatively covarying modules contributing to each component are shown. Briefly, the first principal component differentiates the skin and gastrointestinal tract, the second differentiates the vaginal habitat, the third the gut, the fourth the supragingival plaque versus other oral sites, and the fifth the nares versus skin.
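The relationship between a Scree plot and a ">95% variance" cutoff can be sketched as follows. The per-component variances below are hypothetical numbers chosen for illustration, not values from the paper, and the function names are assumptions.

```python
def explained_variance(component_variances):
    """Fraction of total variance carried by each component."""
    total = sum(component_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [v / total for v in component_variances]


def components_needed(component_variances, threshold=0.95):
    """Smallest number of leading components whose cumulative
    share of variance exceeds the threshold."""
    ordered = sorted(component_variances, reverse=True)
    cumulative = 0.0
    for k, fraction in enumerate(explained_variance(ordered), start=1):
        cumulative += fraction
        if cumulative > threshold:
            return k
    return len(ordered)


# Hypothetical variances for seven components (one per body site).
variances = [50.0, 25.0, 12.0, 6.0, 4.0, 2.0, 1.0]
shares = explained_variance(variances)
k = components_needed(variances)
```

With these made-up variances, the first four components carry 93% of the variance and the fifth brings the total to 97%, so five components are retained.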
Figure 5. Gene- and module-specific reconstruction of glycosaminoglycan degradation specific to the gut microbiota.
A) Individual gene family abundances for four gut-specific high abundance modules: chondroitin, dermatan, and keratan sulfate degradation (glycosaminoglycan degradation, also including heparan sulfate), and uronic acid metabolism (occurring directly downstream in the pentose and glucuronate interconversion pathway). Relative abundance is shown from dark (high) to light (low) green, averaged over 136 stool microbiomes, with enzymes not present in the KEGG Orthology in gray. Heparan degradation is absent specifically due to the lack of heparanase (K07964-5), but no one gene family is otherwise responsible for the high abundances of the remaining four modules in the gut, despite several shared enzymes (e.g. beta-glucuronidase, K01195). B) Relative abundances of all five modules in all body habitats and samples, demonstrating gut-specific prevalence. Despite the close connections among these pathways, they show distinct patterns of relative abundance specific to the gut and covary at very low abundance in the oropharynx.
