BMC Proc. 2011 May 28;5 Suppl 2(Suppl 2):S9.
doi: 10.1186/1753-6561-5-S2-S9.

MetaPath: identifying differentially abundant metabolic pathways in metagenomic datasets


Bo Liu et al. BMC Proc. 2011.

Abstract

Background: Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire microbial communities, bypassing the need to culture individual bacterial members. One major goal of metagenomic studies is to identify specific functional adaptations of microbial communities to their habitats. The functional profile of a sample, and the abundance of each function in it, can be estimated by mapping metagenomic sequences to the global metabolic network, which consists of thousands of molecular reactions. Here we describe an analytical method, MetaPath, that identifies differentially abundant pathways in metagenomic datasets by combining metagenomic sequence data with prior knowledge of metabolic pathways.
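As a rough illustration of how annotated sequences become per-reaction abundances, the sketch below sums the read counts of annotated genes into a reaction-by-sample matrix. It assumes that every read assigned to a gene is credited in full to each reaction that gene is linked to; the function name `reaction_abundance` and the gene and reaction identifiers are hypothetical, and MetaPath's own annotation and counting rules may differ.

```python
from collections import defaultdict


def reaction_abundance(gene_counts, gene_to_reactions):
    """Build a reaction-by-sample abundance matrix (hypothetical helper).

    gene_counts: {sample_id: {gene_id: read_count}}
    gene_to_reactions: {gene_id: [reaction_id, ...]}
    Assumption: every read on a gene is credited in full to each reaction
    that gene is linked to (no splitting between reactions).
    """
    samples = sorted(gene_counts)
    table = defaultdict(lambda: defaultdict(float))
    for sample, counts in gene_counts.items():
        for gene, n_reads in counts.items():
            # Genes without a reaction link simply do not contribute.
            for reaction in gene_to_reactions.get(gene, []):
                table[reaction][sample] += n_reads
    reactions = sorted(table)
    # Rows are reactions, columns are samples; absent entries read as 0.0.
    matrix = [[table[r][s] for s in samples] for r in reactions]
    return reactions, samples, matrix


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    counts = {"S1": {"K00001": 3, "K00002": 1}, "S2": {"K00001": 2}}
    links = {"K00001": ["R00001"], "K00002": ["R00001", "R00002"]}
    rows, cols, m = reaction_abundance(counts, links)
    print(rows, cols, m)  # ['R00001', 'R00002'] ['S1', 'S2'] [[4.0, 2.0], [1.0, 0.0]]
```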

Methods: First, we introduce a scoring function for an arbitrary subnetwork and find the max-weight subnetwork in the global network using a greedy search algorithm. Then we compute two p values (pabund and pstruct) using nonparametric approaches to answer two different statistical questions: (1) Is this subnetwork differentially abundant? (2) What is the probability of finding such a high-scoring subnetwork by chance, given the data and the network structure? Finally, significant metabolic subnetworks are identified based on these two p values.
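The abstract does not spell out the scoring function, so the sketch below assumes the aggregate score sum(z)/sqrt(k) over the k reactions of a subnetwork, a form familiar from active-subnetwork methods, and encodes the network as compounds (nodes) joined by reactions (edges carrying Z values). The greedy step grows a subnetwork from each seed reaction by repeatedly adding the adjacent reaction that most increases the score, stopping when no single addition helps. The function names and graph encoding are hypothetical; this is a sketch of greedy max-weight search under those assumptions, not MetaPath's implementation.

```python
import math


def score(edges, z):
    """Assumed subnetwork score: sum of reaction Z values divided by sqrt(k)."""
    return sum(z[e] for e in edges) / math.sqrt(len(edges))


def greedy_max_subnetwork(graph, z):
    """Greedy search for a high-scoring connected set of reactions.

    graph: {reaction_id: (compound_a, compound_b)}  (reactions are edges)
    z:     {reaction_id: z_value}
    Returns (best_reaction_set, best_score).
    """
    best_edges, best_score = frozenset(), float("-inf")
    for seed in graph:  # try every reaction as a starting point
        current, nodes = {seed}, set(graph[seed])
        current_score = score(current, z)
        while True:
            # Reactions that share a compound with the current subnetwork.
            frontier = [e for e, (a, b) in graph.items()
                        if e not in current and (a in nodes or b in nodes)]
            scored = [(score(current | {e}, z), e) for e in frontier]
            if not scored:
                break
            new_score, new_edge = max(scored)
            if new_score <= current_score:
                break  # no single addition improves the score
            current.add(new_edge)
            nodes.update(graph[new_edge])
            current_score = new_score
        if current_score > best_score:
            best_edges, best_score = frozenset(current), current_score
    return best_edges, best_score


if __name__ == "__main__":
    graph = {"R1": ("A", "B"), "R2": ("B", "C"), "R3": ("C", "D"), "R4": ("E", "F")}
    z = {"R1": 3.0, "R2": 2.0, "R3": -1.0, "R4": 0.5}
    edges, s = greedy_max_subnetwork(graph, z)
    print(sorted(edges), round(s, 3))  # ['R1', 'R2'] 3.536
```

Adding R3 here would lower the score from 5/sqrt(2) to 4/sqrt(3), so the search stops at {R1, R2}; the sqrt(k) denominator is what keeps weakly scoring neighbours from being absorbed.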

Results: To validate our method, we designed a simulated metabolic pathway dataset and show that MetaPath outperforms other commonly used approaches. We also demonstrate the power of our method by analyzing two publicly available metagenomic datasets, and show that the subnetworks identified by MetaPath provide valuable insights into the biological activities of the microbiome.

Conclusions: We have introduced a statistical method for finding significant metabolic subnetworks in metagenomic datasets. Compared with previous methods, MetaPath's results are more robust to noise in the data and, on simulated datasets, have significantly higher sensitivity and specificity. When applied to two publicly available metagenomic datasets, the output of MetaPath is consistent with previous observations and also provides several new insights into the metabolic activity of the gut microbiome. The software is freely available at http://metapath.cbcb.umd.edu.


Figures

Figure 1
Schematic diagram of the MetaPath method. Sequences from each sample are annotated against the KEGG GENES database and mapped to reactions in metabolic networks, resulting in an abundance matrix whose rows are reactions and whose columns are samples. Then p values are computed for all reactions using Metastats [9] and converted into Z values, and a greedy search is performed on the edge-weighted graph to find max-weight subnetworks. Finally, we calculate the pabund and pstruct significance values of the max-weight subnetwork.
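The caption turns per-reaction p values into Z values before the graph search. The sketch below uses a one-sided permutation test on Welch's t statistic as a simple stand-in for Metastats (it is not the Metastats procedure), then maps each p value to z = Phi^-1(1 - p) with Python's `statistics.NormalDist`. The one-sided direction and the clamping of p away from 0 and 1 are assumptions; swapping the two groups would score enrichment in the other direction. Function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def welch_t(x, y):
    """Welch's t statistic; each group needs at least two values."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se if se > 0 else 0.0


def permutation_p(x, y, n_perm=999, seed=0):
    """One-sided permutation p value for 'reaction is more abundant in x than in y'."""
    rng = random.Random(seed)
    observed = welch_t(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two groups at random
        if welch_t(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps p strictly above 0


def p_to_z(p, eps=1e-9):
    """Map a one-sided p value to z = Phi^-1(1 - p); clamping keeps z finite."""
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(1 - p)


if __name__ == "__main__":
    group_a = [10, 11, 12, 13]  # toy abundances of one reaction
    group_b = [1, 2, 3, 4]
    p = permutation_p(group_a, group_b)
    print(p, p_to_z(p))
```

Small p values become large positive Z values, so strongly differential reactions carry heavy edge weights in the subsequent graph search.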
Figure 2
Significant subnetworks caused by structural biases. Left: the two pathways have equal weight, indicating equally significant differential abundance. However, the high weight of the second pathway comes mainly from the thick middle edge, which has weight 7. Right: in a densely connected network, any random high-weight edges will form a subnetwork with high weight (correlated noise).
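One way to read the pstruct question (how often would a subnetwork this good arise by chance on this network?) is as a permutation test that keeps the graph fixed, shuffles the Z values among reactions, reruns the search, and counts how often the shuffled best score matches or beats the observed one. Because the topology is held fixed, a score that owes more to dense connectivity than to the Z values themselves tends to be matched often by shuffled data, which is the bias this figure illustrates. The sketch below reuses the assumed sum(z)/sqrt(k) score and a compact greedy search; it illustrates the idea under those assumptions and is not necessarily how MetaPath computes pstruct.

```python
import math
import random


def score(edges, z):
    """Assumed subnetwork score: sum(z) / sqrt(k) over its k reactions."""
    return sum(z[e] for e in edges) / math.sqrt(len(edges))


def greedy_best_score(graph, z):
    """Best score reached by greedy growth from every seed reaction."""
    best = float("-inf")
    for seed in graph:
        current, nodes = {seed}, set(graph[seed])
        current_score = score(current, z)
        while True:
            scored = [(score(current | {e}, z), e) for e, (a, b) in graph.items()
                      if e not in current and (a in nodes or b in nodes)]
            if not scored or max(scored)[0] <= current_score:
                break
            current_score, edge = max(scored)
            current.add(edge)
            nodes.update(graph[edge])
        best = max(best, current_score)
    return best


def p_struct(graph, z, n_perm=999, seed=0):
    """Share of Z-shuffled networks (same topology) whose best score >= observed."""
    rng = random.Random(seed)
    observed = greedy_best_score(graph, z)
    reactions = list(graph)
    values = [z[r] for r in reactions]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # keep the graph, move the Z values between reactions
        if greedy_best_score(graph, dict(zip(reactions, values))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy network: R1 and R2 share compound B; R3 and R4 are isolated.
    graph = {"R1": ("A", "B"), "R2": ("B", "C"), "R3": ("D", "E"), "R4": ("F", "G")}
    z = {"R1": 3.0, "R2": 3.0, "R3": 0.0, "R4": 0.0}
    print(p_struct(graph, z))
```

In the toy network only the shuffles that put both large Z values on R1 and R2 reproduce the observed score, about one in six, so pstruct lands near 0.17.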
Figure 3
Comparison of statistical methods for discovering significant reactions in simulated datasets. Four approaches are evaluated: discovering active subnetworks [13] using simulated annealing (Anneal) or greedy search (Greedy), discovering significant individual reactions using Metastats [9], finding differentially abundant KEGG-defined pathways (KEGGPath), and MetaPath. Four datasets are created by varying the number of significant reactions, n, and their significance levels.
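Because the truly significant reactions are known in a simulated dataset, each method's output can be scored by set overlap. The sketch below computes sensitivity and specificity that way; treating each reaction as one prediction unit is an assumption about how the comparison is tallied, and the function name is hypothetical.

```python
def sensitivity_specificity(predicted, truth, universe):
    """Compare a method's predicted significant reactions with the simulated truth.

    predicted, truth, universe: sets of reaction IDs (predicted and truth are
    subsets of universe, the set of all reactions in the simulation).
    """
    tp = len(predicted & truth)             # true reactions that were found
    fn = len(truth - predicted)             # true reactions that were missed
    fp = len(predicted - truth)             # reported but not truly significant
    tn = len(universe - predicted - truth)  # correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    all_reactions = {f"R{i}" for i in range(1, 11)}
    print(sensitivity_specificity({"R1", "R2", "R3", "R9"},
                                  {"R1", "R2", "R3", "R4"}, all_reactions))
    # (0.75, 0.8333333333333334)
```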
Figure 4
Distributions of p values from comparing individual metabolic reactions with Metastats and from comparing metabolic subnetworks with MetaPath. The top histogram shows the distribution of p values of individual metabolic reactions calculated by Metastats; the bottom histogram shows the distribution of pabund values of the subnetworks calculated by MetaPath.
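For pabund, a minimal sketch under assumed definitions: sum the abundances of the subnetwork's reactions within each sample, then run a one-sided permutation test on the difference in group means, shuffling sample labels. Aggregating by summation and using the mean difference as the statistic are assumptions (the text says only that pabund is computed nonparametrically), and the function names are hypothetical.

```python
import random
import statistics


def subnetwork_abundance(matrix, rows):
    """Per-sample total abundance of the reactions in `rows` (row indices)."""
    n_samples = len(matrix[0])
    return [sum(matrix[r][j] for r in rows) for j in range(n_samples)]


def p_abund(matrix, rows, group_a, n_perm=999, seed=0):
    """One-sided permutation p value for 'subnetwork is more abundant in group_a'.

    matrix: reaction-by-sample abundances; group_a: column indices of the first
    group, with all remaining columns forming the second group.
    """
    rng = random.Random(seed)
    totals = subnetwork_abundance(matrix, rows)
    cols = list(range(len(totals)))
    k = len(group_a)

    def mean_diff(a_cols):
        a = [totals[j] for j in a_cols]
        b = [totals[j] for j in cols if j not in a_cols]
        return statistics.mean(a) - statistics.mean(b)

    observed = mean_diff(set(group_a))
    hits = 0
    for _ in range(n_perm):
        # Draw a random set of k samples to play the role of group_a.
        if mean_diff(set(rng.sample(cols, k))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    m = [[5, 6, 7, 1, 1, 2],   # toy reaction rows; columns 0-2 form group A
         [4, 5, 6, 0, 1, 1],
         [9, 9, 9, 9, 9, 9]]
    print(p_abund(m, rows=[0, 1], group_a=[0, 1, 2]))
```

With only six samples there are 20 possible group splits, so even a perfectly separated subnetwork cannot reach a p value much below 1/20; small sample sizes bound how significant pabund can be.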
Figure 5
Nine statistically significant subnetworks were found in the comparison of the gut microbiomes of obese and lean subjects. All of these subnetworks are enriched in the obese subjects. The pabund and pstruct significance values are shown above each subnetwork, and the p value of each reaction is shown alongside its KEGG reaction number. Five subnetworks, (a)-(e), belong to the Fatty Acid Metabolism pathway in KEGG; the other four, (f)-(i), contain the molecule L-homocysteine.
Figure 6
Ten statistically significant subpathways were found in the dataset of infant and adult individuals. Six subpathways are enriched in the infant subjects (panels a-f), and four are enriched in the adult subjects (panels g-j). The pabund and pstruct significance values are shown above each subpathway, and the p value of each reaction is shown alongside its KEGG reaction number.


References

1. Riesenfeld CS, Schloss PD, Handelsman J. Metagenomics: genomic analysis of microbial communities. Annu Rev Genet. 2004;38:525–552. doi: 10.1146/annurev.genet.38.072902.091216.
2. Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, Jovanovich SB, Gates CM, Feldman RA, Spudich JL, et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000;289:1902–1906. doi: 10.1126/science.289.5486.1902.
3. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, et al. A core gut microbiome in obese and lean twins. Nature. 2009;457:480–484. doi: 10.1038/nature07540.
4. Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28:33–36. doi: 10.1093/nar/28.1.33.
5. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, et al. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9:386. doi: 10.1186/1471-2105-9-386.
