Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn's disease

Alison R Erickson¹, Brandi L Cantarel, Regina Lamendella, Youssef Darzi, Emmanuel F Mongodin, Chongle Pan, Manesh Shah, Jonas Halfvarson, Curt Tysk, Bernard Henrissat, Jeroen Raes, Nathan C Verberkmoes, Claire M Fraser, Robert L Hettich, Janet K Jansson

PMID: 23209564
PMCID: PMC3509130
DOI: 10.1371/journal.pone.0049138

Alison R Erickson et al. PLoS One. 2012.

PLoS One. 2012;7(11):e49138.

doi: 10.1371/journal.pone.0049138. Epub 2012 Nov 28.

Affiliation

¹ Chemical Science Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.

Abstract

Crohn's disease (CD) is an inflammatory bowel disease of complex etiology, although dysbiosis of the gut microbiota has been implicated in chronic immune-mediated inflammation associated with CD. Here we combined shotgun metagenomic and metaproteomic approaches to identify potential functional signatures of CD in stool samples from six twin pairs that were either healthy, or that had CD in the ileum (ICD) or colon (CCD). Integration of these omics approaches revealed several genes, proteins, and pathways that primarily differentiated ICD from healthy subjects, including depletion of many proteins in ICD. In addition, the ICD phenotype was associated with alterations in bacterial carbohydrate metabolism, bacterial-host interactions, as well as human host-secreted enzymes. This eco-systems biology approach underscores the link between the gut microbiota and functional alterations in the pathophysiology of Crohn's disease and aids in identification of novel diagnostic targets and disease specific biomarkers.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Clustering of distal gut metaproteomes according to disease.**
Non-metric multidimensional scaling (nMDS) of distal gut metaproteomes from the CD twin cohort. The colored square symbols represent the metaproteomic profile of each sample (Blue = CCD, Grey = Healthy, Red = ICD). The numbers beside the symbols refer to the specific patient ID from Dicksved et al., 2008 (proteomes were run in technical duplicates). The axes are dimensionless: the coefficients of determination for the correlations between ordination distances and distances in the original n-dimensional space are 0.472 and 0.831 for Axis 1 and 2, respectively. A matrix of normalized spectral counts per protein (HMRG database search) from each duplicate metaproteome was imported into PCORD v5 software. nMDS was performed using the Bray-Curtis distance measure. A three-dimensional solution was found after 119 iterations. The final stress for the nMDS was 6.47458. The white spots with grey shading correspond to individual proteins identified using the HMRG database. Arrows indicate the strength of correlation of specific bacterial strains to the ordinated data. Pearson correlation coefficients for *Faecalibacterium prausnitzii, Anaerofustis stercorihominis, Clostridium leptum, Bacteroides ovatus*, *Bacteroides sp. 4_3*, and *Bacteroides sp. 3_1* were −0.875, −0.851, 0.784, 0.8, 0.788, and 0.817, respectively.
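
As an illustration of the Bray-Curtis distance measure named in the Figure 1 legend, the following is a minimal sketch in plain Python. It is not the PCORD implementation; the sample names and count values are hypothetical and stand in for rows of a normalized spectral count matrix.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i); 0 means identical
    profiles, 1 means no shared proteins.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(profiles):
    """Pairwise Bray-Curtis dissimilarities for a dict of named profiles."""
    names = list(profiles)
    return {(a, b): bray_curtis(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical normalized spectral counts for four proteins per sample.
profiles = {
    "healthy_1": [10.0, 5.0, 0.0, 2.0],
    "icd_1": [2.0, 0.0, 8.0, 1.0],
    "ccd_1": [9.0, 4.0, 1.0, 2.0],
}

distances = distance_matrix(profiles)
for (a, b), d in sorted(distances.items()):
    if a < b:
        print(f"{a} vs {b}: {d:.3f}")
```

A matrix of this kind is the usual input to an nMDS ordination, which then searches for a low-dimensional configuration whose inter-point distances preserve the rank order of the dissimilarities.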

**Figure 2. Taxonomic assignments in metagenome and metaproteome datasets.**
Relative abundance (log scale) of genera in (A) metagenomic datasets, determined by reference genome alignments, and (B) metaproteomic datasets, determined by HMRG PSMs. Error bars represent the standard error of the mean of the samples from Healthy (3 MG, 4 MP), ICD (5 MG, 6 MP) and CCD (2 MG/MP). Asterisks indicate genera that were statistically lower in relative abundance in ICD compared to Healthy (q-values of 0.0030, 0.0041, 0.0041, 0.0040 for *Faecalibacterium, Roseburia, Coprococcus* and *Dialister*, respectively). *Subdoligranulum* was not included in the HMRG database, so it is not shown in the metaproteome. Grey bars = Healthy, Blue bars = CCD, Red bars = ICD.

**Figure 3. Comparison of protein expression levels across disease categories.**
(A) Boxplots depicting the distribution of the fraction of the metagenomes with PSMs. Boxes indicate the 25th, 50th and 75th percentiles, with whiskers representing the 10th and 90th percentile points. (B) Gene family richness as measured by the number of KEGG Orthologous group (KO) matches in the metagenomic dataset. Grey = Healthy, Blue = CCD, Red = ICD.

**Figure 4. Metaproteome differences between mean Healthy and mean ICD COG frequencies.**
To determine statistically significant differences between categories, White's non-parametric t-test was used with bootstrapping and Storey FDR multiple test correction. 95% upper and lower confidence intervals are shown. Red and grey bars indicate COG categories that are higher in ICD or Healthy metaproteomes, respectively; asterisks indicate COG categories that were significantly different between ICD and Healthy (q-value < 0.05).
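
The general idea behind a non-parametric two-group comparison of this kind can be sketched as follows. This is a deliberately simplified stand-in, not White's test as used for Figure 4: it uses the absolute difference in group means as the statistic and enumerates every regrouping exactly, which is feasible for group sizes such as 4 and 6. The function name and the example frequencies are hypothetical.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of relabelling the pooled values into groups of the original
    sizes is enumerated; the p-value is the fraction of relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must be non-empty")
    pooled = list(group_a) + list(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count


# Hypothetical relative frequencies of one COG category per sample.
healthy = [0.12, 0.11, 0.13, 0.12]
icd = [0.07, 0.08, 0.06, 0.09, 0.07, 0.08]
print(exact_permutation_pvalue(healthy, icd))
```

With many COG categories tested at once, the resulting p-values would still need a multiple test correction before being reported as q-values.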

**Figure 5. Specific genes and proteins that differ in relative amounts according to disease state.**
Relative Abundance of mucin-desulfating sulfatase (Mds), RagB and SusC/D, Outer Membrane Protein A (OmpA), TonB, Short-Chain Fatty Acid production (SCFA) and Butyrate production in (A) metagenomes and (B) MM metaproteomes. Error bars in (A) and (B) represent the standard error of the mean of the samples from Healthy (3 MG, 4 MP), ICD (5 MG, 6 MP) and CCD (2 MG/MP). (C) Specific outer membrane proteins and proteins involved in SCFA pathway that differed between disease categories. Protein abundances were calculated as normalized spectral abundance using the HMRG database search. The presence-absence heatmap indicates which of the 51 bacterial strains each protein matched to in the HMRG database search: black = species present, white = species absent. Grey = Healthy, Blue = CCD, Red = ICD.

**Figure 6. Metabolic Pathways that Differentiate Healthy and ICD phenotypes.**
(A) Metabolic pathways differentiating between healthy and ICD according to metabolic module analysis (p<0.05; 5% FDR). All pathways are less abundant in ICD compared to healthy except for *Bacteroides* membrane proteins (upper left box), which are more abundant in ICD. The colors reflect their phylogenetic origin, which was determined using the lowest common ancestor of their HMRG mappings. Grey highlighted areas are discussed in the main text: (1) butyrate production; (2) membrane proteins. (B) Observed metabolic module abundance shift versus its expected value based on the abundance of the host species. To separate out modules whose fold change is higher or lower than expected from the difference in species abundance, we used the prediction interval of a fitted linear model (blue lines). The grey symbols are (species-separated) modules that are not significantly different between ICD and H (Wilcoxon rank-sum test; 5% FDR). They could have a high median fold change, but this is not always significant (e.g., when interpersonal variation is high). The colored symbols are (species-separated) modules that are significantly different between ICD and H (Wilcoxon rank-sum test; 5% FDR). Colored symbols inside the interval are significantly different but are in line with what would be expected from the species difference. Colored symbols outside the blue lines are higher or lower than expected. Specific *Faecalibacterium* proteins that are downregulated in the butyrate module (green squares) include the following: butyryl-CoA dehydrogenase (EC 1.3.99.2), 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35), enoyl-CoA hydratase/carnitine racemase, and acetyl-CoA acetyltransferases; as well as the module for lysine fermentation to acetate and butyrate (pink square). Specific *Bacteroides* proteins that are downregulated in the DNA-directed RNA polymerase module are the following (red X's): alpha and beta subunits (EC 2.7.7.6).
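
The Figure 6 legend reports significance at a 5% FDR without naming the correction procedure. Purely as an assumption for illustration, the sketch below applies the Benjamini-Hochberg step-up adjustment to a list of per-module p-values; the function name and the p-values are hypothetical, and the source may have used a different FDR method.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), and a
    running minimum taken from the largest rank downward keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-module p-values, e.g. from rank-sum tests (ICD vs. H).
module_pvalues = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(module_pvalues)

# Modules passing a 5% FDR threshold under this assumed procedure.
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

Under this procedure a module is called significant when its adjusted value falls below 0.05, which controls the expected proportion of false discoveries among the modules called.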
