Microbiome. 2016 Sep 27;4(1):51.
doi: 10.1186/s40168-016-0196-8.

The impact of sequence database choice on metaproteomic results in gut microbiota studies

Alessandro Tanca et al. Microbiome.

Abstract

Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification.

Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means of gaining more complete information. In particular, experimental metagenomic databases proved essential when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields.
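Two of the quantities mentioned in the Results can be illustrated with a minimal sketch: merging non-redundant peptide identifications from parallel database searches, and computing a Firmicutes/Bacteroidetes ratio from phylum-level abundances. This is not the authors' pipeline. The function names, the peptide strings, and the abundance values are hypothetical and serve only to show the arithmetic; the database labels R-A and UP-B are borrowed from the figure captions.

```python
from typing import Dict, Set


def merge_peptide_ids(results: Dict[str, Set[str]]) -> Set[str]:
    """Union of non-redundant peptide sequences across parallel DB searches."""
    merged: Set[str] = set()
    for peptides in results.values():
        merged |= peptides
    return merged


def unique_to(db: str, results: Dict[str, Set[str]]) -> Set[str]:
    """Peptides identified with `db` and with no other database."""
    others: Set[str] = set()
    for name, peptides in results.items():
        if name != db:
            others |= peptides
    return results[db] - others


def fb_ratio(phylum_abundance: Dict[str, float]) -> float:
    """Firmicutes/Bacteroidetes ratio from phylum-level abundances."""
    bacteroidetes = phylum_abundance.get("Bacteroidetes", 0.0)
    if bacteroidetes == 0:
        raise ValueError("Bacteroidetes abundance is zero; ratio undefined")
    return phylum_abundance.get("Firmicutes", 0.0) / bacteroidetes


# Hypothetical identifications from two parallel searches (illustrative only).
search_results = {
    "R-A": {"PEPTIDEA", "PEPTIDEB"},
    "UP-B": {"PEPTIDEB", "PEPTIDEC"},
}
merged = merge_peptide_ids(search_results)
only_metagenomic = unique_to("R-A", search_results)

# Hypothetical phylum-level abundances for the same sample under two databases.
ratio_db1 = fb_ratio({"Firmicutes": 60.0, "Bacteroidetes": 30.0})
ratio_db2 = fb_ratio({"Firmicutes": 80.0, "Bacteroidetes": 10.0})
```

Comparing `ratio_db1` with `ratio_db2` for one sample shows how a change of database alone can shift the ratio, which is the kind of database dependence the abstract reports.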

Conclusions: This study helps to identify host- and database-specific biases that need to be taken into account in a metaproteomic experiment, providing meaningful insights into how to design gut microbiota studies and perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools should be encouraged, even though this requires appropriate bioinformatic resources.

Keywords: Bioinformatics; Gut microbiota; Mass spectrometry; Metagenomics; Metaproteomics.

Figures

Fig. 1
Design of the human stool sample experiment. The database-related acronyms inside the colored cylinders correspond to those used in Fig. 2
Fig. 2
Peptide identification metrics in human gut metaproteomic data obtained using different databases. a Non-redundant peptides identified by searching the MS spectra obtained from a human stool sample against 19 different types of sequence DBs. Each graph illustrates the results achieved using a different bioinformatic platform, namely (from left to right) MetaProteomeAnalyzer (MPA), MaxQuant (MQ), and Proteome Discoverer (PD). RF and R6 (in blue) represent sequence DBs based on metagenomic reads processed by FragGeneScan or six-frame translation, respectively, while CF and C6 (in red) represent sequence DBs based on assembled metagenomic contigs processed by FragGeneScan or six-frame translation, respectively; 18, 6, and 3 refer to the sequencing depth (in gigabases). Data from UniProt-based DBs are depicted in green (“pseudo-metagenomes” based on taxa found by 16S rDNA gene analysis), orange (sequences selected based on taxa found by a proteomic iterative approach, PI; see Methods for further details), and turquoise (all bacterial sequences); F, G, and S refer to the level of taxonomic filtering (family, genus, and species, respectively). Each column in the histograms contains a darker part (identifications with FDR <1%) and a lighter part (additional identifications with FDR <5%). b Total peptide identifications obtained using five representative DBs and related multiple DB searches (ampersand indicates merging results from different DBs). All searches were run in triplicate using the three abovementioned bioinformatic platforms
Fig. 3
Peptide identification metrics in human and mouse gut metaproteomic data obtained using different databases. a Results obtained with human (N = 3, left) and mouse (N = 3, right) samples, with each dot representing a single sample. R-A and C-A refer to DBs constructed with metagenomic reads and contigs from all samples, respectively. UP-B indicates a UniProt-based DB containing all bacterial entries. Ampersand indicates merging results from parallel DB searches. Asterisks indicate significantly different peptide identification yields between DBs (*p < 0.05; **p < 0.01; ***p < 0.001; paired t test), with the asterisk color corresponding to the DB against which the comparison is made (e.g., the green asterisk over blue dots indicates significance of the difference between “R-A” and “R-A & UP-B”). b Comparison of metagenomic DB construction strategies applied to human (N = 3, left) and mouse (N = 3, right) samples, with each dot representing a single sample. “R” and “C” refer to reads (blue) and contigs (red), respectively. Matched DBs (sequences from the gut metagenome of the same subject analyzed by metaproteomics) are indicated with “-M”; unmatched DBs (sequences from a gut metagenome of another subject of the same host species) are indicated with “-UM”; merged DBs (sequences from gut metagenomes of all subjects analyzed for that host species) are indicated with “-A.” *p < 0.05 (paired t test)
Fig. 4
Taxonomic annotation of human and mouse gut metaproteomic data obtained using different databases. DBs were made up of reads (dark blue, R-A) or contigs (red, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (turquoise, UP-B). The annotation tools used were MEGAN (MEG) and Unipept (Up). a Histograms showing the mean number (N = 3, with error bar indicating standard error of the mean) of non-redundant peptides identified (tot) and annotated at different taxonomic levels (p phylum; c class; o order; f family; g genus; s species) in human (left) and mouse (right) samples. b PCA plots of taxa abundances, with each dot indicating a different human (left) or mouse (right) subject
Fig. 5
Firmicutes/Bacteroidetes ratio in human and mouse subjects measured using different databases. Results obtained with human (N = 3, left) and mouse (N = 3, right) samples, with each dot representing a single sample. DBs were made up of reads (dark blue, R-A) or contigs (red, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (turquoise, UP-B). The annotation tools used were MEGAN (MEG) and Unipept (Up). *p < 0.05; **p < 0.01; ***p < 0.001 (paired t test)
Fig. 6
Functional annotation of human and mouse gut metaproteomic data obtained using different databases. DBs were made up of reads (dark blue, R-A) or contigs (red, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (turquoise, UP-B). a Histograms showing the mean number (N = 3, with error bar indicating standard error of the mean) of non-redundant peptides identified (tot) and annotated at different levels (PF protein family; KO KEGG ortholog; EC enzyme code; BP Gene Ontology Biological Process; PW pathway; *p < 0.05; **p < 0.01; ***p < 0.001; paired t test) in human (left) and mouse (right) samples upon a blastp against bacterial Swiss-Prot entries. b PCA plots of function abundances, with each dot indicating a different human (left) or mouse (right) subject
