The impact of sequence database choice on metaproteomic results in gut microbiota studies

Alessandro Tanca¹, Antonio Palomba¹, Cristina Fraumene¹, Daniela Pagnozzi¹, Valeria Manghina², Massimo Deligios², Thilo Muth³,⁴, Erdmann Rapp³, Lennart Martens⁵,⁶,⁷, Maria Filippa Addis¹, Sergio Uzzau⁸,⁹

Affiliations

¹ Porto Conte Ricerche, Science and Technology Park of Sardinia, Tramariglio, Alghero, Italy.
² Department of Biomedical Sciences, University of Sassari, Sassari, Italy.
³ Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany.
⁴ Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany.
⁵ Department of Biochemistry, Ghent University, Ghent, Belgium.
⁶ Medical Biotechnology Center, VIB, Ghent, Belgium.
⁷ Bioinformatics Institute Ghent, Ghent University, Zwijnaarde, Ghent, Belgium.
⁸ Porto Conte Ricerche, Science and Technology Park of Sardinia, Tramariglio, Alghero, Italy. uzzau@portocontericerche.it.
⁹ Department of Biomedical Sciences, University of Sassari, Sassari, Italy. uzzau@portocontericerche.it.

PMID: 27671352
PMCID: PMC5037606
DOI: 10.1186/s40168-016-0196-8

Alessandro Tanca et al. Microbiome. 2016 Sep 27;4(1):51. doi: 10.1186/s40168-016-0196-8.

Abstract

Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its widespread application to gut microbiota research has been hindered by challenges in data analysis, especially those related to the choice of appropriate sequence databases for protein identification.

Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. In particular, the contribution of experimental metagenomic databases proved indispensable when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in functional annotation yield.
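The parallel-search strategy described above amounts to taking the union of the peptide identifications obtained with each database, while the contribution specific to one database is the set difference against all other searches. A minimal Python sketch follows, assuming each search has already been reduced to a set of non-redundant peptide sequences at a fixed FDR; the function names and example peptides are hypothetical and do not come from the study.

```python
from typing import Dict, Set


def merge_parallel_searches(results: Dict[str, Set[str]]) -> Set[str]:
    """Union of the non-redundant peptides identified against each database."""
    merged: Set[str] = set()
    for peptides in results.values():
        merged |= peptides
    return merged


def unique_per_database(results: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Peptides identified with one database and with none of the others."""
    unique: Dict[str, Set[str]] = {}
    for name, peptides in results.items():
        others = set().union(*(p for n, p in results.items() if n != name))
        unique[name] = peptides - others
    return unique


# Hypothetical identifications from two parallel searches (labels as in Fig. 3).
searches = {
    "R-A": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    "UP-B": {"PEPTIDEB", "PEPTIDED"},
}
print(len(merge_parallel_searches(searches)))  # 4
print(unique_per_database(searches))
```

A real pipeline would also have to decide how to treat isobaric leucine/isoleucine and how FDR is controlled once results from several searches are merged; both choices are outside the scope of this sketch.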

Conclusions: This study helps to identify host- and database-specific biases that need to be taken into account in metaproteomic experiments, and provides meaningful insights into how to design gut microbiota studies and perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools should be encouraged, even though this requires appropriate bioinformatic resources.

Keywords: Bioinformatics; Gut microbiota; Mass spectrometry; Metagenomics; Metaproteomics.


Figures

**Fig. 1**
Design of the human stool sample experiment. Database-related acronyms in the colored cylinders correspond to those indicated in Fig. 2

**Fig. 2**
Peptide identification metrics in human gut metaproteomic data obtained using different databases. a Non-redundant peptides identified by searching the MS spectra obtained from a human stool sample against 19 different types of sequence DBs. Each graph illustrates the results achieved using a different bioinformatic platform, namely (from *left to right*) MetaProteomeAnalyzer (MPA), MaxQuant (MQ), and Proteome Discoverer (PD). RF and R6 (in *blue*) represent sequence DBs based on metagenomic reads processed by FragGeneScan or six-frame translation, respectively, while CF and C6 (in *red*) represent sequence DBs based on assembled metagenomic contigs processed by FragGeneScan or six-frame translation, respectively; 18, 6, and 3 refer to the sequencing depth (in gigabases). Data from UniProt-based DBs are depicted in *green* (“pseudo-metagenomes” based on taxa found by 16S rDNA gene analysis), *orange* (sequences selected based on taxa found by a proteomic iterative approach, PI; see Methods for further details), and *turquoise* (all bacterial sequences); F, G, and S refer to the level of taxonomic filtering (family, genus, and species, respectively). Each column in the histograms consists of a darker part (identifications at FDR < 1 %) and a lighter part (additional identifications at FDR < 5 %). b Total peptide identifications obtained using five representative DBs and the corresponding multiple-DB searches (*ampersand* indicates merging results from different DBs). All searches were run in triplicate using the three above-mentioned bioinformatic platforms
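The R6 and C6 databases in Fig. 2 rely on six-frame translation, i.e. translating each read or contig in its three forward reading frames and in the three frames of its reverse complement. The sketch below illustrates that step only, assuming the standard genetic code (NCBI translation table 1) and writing `X` for codons containing ambiguous bases; the function names are hypothetical, and gene prediction with FragGeneScan (the RF and CF databases) is a different procedure that is not reproduced here.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate the complete codons of an uppercase DNA string; '*' marks stops."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if all(base in BASES for base in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO_ACIDS[idx])
        else:
            protein.append("X")  # ambiguous base such as N
    return "".join(protein)


def six_frame_translation(read: str) -> List[str]:
    """Return the three forward and the three reverse-complement translations."""
    read = read.upper()
    rev_comp = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[frame:]) for strand in (read, rev_comp) for frame in range(3)]


print(six_frame_translation("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

A search database built this way could additionally split each frame at stop codons and discard short fragments; the exact filtering used in the study is not stated in this section.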

**Fig. 3**
Peptide identification metrics in human and mouse gut metaproteomic data obtained using different databases. a Results obtained with human (N = 3, *left*) and mouse (N = 3, *right*) samples, with each *dot* representing a single sample. *R-A* and *C-A* refer to DBs constructed with metagenomic reads and contigs from all samples, respectively. UP-B indicates a UniProt-based DB containing all bacterial entries. *Ampersand* indicates merging results from parallel DB searches. *Asterisks* indicate significantly different peptide identification yields between DBs (*p < 0.05; **p < 0.01; ***p < 0.001; paired t test), with the asterisk color corresponding to the DB to which the comparison refers (e.g., the *green asterisk over blue dots* indicates significance of the difference between “R-A” and “R-A & UP-B”). b Comparison of metagenomic DB construction strategies applied to human (N = 3, *left*) and mouse (N = 3, *right*) samples, with each dot representing a single sample. “R” and “C” refer to reads (*blue*) and contigs (*red*), respectively. Matched DBs (sequences from the gut metagenome of the same subject analyzed by metaproteomics) are indicated with “-M”, unmatched DBs (sequences from the gut metagenome of another subject of the same host species) with “-UM”, and merged DBs (sequences from the gut metagenomes of all subjects analyzed for that host species) with “-A”. *p < 0.05 (paired t test)
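Figs. 3, 5, and 6 compare databases with a paired t test over N = 3 subjects per host species. As an illustration of the statistic only, the sketch below computes the paired t value from per-subject values; with N = 3 there are 2 degrees of freedom, for which Student's t distribution has the closed-form two-sided p-value 1 − |t|/√(t² + 2). The counts and function names are hypothetical; for general sample sizes, `scipy.stats.ttest_rel` performs the same test.

```python
import math
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_d / math.sqrt(var_d / n), n - 1


def two_sided_p_df2(t: float) -> float:
    """Exact two-sided p-value of Student's t with 2 degrees of freedom."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Hypothetical peptide counts for three subjects searched against two DBs.
db_x = [10, 12, 14]
db_y = [8, 9, 10]
t, df = paired_t(db_x, db_y)
print(round(t, 3), df, round(two_sided_p_df2(t), 4))
```

The closed form is used here only because N = 3 fixes the degrees of freedom at 2; any other sample size needs the general t distribution.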

**Fig. 4**
Taxonomic annotation of human and mouse gut metaproteomic data obtained using different databases. DBs were made up of reads (*dark blue*, R-A) or contigs (*red*, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (*turquoise*, UP-B). The annotation tools used were MEGAN (MEG) and Unipept (Up). a Histograms showing the mean number (N = 3, with *error bar* indicating standard error of the mean) of non-redundant peptides identified (*tot*) and annotated at different taxonomic levels (p phylum; c class; o order; f family; g genus; s species) in human (*left*) and mouse (*right*) samples. b PCA plots of taxa abundances, with each *dot* indicating a different human (*left*) or mouse (*right*) subject

**Fig. 5**
*Firmicutes*/*Bacteroidetes* ratio in human and mouse subjects measured using different databases. Results obtained with human (N = 3, *left*) and mouse (N = 3, *right*) samples, with each *dot* representing a single sample. DBs were made up of reads (*dark blue*, R-A) or contigs (*red*, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (*turquoise*, UP-B). The annotation tools used were MEGAN (MEG) and Unipept (Up). *p < 0.05; **p < 0.01; ***p < 0.001 (paired t test)
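As Fig. 5 shows, the ratio depends on both the database and the annotation tool. One simple way to derive it, assuming each identified peptide already carries a phylum-level assignment (for example from a lowest-common-ancestor tool such as MEGAN or Unipept) and that abundance is approximated by spectral counts, is to sum counts per phylum and divide. This is a hypothetical sketch; the study's exact quantification procedure is not described in this section.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# peptide sequence -> (phylum assignment or None, spectral count); hypothetical format
Annotations = Dict[str, Tuple[Optional[str], int]]


def phylum_abundances(peptides: Annotations) -> Dict[str, int]:
    """Sum spectral counts per phylum, skipping peptides without a phylum."""
    totals: Dict[str, int] = defaultdict(int)
    for phylum, count in peptides.values():
        if phylum is not None:
            totals[phylum] += count
    return dict(totals)


def firmicutes_bacteroidetes_ratio(peptides: Annotations) -> float:
    """Firmicutes counts divided by Bacteroidetes counts."""
    totals = phylum_abundances(peptides)
    bacteroidetes = totals.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("no Bacteroidetes-assigned spectra; ratio undefined")
    return totals.get("Firmicutes", 0) / bacteroidetes


sample = {
    "PEPTIDEA": ("Firmicutes", 12),
    "PEPTIDEB": ("Firmicutes", 8),
    "PEPTIDEC": ("Bacteroidetes", 5),
    "PEPTIDED": (None, 3),  # not assigned at phylum level
}
print(firmicutes_bacteroidetes_ratio(sample))  # 4.0
```

Because changing the database alters both which peptides are identified and which phylum each is assigned to, the same spectra can yield quite different ratios; this is one plausible route to the up to tenfold variation reported in the abstract.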

**Fig. 6**
Functional annotation of human and mouse gut metaproteomic data obtained using different databases. DBs were made up of reads (*dark blue*, R-A) or contigs (*red*, C-A) from gut metagenomes of all subjects analyzed for each host species, or all bacterial sequences deposited in UniProt (*turquoise*, UP-B). a Histograms showing the mean number (N = 3, with *error bar* indicating standard error of the mean) of non-redundant peptides identified (*tot*) and annotated at different levels (PF protein family; KO KEGG ortholog; EC enzyme code; BP Gene Ontology Biological Process; PW pathway) in human (*left*) and mouse (*right*) samples after a blastp search against bacterial Swiss-Prot entries; *p < 0.05; **p < 0.01; ***p < 0.001 (paired t test). b PCA plots of function abundances, with each *dot* indicating a different human (*left*) or mouse (*right*) subject
