Comparative Study

PLoS One. 2013 Dec 9;8(12):e82981.

doi: 10.1371/journal.pone.0082981. eCollection 2013.

Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture

Alessandro Tanca¹, Antonio Palomba², Massimo Deligios¹, Tiziana Cubeddu³, Cristina Fraumene³, Grazia Biosa³, Daniela Pagnozzi³, Maria Filippa Addis¹, Sergio Uzzau¹

Affiliations

¹ Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy; Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy.
² Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy.
³ Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy.

PMID: 24349410
PMCID: PMC3857319
DOI: 10.1371/journal.pone.0082981

Alessandro Tanca et al. PLoS One. 2013.

Abstract

Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, the bioinformatic approaches used for data analysis still need refinement. In this context, the selection of sequence databases represents a major challenge. This work assessed the impact of different databases on metaproteomic investigations by using a mock microbial mixture of nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Both the microbial mixture and the single microorganisms were then subjected to next-generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, all databases were compared quantitatively in terms of the number and overlap of peptide identifications. Only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications than databases with generic taxonomy, while the metagenomic database yielded a slight increase with respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to their counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives.

This study confirms that database selection has a significant impact in metaproteomics, and provides critical indications for improving the depth and reliability of metaproteomic results. Specifically, iterative searches and suitable filters for taxonomic assignments are proposed to increase the coverage and trustworthiness of metaproteomic data.
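The lowest common ancestor (LCA) approach mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that each peptide has already been matched to the taxa of all proteins containing it, and that each taxon is represented by an abbreviated, hypothetical root-to-species lineage. Unipept and MEGAN rely on the full NCBI Taxonomy and apply additional rules that are not reproduced here; the function names are illustrative and not part of either tool.

```python
from typing import Dict, List, Optional, Sequence

# A lineage is a root-to-leaf path of taxon names (abbreviated, hypothetical ranks).
Lineage = Sequence[str]

# Illustrative lineages only; real LCA tools use the complete NCBI Taxonomy.
ECOLI: Lineage = ("root", "Bacteria", "Enterobacteriaceae", "Escherichia", "Escherichia coli")
SALMONELLA: Lineage = ("root", "Bacteria", "Enterobacteriaceae", "Salmonella", "Salmonella enterica")
YEAST: Lineage = ("root", "Eukaryota", "Saccharomycetaceae", "Saccharomyces", "Saccharomyces cerevisiae")


def lowest_common_ancestor(lineages: List[Lineage]) -> Optional[str]:
    """Return the deepest taxon shared by all lineages (None for an empty list)."""
    lca: Optional[str] = None
    # zip(*lineages) walks the lineages rank by rank and stops at the shortest one.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            lca = names_at_rank[0]
        else:
            break  # lineages diverge: the last shared taxon is the LCA
    return lca


def assign_peptides(peptide_hits: Dict[str, List[Lineage]]) -> Dict[str, Optional[str]]:
    """Attribute each peptide to the LCA of every taxon whose proteins contain it."""
    return {peptide: lowest_common_ancestor(hits) for peptide, hits in peptide_hits.items()}


if __name__ == "__main__":
    hits = {
        "PEPTIDEA": [ECOLI],              # found in one species only
        "PEPTIDEB": [ECOLI, SALMONELLA],  # shared within a family
        "PEPTIDEC": [ECOLI, YEAST],       # shared across domains
    }
    print(assign_peptides(hits))
```

With these toy inputs the peptides are attributed to "Escherichia coli", "Enterobacteriaceae" and "root", respectively: a peptide present in proteins from two genera only carries family-level information, which is why taxonomic attribution at lower ranks depends on taxon-specific peptides.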

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematic illustration of the study workflow.**
A) Experimental design. B) Database classes examined.

**Figure 2. Comparison of metaproteomic data obtained with different databases.**
A) Number of peptide sequences (left) and peptide-spectrum matches (PSMs, right) identified in the 9MM (the nine-microorganism mock mixture) using different sequence databases (FDR<1%). B) Left, Venn diagram illustrating the peptide distribution among four different DB classes. Center, Venn diagram illustrating the peptide distribution among all NCBI-, TrEMBL- and SwissProt-based DBs used in this study. Right, Venn diagram illustrating the peptide distribution among all DBs with generic microbial taxonomy (BFV), genus-specific taxonomy (G), and species-specific taxonomy (S).
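The overlaps summarized in these Venn diagrams reduce to plain set operations. The sketch below is a hypothetical helper, not the authors' pipeline; it assumes each database search has already been reduced to a set of peptide sequences passing FDR<1%, and reports how many peptides are common to all databases and how many are unique to each one. The peptide sequences in the example are toy values.

```python
from typing import Dict, Set


def overlap_summary(peptides_by_db: Dict[str, Set[str]]) -> Dict[str, object]:
    """Summarize how identified peptide sequences are shared among databases."""
    if not peptides_by_db:
        raise ValueError("at least one database is required")
    sets = list(peptides_by_db.values())
    union = set().union(*sets)        # peptides identified by any database
    common = set.intersection(*sets)  # peptides identified by every database
    unique = {}
    for name, peptides in peptides_by_db.items():
        # Peptides found only with this database (absent from all the others).
        others = set().union(*(p for n, p in peptides_by_db.items() if n != name))
        unique[name] = len(peptides - others)
    return {
        "total": len(union),
        "common": len(common),
        "common_pct": 100.0 * len(common) / len(union) if union else 0.0,
        "unique": unique,
    }


if __name__ == "__main__":
    toy = {
        "NCBI": {"AAK", "LLR", "GGK", "MMR"},
        "TrEMBL": {"AAK", "LLR", "PPK"},
        "metagenome": {"AAK", "LLR", "GGK", "QQR"},
    }
    print(overlap_summary(toy))
```

The "common_pct" value corresponds to the kind of figure quoted in the abstract (peptides common to all database classes as a share of all identified peptides).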

**Figure 3. Evaluation of FDR behavior and peptide degeneracy using different databases.**
A) Number of peptides (left) and PSMs (right) identified with each database as a function of the FDR threshold, based on Percolator q-values. B) Bar graph showing the percentage increment in peptide (left) and PSM (right) identifications achieved with each database when increasing the FDR threshold from 1 to 5%. C) Bar graph illustrating the percentage of shared peptides (left) and PSMs (right) identified with each database at FDR<1%.
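The three panels can be approximated with the sketch below, assuming one Percolator q-value per identification (peptide or PSM) and a peptide-to-protein mapping taken from the search output. Counting identifications with q below each threshold gives the curve in panel A, and the relative change between the 1% and 5% cutoffs gives panel B. Treating a peptide as "shared" when it maps to more than one protein entry is an assumption made here for panel C. All function names are hypothetical.

```python
from typing import Dict, Sequence, Set


def count_at_fdr(q_values: Sequence[float], threshold: float) -> int:
    """Count identifications whose q-value is below the FDR threshold."""
    return sum(1 for q in q_values if q < threshold)


def fdr_curve(q_values: Sequence[float], thresholds: Sequence[float]) -> Dict[float, int]:
    """Identifications retained at each FDR threshold (panel A)."""
    return {t: count_at_fdr(q_values, t) for t in thresholds}


def pct_increment(q_values: Sequence[float], low: float = 0.01, high: float = 0.05) -> float:
    """Percentage gain in identifications when relaxing the FDR from `low` to `high` (panel B)."""
    n_low = count_at_fdr(q_values, low)
    n_high = count_at_fdr(q_values, high)
    if n_low == 0:
        return float("nan")  # undefined when nothing passes the stricter threshold
    return 100.0 * (n_high - n_low) / n_low


def pct_shared(peptide_to_proteins: Dict[str, Set[str]]) -> float:
    """Percentage of peptides matching more than one protein entry (assumed definition, panel C)."""
    if not peptide_to_proteins:
        return 0.0
    shared = sum(1 for proteins in peptide_to_proteins.values() if len(proteins) > 1)
    return 100.0 * shared / len(peptide_to_proteins)


if __name__ == "__main__":
    q = [0.001, 0.004, 0.009, 0.012, 0.03, 0.049, 0.2]  # toy q-values
    print(fdr_curve(q, [0.01, 0.05]))  # {0.01: 3, 0.05: 6}
    print(pct_increment(q))            # 100.0
```

A large increment between 1% and 5% indicates that many identifications sit just above the stricter cutoff, which is one way database-dependent FDR behavior shows up.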

**Figure 4. Reliability of taxonomic attribution using Unipept and MEGAN.**
Bar graphs showing the taxonomic distribution of family- (top), genus- (middle) and species-specific (bottom) peptides identified with different DBs, according to Unipept (left) or MEGAN (right) LCA analysis. Red rectangles indicate misassignments (i.e., attributions to taxa not actually present in the 9MM), with their percentage given for each DB. Bacterial taxa are represented by shades of blue, whereas yeast taxa are represented by shades of green.
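The misassignment percentage reported for each DB can be sketched as the share of taxon-specific peptides attributed to taxa outside the known composition of the mixture. This minimal example assumes LCA results at a single rank are available as a peptide-to-taxon mapping; the genera used below are hypothetical and do not describe the actual 9MM composition.

```python
from collections import Counter
from typing import Dict, Set


def taxon_distribution(assignments: Dict[str, str]) -> Counter:
    """Number of taxon-specific peptides attributed to each taxon at one rank."""
    return Counter(assignments.values())


def misassignment_pct(assignments: Dict[str, str], present: Set[str]) -> float:
    """Percentage of taxon-specific peptides attributed to taxa absent from the mixture."""
    if not assignments:
        return 0.0
    wrong = sum(1 for taxon in assignments.values() if taxon not in present)
    return 100.0 * wrong / len(assignments)


if __name__ == "__main__":
    # Hypothetical genus-level LCA output and a hypothetical set of genera in the mixture.
    genus_hits = {"P1": "Escherichia", "P2": "Escherichia", "P3": "Shigella", "P4": "Saccharomyces"}
    in_mixture = {"Escherichia", "Saccharomyces"}
    print(taxon_distribution(genus_hits))
    print(misassignment_pct(genus_hits, in_mixture))  # 25.0
```

Because a mock mixture has a known composition, every attribution can be scored as correct or incorrect, which is not possible with environmental samples.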

**Figure 5. Improvement of the reliability of taxonomic attribution upon data filtering.**
Histograms showing the number of families (top), genera (middle) and species (bottom) detected upon Unipept (left) or MEGAN (right) LCA analysis using different DBs, before and after the application of a filter based on the number of taxon-specific peptides (u, unfiltered; f, filtered). The threshold was set to 0.5% of the overall number of peptides unambiguously assigned to a taxon at a particular taxonomic rank level (family, genus or species). Correct and incorrect attributions are represented in green and red, respectively. The light blue lines and numbers correspond to the number of families, genera or species actually present in the 9MM.
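A minimal sketch of the filter described in this caption follows. It assumes the LCA results at one rank are available as a peptide-to-taxon mapping containing only peptides unambiguously assigned at that rank. The caption does not state whether the 0.5% threshold is inclusive or strict, so the inclusive comparison (>=) is an assumption. The scoring helper mirrors the correct (green) versus incorrect (red) split; all names and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def filter_taxa(assignments: Dict[str, str], min_fraction: float = 0.005) -> Set[str]:
    """Keep taxa supported by at least `min_fraction` of the peptides assigned at this rank."""
    counts = Counter(assignments.values())
    cutoff = min_fraction * sum(counts.values())  # 0.5% of all peptides assigned at this rank
    return {taxon for taxon, n in counts.items() if n >= cutoff}  # inclusive cutoff (assumed)


def score(detected: Set[str], present: Set[str]) -> Tuple[int, int]:
    """Return (correct, incorrect) attributions against the taxa actually in the mixture."""
    return len(detected & present), len(detected - present)


if __name__ == "__main__":
    # Hypothetical genus-level LCA output: 1,000 peptides in total, so the cutoff is 5.
    hits = {
        f"{genus}_pep{i}": genus
        for genus, n in [("Escherichia", 600), ("Saccharomyces", 396), ("Shigella", 4)]
        for i in range(n)
    }
    present = {"Escherichia", "Saccharomyces"}
    print(score(set(hits.values()), present))  # (2, 1) unfiltered
    print(score(filter_taxa(hits), present))   # (2, 0) filtered
```

A proportional threshold scales with the depth of each dataset, so a handful of spurious taxon-specific peptides is discarded without requiring a fixed absolute count.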
