PLoS One. 2011;6(11):e27173.
doi: 10.1371/journal.pone.0027173. Epub 2011 Nov 23.

Strategies for metagenomic-guided whole-community proteomics of complex microbial environments

Brandi L Cantarel et al. PLoS One. 2011.

Abstract

Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.
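The two mass spectrometry-based filters named in the abstract (high mass accuracy and delta correlation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `PSM` record, the function names, and the thresholds (10 ppm, delta correlation of 0.08) are hypothetical values chosen for illustration only. The sketch assumes the common definitions of parts-per-million mass error and of delta correlation as the normalized gap between the best and second-best cross-correlation scores.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    observed_mass: float     # measured precursor mass, in Da
    theoretical_mass: float  # mass computed from the candidate sequence, in Da
    xcorr_top: float         # cross-correlation score of the best candidate
    xcorr_second: float      # cross-correlation score of the runner-up


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def delta_cn(top: float, second: float) -> float:
    """Normalized score gap between the best and second-best candidates."""
    if top <= 0:
        return 0.0
    return (top - second) / top


def passes_filters(psm: PSM, max_ppm: float = 10.0,
                   min_delta_cn: float = 0.08) -> bool:
    """Keep a match only if it is both mass-accurate and well separated.

    The default thresholds are illustrative assumptions, not values
    taken from the study.
    """
    accurate = abs(ppm_error(psm.observed_mass, psm.theoretical_mass)) <= max_ppm
    separated = delta_cn(psm.xcorr_top, psm.xcorr_second) >= min_delta_cn
    return accurate and separated


def filter_psms(psms: list[PSM], max_ppm: float = 10.0,
                min_delta_cn: float = 0.08) -> list[PSM]:
    """Return the matches that pass both filters, in input order."""
    return [p for p in psms if passes_filters(p, max_ppm, min_delta_cn)]
```

In this sketch a match is rejected either when its precursor mass deviates too far from the candidate peptide's mass or when the runner-up candidate scores almost as well as the top one. Both conditions are typical signs of an ambiguous assignment.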

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Creation of protein sequence databases.
Protein sequence databases were created from metagenomic sequence reads using a variety of methods for assembly and gene finding.
Figure 2. Comparison of identified peptides using sequence similarity techniques.
Percentage of matches found when comparing identified peptides from sample 6a (left panel) or 6b (right panel) to predicted proteins using FASTS (gray bars) and raw sequencing reads using TFASTS (white striped bars).
Figure 3. Performance and comparison of de novo peptide sequencing results.
Distribution of assigned spectra per de novo algorithm with a predicted consensus sequence (partial and/or exact sequence match) among all three algorithms, PEAKS, PepNovo+, and SEQUEST. Identified peptides from SEQUEST and RMPS sequence database were compared to the de novo predicted peptides for (A) 6a Run 2, (B) 6a Run 3, (C) 6b Run 1, and (D) 6b Run 2.

