2018 Oct 24;7(4):463-467.e6.
doi: 10.1016/j.cels.2018.08.009. Epub 2018 Sep 26.

ProteoStorm: An Ultrafast Metaproteomics Database Search Framework

Doruk Beyter et al. Cell Syst.

Abstract

Shotgun metaproteomics has the potential to reveal the functional landscape of microbial communities but lacks appropriate methods for complex samples with unknown compositions. In the absence of prior taxonomic information, tandem mass spectra would be searched against large pan-microbial databases, which requires heavy computational workload and reduces sensitivity. We present ProteoStorm, an efficient database search framework for large-scale metaproteomics studies, which identifies high-confidence peptide-spectrum matches (PSMs) while achieving a two-to-three orders-of-magnitude speedup over popular tools. A reanalysis of a urinary tract infection (UTI) dataset of 110 individuals revealed a complex pattern of polymicrobial expression, including sub-types of UTIs, cases of bacterial vaginosis, and evidence of no underlying disease. Importantly, compared to the initial UTI study that restricted the search database to a manually curated list of 20 genera, ProteoStorm identified additional genera that were previously unreported, including a case of infection with the rare pathogen Propionimicrobium.

Keywords: LC-MS/MS; metaproteomics; microbial communities; microbiology; proteome informatics; proteomics.

Conflict of interest statement

Declaration of Interests

V.B. is a cofounder, has an equity interest in, and receives income from Digital Proteomics, LLC. The terms of this arrangement have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies. Digital Proteomics was not involved in the research presented here.

Figures

Figure 1. ProteoStorm search framework: performance and scalability
(a) Each stage of ProteoStorm is composed of three modules: i) data partitioning, ii) peptide filtering, and iii) p-value computation. Identifications from a fully-tryptic search are used to construct a refined protein database for a semi-tryptic search. PSMs with p-values are reported. (b) In the first module, database and spectra partitioning dramatically reduces search space from all peptides (black dotted lines) to peptides within spectra parent mass ranges (red vertical lines). (c) In the second module, spectrum-peptide pairs are filtered based on shared counts of prominent spectral peaks (red) to b-/y-ions of peptides (numbers within blocks) using an ion-mass indexing data structure. Each colored block represents a unique peptide within the parent mass tolerance of a given spectrum. (d) 946,845 spectra were searched against the UniProtKB database using ProteoStorm, MSGF+, Comet, and MSFragger. ProteoStorm required 9.7 CPU-hours, while other tools required CPU-weeks to complete. (e) Breakdown of ProteoStorm runtime by module. S1 and S2 represent the two different stages of ProteoStorm. See also Figure S3.
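The first two modules described in panels (b) and (c) can be illustrated with a minimal Python sketch. It is not the ProteoStorm implementation: the function names, the 0.5 Da bin width, the parent mass tolerance, and the `min_shared` threshold are all assumptions chosen for illustration, only singly charged b-/y-ions are considered, and the third module (p-value computation) is omitted. The sketch shows how sorting peptides by mass restricts each spectrum to candidates in its parent mass window, and how a binned ion-mass index lets shared peak counts be accumulated without scoring every pair in full.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict

# Monoisotopic residue masses in Da (subset of amino acids, for illustration).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values of a peptide."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    ions, prefix = [], 0.0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        ions.append(prefix + PROTON)                  # b-ion
        ions.append(total - prefix + WATER + PROTON)  # complementary y-ion
    return ions


def build_mass_sorted(peptides):
    """Partitioning step: sort (mass, peptide) pairs for binary search."""
    return sorted((peptide_mass(p), p) for p in peptides)


def candidates(mass_sorted, parent_mass, tol_da):
    """Peptides whose mass lies within tol_da of the spectrum parent mass."""
    lo = bisect_left(mass_sorted, (parent_mass - tol_da, ""))
    hi = bisect_right(mass_sorted, (parent_mass + tol_da, "\uffff"))
    return [p for _, p in mass_sorted[lo:hi]]


def build_ion_index(peptides, bin_width):
    """Ion-mass index: m/z bin -> set of peptides with a b-/y-ion in that bin."""
    index = defaultdict(set)
    for p in peptides:
        for mz in fragment_ions(p):
            index[int(mz / bin_width)].add(p)
    return index


def filter_candidates(peaks, peptides, min_shared, bin_width=0.5):
    """Filtering step: keep peptides sharing at least min_shared peak bins."""
    index = build_ion_index(peptides, bin_width)
    counts = Counter()
    for bin_id in {int(mz / bin_width) for mz in peaks}:
        for p in index.get(bin_id, ()):
            counts[p] += 1
    return [p for p in peptides if counts[p] >= min_shared]


# Toy usage: the "spectrum" is the theoretical ion list of PEPTVGK.
database = ["PEPTVGK", "KGVTPEP", "GASPVTK"]
mass_sorted = build_mass_sorted(database)
parent = peptide_mass("PEPTVGK")
in_window = candidates(mass_sorted, parent, tol_da=0.02)
kept = filter_candidates(fragment_ions("PEPTVGK"), in_window, min_shared=6)
print(in_window, kept)
```

In this toy database, the parent mass window removes `GASPVTK`, and the shared peak count removes the reversed sequence `KGVTPEP`, which has the same mass but different fragment ions. Fixed-width binning can miss peaks that fall just across a bin boundary; a real tool would need to handle that case.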
Figure 2. ProteoStorm identifies bi-clusters of individuals with similar microbial compositions
Searching the full UTI dataset against the RefUP++ database (2,259 genera) using a genera-restriction approach, ProteoStorm identified 64 genera. Out of 73,092 peptides, 28.5% (20,833) mapped uniquely to a single genus. Four bi-clusters (white boxes) were inferred from clusters with an approximately unbiased (au) p-value greater than 0.90 (magenta boxes), indicating a complex pattern of polymicrobial expression, including sub-types of urinary tract infections, cases of bacterial vaginosis, and evidence of no underlying disease. Pathology groups: Healthy, ERY (erythrocyte/vascular injury), EXF (exfoliation of squamous epithelial and urothelial cells), and UTI (urinary tract infection). See also Table S1 and Table S2.
