J Proteome Res. 2012 Mar 2;11(3):1686-95.
doi: 10.1021/pr200874e. Epub 2012 Jan 27.

Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment

Surendra Dasari et al. J Proteome Res.

Abstract

Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
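
The dot product scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the scoring code of any particular search engine; the function names, the fixed 1.0 m/z bin width, and the peak lists are assumptions made for illustration. It shows why such a score ignores fragment ion m/z discrepancies: two peaks that fall in the same bin contribute fully, however far apart they sit inside it.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumed value)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned

def normalized_dot_product(query_peaks, library_peaks, bin_width=1.0):
    """Cosine-style similarity between two binned spectra, in the range 0..1."""
    query = bin_spectrum(query_peaks, bin_width)
    library = bin_spectrum(library_peaks, bin_width)
    # Only bins occupied in both spectra contribute to the dot product.
    dot = sum(query[b] * library[b] for b in query if b in library)
    norm = math.sqrt(sum(v * v for v in query.values())) * \
           math.sqrt(sum(v * v for v in library.values()))
    return dot / norm if norm > 0 else 0.0

# Hypothetical peak lists of (m/z, intensity) pairs.
library_spectrum = [(300.2, 50.0), (500.1, 100.0)]
shifted_spectrum = [(300.2, 50.0), (500.9, 100.0)]   # second peak is 0.8 m/z away
unrelated_spectrum = [(150.4, 80.0), (720.6, 40.0)]

same_bin_score = normalized_dot_product(shifted_spectrum, library_spectrum)
unrelated_score = normalized_dot_product(unrelated_spectrum, library_spectrum)
```

Here `same_bin_score` is as high as for a perfect match, even though one fragment ion is 0.8 m/z away from its library counterpart, because both peaks land in the same bin.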

Figures

Figure 1. Peptide Identification Pipeline
MyriMatch is a database search engine. Pepitome is a spectral library search engine, which matches experimental MS/MS against library spectra. IDPicker is a parsimonious protein assembler, which filters peptide identifications using a target FDR.
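
Filtering identifications at a target FDR can be sketched as follows. The caption does not say how the FDR is estimated, so this sketch assumes a target-decoy setup in which each match is flagged as a decoy or not, and it uses decoys divided by targets as one common estimate. It is not IDPicker's implementation; the function name and data layout are hypothetical.

```python
def filter_at_fdr(matches, max_fdr=0.05):
    """Keep the largest top-scoring set whose estimated FDR stays within max_fdr.

    matches: list of (score, is_decoy) pairs, where a higher score is better.
    The FDR estimate decoys / targets is an assumed convention.
    Returns the accepted target matches, best score first.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = 0
    for index, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets and decoys / targets <= max_fdr:
            cutoff = index
    return [m for m in ranked[:cutoff] if not m[1]]

# Hypothetical scored matches: (score, is_decoy).
example_matches = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
strict = filter_at_fdr(example_matches, max_fdr=0.2)
lenient = filter_at_fdr(example_matches, max_fdr=0.25)
```

With the stricter threshold only the three matches above the decoy are accepted; the lenient threshold admits the fourth target as well, since one decoy among four targets gives an estimate of exactly 0.25.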
Figure 2. Probabilistic Scoring Systems Are Robust
One thousand MS/MS spectra were randomly selected from the DLD1 Cell Lines data set. Pepitome and SpectraST matched the spectra against the NIST ion trap library. The search engines were modified to produce compact reports of all library comparisons made for each MS/MS. The top five matches by score were removed from each result set, and the remaining matches were considered stochastic. This figure illustrates the relationship between the stochastic search scores and the peak density (average peak count) of all the library spectra compared to the experimental spectra. Dot products have a positive bias towards high-density library spectra. Probabilistic scores such as the hypergeometric test and the Kendall-Tau statistic are resistant to changes in the peak density of library spectra.
Figure 3. Intensity Correlation Metric Improves Identification Rates of Library Searches
Pepitome identified the MS/MS present in the samples. IDPicker filtered the results at 5% FDR using either the hypergeometric test (HGT) score, which reflects peak presence or absence, or an optimal combination of the HGT and Kendall-Tau (intensity rank correlation) scores. In all samples, combining orthogonal scoring metrics improved peptide identification rates. Error bars in the figure represent the standard error of the mean estimated from the replicate LC-MS/MS analyses.
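
The two metrics named in this caption can be sketched in a few lines. This is an illustrative reading, not Pepitome's implementation: the hypergeometric test is taken as the probability of seeing at least the observed number of matched peaks by chance, and the Kendall-Tau statistic is computed in its simplest form (tau-a, no tie correction) over the intensities of matched peaks. The function names, the number of m/z bins, and the example values are assumptions.

```python
import math

def hypergeometric_tail(total_bins, library_peaks, query_peaks, matched):
    """P(at least `matched` of the query peaks hit library peaks by chance).

    total_bins: assumed number of possible m/z bins (population size).
    library_peaks: bins occupied by the library spectrum (successes in population).
    query_peaks: bins occupied by the experimental spectrum (draws).
    """
    upper = min(library_peaks, query_peaks)
    ways = sum(
        math.comb(library_peaks, i) * math.comb(total_bins - library_peaks, query_peaks - i)
        for i in range(matched, upper + 1)
    )
    return ways / math.comb(total_bins, query_peaks)

def kendall_tau(x, y):
    """Kendall tau-a rank correlation between two equal-length sequences."""
    n = len(x)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical example: 10 bins, 5 library peaks, 5 query peaks, all 5 matched.
p_all_matched = hypergeometric_tail(10, 5, 5, 5)
# Hypothetical intensities of matched peaks in the query and library spectra.
tau_same_order = kendall_tau([10.0, 40.0, 25.0], [100.0, 900.0, 300.0])
tau_reversed = kendall_tau([1.0, 2.0, 3.0], [30.0, 20.0, 10.0])
```

A small tail probability means the peak overlap is unlikely to be random, while a tau near 1 means the matched peaks also agree in intensity order. The two quantities capture different evidence, which is consistent with the caption's point that combining them helps.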
Figure 4. Performance on Universal Protein Standard (UPS1) Data Set
PP stands for Pepitome, ST stands for SpectraST, and MM stands for MyriMatch. IDPicker filtered the identifications at 5% FDR. Pepitome recovered more true positive proteins from the sample than any other search engine.
Figure 5. Performance on Real World Shotgun Proteomics Data Sets
Pepitome (PP) and SpectraST (ST) matched the experimental spectra against spectral libraries, whereas MyriMatch (MM) matched the MS/MS against a FASTA database. IDPicker filtered the identifications at 2% FDR. This figure summarizes the numbers of peptide and spectral identifications for each sample. Overall, Pepitome identified more peptides and spectra than the other search engines.
Figure 6. Gains from Spectral Library Searches are Trustworthy
MS/MS present in the MMR Cell Lines data set were identified with Pepitome and SpectraST. IDPicker filtered the results at 2% FDR. The peptide identification overlap between the search engines is shown in the figure. Identified peptide sequences were matched against the expressed proteome inferred from the RNA-Seq data. The percentage of peptides with corresponding RNA-Seq evidence is also presented in the figure.
Figure 7. Spectral Library and Database Search Peptide Identification Overlap
MS/MS scans present in the MMR Cell Lines data set were identified with both a spectral library search engine (Pepitome) and a database search engine (MyriMatch). IDPicker filtered the results at 2% FDR. The peptide identification overlap between the spectral library and database searches is shown in the figure. Identified peptide sequences were matched against the expressed proteome inferred from the RNA-Seq data. The percentage of peptides with corresponding RNA-Seq evidence is also presented in the figure.
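
The overlap analysis described in the captions for Figures 6 and 7 can be sketched with plain set operations. This is a simplified illustration, not the authors' pipeline: it assumes a peptide has RNA-Seq evidence when its sequence occurs as a substring of some protein in the expressed proteome, and the function name and all sequences are hypothetical.

```python
def overlap_summary(peptides_a, peptides_b, expressed_proteins):
    """Split identifications into only-A, shared, and only-B sets.

    For each set, report (peptide count, percent with RNA-Seq evidence).
    Evidence is assumed to mean substring presence in an expressed protein.
    """
    a, b = set(peptides_a), set(peptides_b)
    regions = {"only_a": a - b, "shared": a & b, "only_b": b - a}
    summary = {}
    for name, peptides in regions.items():
        supported = sum(
            1 for pep in peptides
            if any(pep in protein for protein in expressed_proteins)
        )
        percent = 100.0 * supported / len(peptides) if peptides else 0.0
        summary[name] = (len(peptides), percent)
    return summary

# Hypothetical data: two result sets and a two-protein expressed proteome.
expressed = ["MKTAYIAKQR", "GAVLIPFW"]
engine_a = ["TAYIAK", "VLIPF", "XXXX"]
engine_b = ["VLIPF", "AKQR"]
result = overlap_summary(engine_a, engine_b, expressed)
```

In this toy input, the shared peptide and the peptide unique to the second engine are both supported, while only one of the two peptides unique to the first engine is found in the expressed proteome.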
Figure 8. Quality Assessment (QA) Method for Shotgun Proteomics Data Sets
We performed 902 LC-MS/MS analyses of BSA standards on an LTQ-XL instrument over 18 months. Experts reviewed the raw files and assigned a quality label (low or high) to each file. Pepitome identified peptides from the samples, and IDPicker filtered the identifications at 2% FDR. (a) Peptide and spectral identification rates from quality-assessed raw files. (b) We developed an artificial neural network (ANN) to recapitulate the expert QA of a raw file from a collection of quality metrics. This panel shows the training and testing receiver operating characteristic (ROC) curves when the ANN uses different categorical collections of quality metrics as inputs.
