J Proteome Res. 2012 Mar 2;11(3):1686-95.

doi: 10.1021/pr200874e. Epub 2012 Jan 27.

Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment

Surendra Dasari¹, Matthew C Chambers, Misti A Martinez, Kristin L Carpenter, Amy-Joan L Ham, Lorenzo J Vega-Montoto, David L Tabb

PMID: 22217208
PMCID: PMC3292681
DOI: 10.1021/pr200874e

Surendra Dasari et al. J Proteome Res. 2012.

Affiliation

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8575, United States.

Abstract

Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
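The dot product score mentioned in the abstract can be illustrated with a minimal sketch. It bins two peak lists by m/z and computes a normalized dot product over the binned intensities. The bin width, function names, and toy peak lists are assumptions made for illustration; this is not the scoring code of Pepitome, SpectraST, or any other tool.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def normalized_dot_product(query_peaks, library_peaks, bin_width=1.0):
    """Cosine-style similarity between two binned spectra (0.0 to 1.0)."""
    query = bin_spectrum(query_peaks, bin_width)
    library = bin_spectrum(library_peaks, bin_width)
    shared_bins = set(query) & set(library)
    numerator = sum(query[b] * library[b] for b in shared_bins)
    norm = math.sqrt(sum(v * v for v in query.values())) * math.sqrt(
        sum(v * v for v in library.values())
    )
    return numerator / norm if norm else 0.0


# Hypothetical toy spectra: fragment m/z values differ by up to 0.8,
# but every peak pair still falls into the same unit-width bin.
query = [(100.1, 50.0), (200.2, 30.0)]
library = [(100.9, 50.0), (200.4, 30.0)]
score = normalized_dot_product(query, library)
```

In this toy case the two spectra receive the maximum score even though their fragment m/z values disagree, which is the kind of insensitivity to m/z discrepancies the abstract points out.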

Figures

**Figure 1. Peptide Identification Pipeline**
MyriMatch is a database search engine. Pepitome is a spectral library search engine, which matches experimental MS/MS against library spectra. IDPicker is a parsimonious protein assembler, which filters peptide identifications using a target FDR.
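Filtering identifications at a target FDR can be sketched as follows. The example assumes a target-decoy strategy in which each match carries a flag marking it as a decoy hit, and it estimates FDR as decoys divided by targets above a score cutoff. That estimator, the function name, and the toy data are assumptions for illustration; the text above does not describe how IDPicker computes FDR.

```python
def filter_at_fdr(matches, max_fdr=0.05):
    """Return target matches above the lowest score cutoff that meets max_fdr.

    matches: iterable of (score, is_decoy) tuples; a higher score is better.
    FDR at a cutoff is estimated as decoys / targets (an assumed convention).
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best_cut = 0
    for position, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest position where the estimated FDR is acceptable.
        if targets and decoys / targets <= max_fdr:
            best_cut = position
    return [m for m in ranked[:best_cut] if not m[1]]


# Hypothetical toy result set: (score, is_decoy)
toy_matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
strict = filter_at_fdr(toy_matches, max_fdr=0.05)
lenient = filter_at_fdr(toy_matches, max_fdr=0.5)
```

With the strict threshold only the two matches scoring above the first decoy pass; the lenient threshold admits all three target matches.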

**Figure 2. Probabilistic Scoring Systems Are Robust**
One thousand MS/MS spectra were randomly selected from the DLD1 Cell Lines data set. Pepitome and SpectraST matched the spectra against the NIST ion trap library. Search engines were modified to make compact reports of all library comparisons made for each MS/MS. The top five matches by score were removed from each result set. The remaining matches were considered to be stochastic. This figure illustrates the functional relationship between the stochastic search scores and the peak density (average peak counts) of all the library spectra compared to the experimental spectra. Dot products have a positive bias towards high-density library spectra. Probabilistic scores such as the hypergeometric test and the Kendall-Tau statistic are resistant to changes in the peak density of library spectra.
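The two probabilistic scores named in this caption can be sketched with the standard library alone. The first function computes a hypergeometric tail probability for the number of matched peaks, and the second computes Kendall's tau (the tau-a variant, which ignores tie corrections) over paired intensities. How Pepitome defines the population of m/z bins, and which tau variant it uses, is not stated here, so the parameterization below is an assumption.

```python
from math import comb


def hypergeometric_tail(total_bins, library_peaks, query_peaks, matched):
    """P(X >= matched) when query_peaks bins are drawn without replacement
    from total_bins bins, of which library_peaks are occupied."""
    upper = min(library_peaks, query_peaks)
    denominator = comb(total_bins, query_peaks)
    numerator = sum(
        comb(library_peaks, k) * comb(total_bins - library_peaks, query_peaks - k)
        for k in range(matched, upper + 1)
    )
    return numerator / denominator


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical numbers: 3 library peaks and 3 query peaks in 10 bins, all matched.
p_value = hypergeometric_tail(10, 3, 3, 3)
# Hypothetical intensities of matched peaks in the query and library spectra.
tau = kendall_tau_a([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

A smaller tail probability means the observed number of peak matches is less likely to have arisen by chance, while tau measures how well the intensity ranks of matched peaks agree.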

**Figure 3. Intensity Correlation Metric Improves Identification Rates of Library Searches**
Pepitome identified the MS/MS present in the samples. IDPicker filtered the results at 5% FDR using either HGT (peak presence or absence) score or an optimal combination of HGT and Kendall-Tau (intensity rank correlation) scores. In all samples, combining orthogonal scoring metrics improved peptide identification rates. Error bars in the figure represent standard error of the mean estimated from the replicate LC-MS/MS analyses.

**Figure 4. Performance on Universal Protein Standard (UPS1) Data Set**
PP stands for Pepitome, ST stands for SpectraST, and MM stands for MyriMatch. IDPicker filtered the identifications at 5% FDR. Pepitome recovered more true positive proteins from the sample than any other search engine.

**Figure 5. Performance on Real World Shotgun Proteomics Data Sets**
Pepitome (PP) and SpectraST (ST) matched the experimental spectra against spectral libraries, whereas MyriMatch (MM) matched the MS/MS against a FASTA database. IDPicker filtered the identifications at 2% FDR. This figure illustrates the sample-wise summary of peptide and spectral identification numbers. Overall, Pepitome identified more peptides and spectra than the other search engines.

**Figure 6. Gains from Spectral Library Searches are Trustworthy**
MS/MS present in the MMR Cell Lines data set were identified with Pepitome and SpectraST. IDPicker filtered the results at 2% FDR. The peptide identification overlap between the search engines is shown in the figure. Identified peptide sequences were matched against the expressed proteome inferred from the RNA-Seq data. The percentage of peptides with corresponding RNA-Seq evidence is also presented in the figure.

**Figure 7. Spectral Library and Database Search Peptide Identification Overlap**
MS/MS scans present in the MMR Cell Lines data set were identified with both a spectral library search engine (Pepitome) and a database search engine (MyriMatch). IDPicker filtered the results at 2% FDR. The peptide identification overlap between the spectral library and database searches is shown in the figure. Identified peptide sequences were matched against the expressed proteome inferred from the RNA-Seq data. The percentage of peptides with corresponding RNA-Seq evidence is also presented in the figure.
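The overlap analysis described in Figures 6 and 7 amounts to set arithmetic plus a sequence lookup. The sketch below splits peptide identifications from two searches into shared and search-specific groups, then reports the share of each group found as a substring of an expressed protein sequence. The function name, the exact-substring matching rule, and the toy sequences are assumptions for illustration, not the authors' procedure.

```python
def overlap_with_evidence(first_ids, second_ids, expressed_proteins):
    """Summarize identification overlap between two searches.

    Returns {group: (peptide_count, percent_with_evidence)}, where a peptide
    has evidence if it occurs as a substring of any expressed protein.
    """
    groups = {
        "first_only": first_ids - second_ids,
        "shared": first_ids & second_ids,
        "second_only": second_ids - first_ids,
    }
    summary = {}
    for name, peptides in groups.items():
        supported = sum(
            1 for pep in peptides if any(pep in prot for prot in expressed_proteins)
        )
        percent = 100.0 * supported / len(peptides) if peptides else 0.0
        summary[name] = (len(peptides), percent)
    return summary


# Hypothetical toy data: peptide sets from two searches and an expressed proteome.
library_search = {"PEPTIDE", "AAAK", "LLLR"}
database_search = {"PEPTIDE", "GGGK"}
proteome = ["MPEPTIDEK", "MSAAAKTT"]
result = overlap_with_evidence(library_search, database_search, proteome)
```

Comparing the evidence percentage of the search-specific groups against the shared group gives a rough sense of whether the extra identifications from one search are as well supported as the identifications both searches agree on.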

**Figure 8. Quality Assessment (QA) Method for Shotgun Proteomics Data Sets**
We performed 902 LC-MS/MS analyses of BSA standards on an LTQ-XL instrument over 18 months. Experts reviewed the raw files and assigned a quality label (low or high) to each file. Pepitome identified peptides from the samples and IDPicker filtered the identifications at 2% FDR. **(a)** Peptide and spectral identification rates from quality assessed raw files. **(b)** We developed an artificial neural network (ANN) for recapitulating the expert QA of a raw file from a collection of quality metrics. This panel shows the training and testing receiver operating characteristic (ROC) curves when the ANN uses different categorical collections of quality metrics as inputs.

Cited by

MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search.
Dorl S, Winkler S, Mechtler K, Dorfer V. J Proteome Res. 2023 Feb 3;22(2):462-470. doi: 10.1021/acs.jproteome.2c00658. Epub 2023 Jan 23. PMID: 36688604 Free PMC article.

Middle-down approach: a choice to sequence and characterize proteins/proteomes by mass spectrometry.
Pandeswari PB, Sabareesh V. RSC Adv. 2019 Jan 2;9(1):313-344. doi: 10.1039/c8ra07200k. eCollection 2018 Dec 19. PMID: 35521579 Free PMC article. Review.

Chronic intermittent alcohol disrupts the GluN2B-associated proteome and specifically regulates group I mGlu receptor-dependent long-term depression.
Wills TA, Baucum AJ 2nd, Holleran KM, Chen Y, Pasek JG, Delpire E, Tabb DL, Colbran RJ, Winder DG. Addict Biol. 2017 Mar;22(2):275-290. doi: 10.1111/adb.12319. Epub 2015 Nov 8. PMID: 26549202 Free PMC article.

Identification of Proteomic Features To Distinguish Benign Pulmonary Nodules from Lung Adenocarcinoma.
Codreanu SG, Hoeksema MD, Slebos RJC, Zimmerman LJ, Rahman SMJ, Li M, Chen SC, Chen H, Eisenberg R, Liebler DC, Massion PP. J Proteome Res. 2017 Sep 1;16(9):3266-3276. doi: 10.1021/acs.jproteome.7b00245. Epub 2017 Aug 8. PMID: 28731711 Free PMC article.

Tracking the sources of blood meals of parasitic arthropods using shotgun proteomics and unidentified tandem mass spectral libraries.
Önder Ö, Shao W, Lam H, Brisson D. Nat Protoc. 2014 Apr;9(4):842-50. doi: 10.1038/nprot.2014.048. Epub 2014 Mar 13. PMID: 24625782 Free PMC article.
