J Proteome Res. 2012 Sep 7;11(9):4499-508.
doi: 10.1021/pr300234m. Epub 2012 Aug 15.

Learning score function parameters for improved spectrum identification in tandem mass spectrometry experiments

Marina Spivak et al. J Proteome Res.

Abstract

The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, or Q-ranker, follow a similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows a more effective ranking of the peptide-spectrum matches based on the target-decoy strategy, in comparison to a baseline feature set devoid of these XCorr-based features. Ranking using the extended feature set gives a 10-40% improvement in the number of distinct peptide identifications across a range of q-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as a similarity score between experimental spectra and peptide sequences.
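
The extension to "any linear operator" rests on a simple property: if the similarity score is linear in the theoretical spectrum, the score of a weighted sum of sub-spectra equals the weighted sum of the per-sub-spectrum scores, so those per-sub-spectrum scores can be used as features whose weights are learned. The sketch below illustrates this under stated assumptions: a plain dot product stands in for the similarity score (SEQUEST's XCorr also subtracts a background term, which is omitted here), and the function names, bin layout, and toy intensities are hypothetical, not taken from the paper.

```python
def dot(u, v):
    """Plain dot product, used here as a stand-in linear similarity score."""
    return sum(a * b for a, b in zip(u, v))


def combine(sub_spectra, weights):
    """Weighted sum of binned sub-spectra, giving one full theoretical spectrum."""
    n_bins = len(sub_spectra[0])
    return [
        sum(w * s[i] for w, s in zip(weights, sub_spectra))
        for i in range(n_bins)
    ]


def sub_scores(observed, sub_spectra):
    """One score per sub-spectrum; these play the role of extra features."""
    return [dot(observed, s) for s in sub_spectra]


# Toy binned data (hypothetical): one observed spectrum, four sub-spectra.
observed = [0.0, 2.0, 0.0, 1.0, 3.0, 0.0]
sub_spectra = [
    [0, 1, 0, 0, 0, 0],  # b-ions
    [0, 0, 0, 0, 1, 0],  # y-ions
    [1, 0, 0, 0, 0, 0],  # b-ions with NH3 loss
    [0, 0, 0, 1, 0, 0],  # y-ions with NH3 loss
]
weights = [1.0, 1.0, 0.1, 0.1]

features = sub_scores(observed, sub_spectra)
full_score = dot(observed, combine(sub_spectra, weights))
score_from_features = dot(weights, features)

# By linearity, the two routes give the same score (up to rounding).
print(features, full_score, score_from_features)
```

Because the two routes agree, a ranking algorithm that learns a weight per feature is in effect free to re-tune the sub-spectrum weights rather than use fixed ones.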

Figures

Figure 1. Composition of full theoretical spectrum from weighted sum of sub-spectra
The figure shows, for the 1+ charged peptide AGGEFPQRK, theoretical sub-spectra for b- and y-ions, with and without neutral losses of NH3. The right-most panel is a sum of these sub-spectra, with b- and y-ions assigned a height of 1 (corresponding to w_{1,2} = 1) and neutral losses assigned a height of 0.1 (corresponding to w_{3,4} = 0.1).
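
The composition described in this caption can be sketched in a few lines. This is an illustration only, with several assumptions that are not from the source: standard monoisotopic residue masses, singly charged fragments, an NH3 loss applied to every b- and y-ion, and 1 Da bins. The helper names `sub_spectra` and `compose` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in AGGEFPQRK.
RESIDUE_MASS = {
    "A": 71.03711, "G": 57.02146, "E": 129.04259, "F": 147.06841,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728
NH3 = 17.02655


def sub_spectra(peptide):
    """Return m/z lists for 1+ b- and y-ions, with and without NH3 loss."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return {
        "b": b,
        "y": y,
        "b-NH3": [mz - NH3 for mz in b],
        "y-NH3": [mz - NH3 for mz in y],
    }


def compose(subs, weights, bin_width=1.0):
    """Weighted sum of sub-spectra; peaks are placed in fixed-width bins."""
    spectrum = {}
    for name, mzs in subs.items():
        for mz in mzs:
            bin_index = int(mz / bin_width)
            spectrum[bin_index] = spectrum.get(bin_index, 0.0) + weights[name]
    return spectrum


subs = sub_spectra("AGGEFPQRK")
weights = {"b": 1.0, "y": 1.0, "b-NH3": 0.1, "y-NH3": 0.1}
spectrum = compose(subs, weights)

for bin_index in sorted(spectrum):
    print(bin_index, round(spectrum[bin_index], 2))
```

Keeping the four sub-spectra separate until the final sum is what makes the weights adjustable: changing `weights` changes the peak heights without recomputing any fragment masses.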
Figure 2. Comparison of base and extended feature sets
The figure shows the number of unique target peptides identified as a function of q-value threshold for the ranking algorithm using base and extended feature sets.
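
A curve of this kind can be computed from a ranked list of target and decoy matches. The sketch below is a minimal version under assumptions not stated in the source: the false discovery rate at each rank is estimated as decoys divided by targets, the q-value is the minimum such estimate at that rank or any lower-scoring rank, and ties in score are ignored. The function names and toy matches are hypothetical.

```python
def q_values(psms):
    """psms: list of (score, peptide, is_decoy) tuples.

    Returns (peptide, is_decoy, q_value) tuples ordered by decreasing score.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: smallest FDR at this rank or at any lower-scoring rank.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(pep, is_decoy, q) for (_, pep, is_decoy), q in zip(ranked, qs)]


def unique_targets_at(psms, threshold):
    """Count distinct target peptides with q-value at or below the threshold."""
    return len({
        pep for pep, is_decoy, q in q_values(psms)
        if not is_decoy and q <= threshold
    })


# Toy matches (hypothetical scores and peptide labels).
psms = [
    (5.0, "PEPA", False), (4.5, "PEPB", False), (4.0, "PEPA", False),
    (3.5, "DECX", True), (3.0, "PEPC", False), (2.0, "DECY", True),
]

for threshold in (0.01, 0.25):
    print(threshold, unique_targets_at(psms, threshold))
```

Evaluating `unique_targets_at` over a grid of thresholds for two different rankings of the same matches gives two curves that can be compared directly.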
Figure 3. Comparison of base and extended feature sets on six replicate C. elegans data sets
Panel A shows the number of unique target peptides identified in two or more replicate data sets as a function of q-value threshold for the ranking algorithm using base and extended feature sets. Panel B shows the average absolute retention time difference (in minutes) of peptides identified in two or more replicate data sets as a function of the number of peptides at the top of the ranked list.
Figure 4. Percent of peptide-spectrum matches that were considered “high quality” by the Bullseye algorithm
The figure shows the percent of “Bullseye hits” among the peptide-spectrum matches identified using the extended feature set or base feature set as a function of the number of peptide-spectrum matches at the top of the ranked list in the six replicate runs.
Figure 5. Comparison of PeptideProphet, Percolator and Q-ranker with base and extended feature sets
Panels A–C show the number of unique target peptides identified as a function of q-value threshold. Panels D–F show the number of known target peptides identified as a function of q-value threshold.
Figure 6. Comparison of RE-CID and fHCD data
The figure shows an analysis of two C. elegans data sets that were generated using either RE-CID or fHCD collision-induced dissociation. The blue and red lines correspond to the database search conducted using a theoretical spectrum model with all peaks included, which we call the “original” search. This search was subsequently analyzed using either the base or the extended feature set. The cyan line corresponds to the database search that used theoretical spectra without flanking peaks, with subsequent analysis using the base feature set. The magenta line corresponds to the database search that used theoretical spectra without flanking peaks or b-ions, with subsequent analysis using the base feature set.
