Anal Chem. 2007 Feb 15;79(4):1301-10.
doi: 10.1021/ac061334v.

Automatic validation of phosphopeptide identifications from tandem mass spectra

Bingwen Lu et al. Anal Chem. 2007.

Abstract

We developed and compared two approaches for automated validation of phosphopeptide tandem mass spectra identified using database searching algorithms. Phosphopeptide identifications were obtained through SEQUEST searches of a protein database appended with its decoy (reversed sequences). Statistical evaluation and iterative searches were employed to create a high-quality data set of phosphopeptides. We approached automation of postsearch validation with two different strategies. In the first, we use statistical multiple testing to calculate a p value for each tentative peptide phosphorylation. In the second, we use a support vector machine (SVM; a machine learning algorithm) binary classifier to predict whether a tentative peptide phosphorylation is true. Postsearch validation of phosphopeptide/spectrum matches by multiple testing agreed well (85%) with validation by SVMs. In a blinded test, the automatic methods agreed closely with manual expert validation. The algorithms were also tested on the identification of synthetic phosphopeptides. We show that phosphate neutral losses in tandem mass spectra can be used to assess the correctness of phosphopeptide/spectrum matches. An SVM classifier with a radial basis function (RBF) kernel correctly classified 95.7% to 96.8% of the positive data set, depending on the search algorithm used. Establishing the validity of an identification is a necessary step before further postsearch interrogation of the spectra to fully localize phosphorylation sites. Our current implementation validates phosphoserine- and phosphothreonine-containing peptides with one or two phosphorylation sites, using data acquired on an ion trap mass spectrometer. The SVM-based algorithm has been implemented in the software package DeBunker, and we illustrate the application of DeBunker on a large phosphorylation data set.
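The SVM classifier described above, and the three kernels compared in Figure 2, can be illustrated with a minimal sketch of how a trained kernel SVM scores one feature vector. It assumes the standard dual form f(x) = sum_i alpha_i y_i K(x_i, x) + b with a positive score meaning "true phosphorylation". The function names, the gamma and coef0 values, and the two-support-vector toy model are hypothetical illustrations, not DeBunker's trained model or API.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, degree: int = 2,
                      gamma: float = 1.0, coef0: float = 1.0) -> float:
    """K(x, y) = (gamma * x . y + coef0) ** degree; gamma and coef0 are assumed values."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_decision(x: Vector, support_vectors: Sequence[Vector],
                 dual_coefs: Sequence[float], bias: float,
                 kernel: Kernel) -> float:
    """Dual-form score: sum over support vectors of (alpha_i * y_i) * K(x_i, x), plus b."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def is_true_phosphorylation(x: Vector, support_vectors: Sequence[Vector],
                            dual_coefs: Sequence[float], bias: float,
                            kernel: Kernel) -> bool:
    """Binary call: a positive score means 'true phosphopeptide/spectrum match'."""
    return svm_decision(x, support_vectors, dual_coefs, bias, kernel) > 0.0


# Hypothetical toy model: one positive and one negative support vector, alpha = 1 each.
SUPPORT_VECTORS = [[1.0, 1.0], [0.0, 0.0]]
DUAL_COEFS = [1.0, -1.0]  # alpha_i * y_i
BIAS = 0.0

print(is_true_phosphorylation([0.9, 0.8], SUPPORT_VECTORS, DUAL_COEFS, BIAS, rbf_kernel))  # True
print(is_true_phosphorylation([0.1, 0.0], SUPPORT_VECTORS, DUAL_COEFS, BIAS, rbf_kernel))  # False
```

In practice the feature vector x would hold per-spectrum features such as those plotted in Figure 1, and the support vectors and coefficients would come from training on the positive and negative sets.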


Figures

Figure 1
Distribution of extracted features for the training set. The plots show feature distributions for 944 positive spectrum/peptide identifications and 944 negative spectrum/peptide identifications (randomly selected from the 1064 negative training spectrum/peptide matches). A. Distribution of precursor neutral loss/base peak ratios; B. Distribution of the number of fragment ion neutral losses; C. Distribution of the percentage of unassigned peak intensity that could be explained by fragment ion neutral losses; D. Distribution of average y-ion intensity.
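Features A to C in Figure 1 all rest on detecting the loss of phosphoric acid (H3PO4, about 97.977 Da) from phosphoserine or phosphothreonine. A minimal sketch of how such features might be computed from a centroided peak list follows. It assumes singly charged fragment ions, a fixed 0.5 m/z matching tolerance, and the exact feature definitions written in the docstrings; these definitions, the function names, and the toy spectrum are hypothetical, not the paper's implementation. Feature D (average y-ion intensity) is omitted.

```python
from typing import Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

# Monoisotopic mass of H3PO4, the neutral lost from phosphoserine/phosphothreonine.
H3PO4 = 97.976896


def _near(mz: float, targets: Sequence[float], tol: float) -> bool:
    return any(abs(mz - t) <= tol for t in targets)


def precursor_loss_to_base_ratio(peaks: Sequence[Peak], precursor_mz: float,
                                 charge: int, tol: float = 0.5) -> float:
    """Feature A (assumed form): intensity at precursor m/z - H3PO4/z over base peak intensity."""
    target = precursor_mz - H3PO4 / charge
    base = max(intensity for _, intensity in peaks)
    loss = max((i for mz, i in peaks if abs(mz - target) <= tol), default=0.0)
    return loss / base


def fragment_loss_count(peaks: Sequence[Peak], fragment_mzs: Sequence[float],
                        tol: float = 0.5) -> int:
    """Feature B (assumed form): assigned fragments whose -H3PO4 partner peak is observed."""
    observed = [mz for mz, _ in peaks]
    return sum(1 for f in fragment_mzs if _near(f - H3PO4, observed, tol))


def unassigned_loss_percent(peaks: Sequence[Peak], fragment_mzs: Sequence[float],
                            tol: float = 0.5) -> float:
    """Feature C (assumed form): % of unassigned intensity matching a fragment minus H3PO4."""
    unassigned = [(mz, i) for mz, i in peaks if not _near(mz, fragment_mzs, tol)]
    total = sum(i for _, i in unassigned)
    if total == 0:
        return 0.0
    loss_mzs = [f - H3PO4 for f in fragment_mzs]
    explained = sum(i for mz, i in unassigned if _near(mz, loss_mzs, tol))
    return 100.0 * explained / total


# Hypothetical toy spectrum for a doubly charged precursor at m/z 800.0.
peaks = [(751.0, 1000.0), (500.0, 2000.0), (402.0, 200.0), (300.0, 100.0)]
fragments = [500.0, 300.0]  # m/z values assigned to b/y ions by the search
print(precursor_loss_to_base_ratio(peaks, 800.0, 2))  # 0.5
print(fragment_loss_count(peaks, fragments))          # 1
```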
Figure 2
ROC (Receiver Operating Characteristic) plots for SVMs using different kernels on the test datasets. The true positive and false positive rates are calculated from the positive test set and negative test set, respectively. The plots show results from the linear kernel SVM, the polynomial kernel (degree d = 2) SVM, and the RBF kernel SVM.
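Following the caption's convention, each ROC point takes its true positive rate from the positive test set and its false positive rate from the negative test set as the score threshold is lowered. A minimal sketch, assuming classifier scores where higher means "more likely true"; the function names and toy scores are hypothetical, and the area under the curve is computed with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(pos_scores: Sequence[float],
               neg_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, sweeping the score threshold from high to low."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)  # from positive test set
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)  # from negative test set
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pos = [0.9, 0.8, 0.4]  # hypothetical SVM scores for positive test matches
neg = [0.7, 0.3, 0.2]  # hypothetical SVM scores for negative test matches
pts = roc_points(pos, neg)
print(round(auc(pts), 4))  # 0.8889
```

Running the same procedure on each kernel's scores would give one curve per kernel, as in the figure.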
Figure 3
Empirical random distributions for extracted features (sample size: 5498). A. Random distribution of precursor neutral loss/base peak ratios; B. Random distribution of the number of fragment ion neutral losses; C. Random distribution of the percentage of unassigned peak intensity that could be explained by fragment ion neutral losses.
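An empirical random distribution such as those in Figure 3 can be turned into a p value for an observed feature value by counting how often the random sample is at least as extreme. A minimal sketch, assuming that larger neutral-loss feature values are less likely under the random model (an upper-tail test) and using the common (1 + count) / (1 + n) convention so that p never reaches zero; the class name and toy null sample are hypothetical, not the paper's procedure.

```python
import bisect
from typing import Iterable


class EmpiricalNull:
    """Upper-tail empirical p values from a sample of one feature under a random model."""

    def __init__(self, null_sample: Iterable[float]):
        self._sorted = sorted(null_sample)
        if not self._sorted:
            raise ValueError("null sample must not be empty")

    def p_value(self, observed: float) -> float:
        # Count null values >= observed; the +1 terms keep p strictly above zero.
        n = len(self._sorted)
        n_at_least = n - bisect.bisect_left(self._sorted, observed)
        return (1 + n_at_least) / (1 + n)


null = EmpiricalNull(range(10))  # toy stand-in for e.g. 5498 random feature values
print(null.p_value(8))   # 3/11, about 0.273
print(null.p_value(10))  # 1/11, about 0.091
```

Sorting once and using binary search keeps each lookup logarithmic in the sample size, which matters when scoring many matches against a large random sample.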
Figure 4
Distribution of p-values for the testing sets. A. Distribution of p-values calculated from the positive testing set; B. Distribution of p-values from the negative testing set; C. Distribution of p-values from the exclusion set.
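The abstract says only that a p value is calculated per tentative phosphorylation "by using statistical multiple testing"; it does not give the combination rule. One common way to merge k per-feature tests (for example, the three features of Figure 3) into a single p value is a Bonferroni adjustment of the smallest p value. The sketch below uses that rule and an alpha of 0.05 purely as assumptions; the function names are hypothetical.

```python
from typing import Sequence


def combined_p_value(feature_p_values: Sequence[float]) -> float:
    """Bonferroni-adjusted minimum p value over k per-feature tests (assumed rule)."""
    if not feature_p_values:
        raise ValueError("need at least one p value")
    k = len(feature_p_values)
    return min(1.0, k * min(feature_p_values))


def accept_match(feature_p_values: Sequence[float], alpha: float = 0.05) -> bool:
    """Accept the phosphopeptide/spectrum match if the combined p value is below alpha."""
    return combined_p_value(feature_p_values) < alpha


print(accept_match([0.01, 0.20, 0.50]))  # combined p = 0.03 -> True
print(accept_match([0.04, 0.30, 0.60]))  # combined p = 0.12 -> False
```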
Figure 5
Example spectra from synthetic phosphopeptides. A. Example spectrum for synthetic peptide pSFVLNPTNIGMSKSSQGHVTK. B. Example spectra for synthetic peptide SFVLNPTNIGMpSKSSQGHVTK.
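The two synthetic peptides in Figure 5 are positional isomers: the same sequence with the phosphate on serine 1 or serine 12, so they share a precursor mass and neutral-loss features alone cannot tell them apart. Site localization, which the abstract describes as a later step, relies on fragments that span the modified residue. A minimal sketch of which b ions would differ between the isomers, assuming singly charged b ions, monoisotopic masses, unmodified methionine, and no neutral loss; the mass table covers only residues in these peptides, and the function names are hypothetical.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) for the residues used here.
RESIDUE: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "Q": 128.05858, "K": 128.09496, "M": 131.04049,
    "H": 137.05891, "F": 147.06841,
}
PHOSPHO = 79.96633  # HPO3 added by phosphorylation
PROTON = 1.007276


def residue_masses(peptide: str) -> List[float]:
    """Parse the caption's notation, where 'p' marks the next residue as phosphorylated."""
    masses: List[float] = []
    phospho = False
    for ch in peptide:
        if ch == "p":
            phospho = True
            continue
        masses.append(RESIDUE[ch] + (PHOSPHO if phospho else 0.0))
        phospho = False
    return masses


def b_ion_mzs(peptide: str) -> List[float]:
    """Singly charged b1..b(n-1): cumulative N-terminal residue masses plus a proton."""
    total = PROTON
    ladder: List[float] = []
    for m in residue_masses(peptide)[:-1]:
        total += m
        ladder.append(total)
    return ladder


iso_a = b_ion_mzs("pSFVLNPTNIGMSKSSQGHVTK")
iso_b = b_ion_mzs("SFVLNPTNIGMpSKSSQGHVTK")
# b ions whose m/z differs between the isomers can localize the site.
diagnostic = [i + 1 for i, (a, b) in enumerate(zip(iso_a, iso_b)) if abs(a - b) > 0.01]
print(diagnostic)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```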

