Linear discriminant analysis-based estimation of the false discovery rate for phosphopeptide identifications

Xiuxia Du¹, Feng Yang, Nathan P Manes, David L Stenoien, Matthew E Monroe, Joshua N Adkins, David J States, Samuel O Purvine, David G Camp 2nd, Richard D Smith

PMID: 18422353
PMCID: PMC2556358
DOI: 10.1021/pr070510t

Xiuxia Du et al. J Proteome Res. 2008 Jun;7(6):2195-203.

doi: 10.1021/pr070510t. Epub 2008 Apr 19.

Affiliation

¹ Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, USA.

Abstract

The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pnl.gov/software/.
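The linear discriminant step described in the abstract can be illustrated with a minimal sketch. It assumes Fisher's classical two-class discriminant, w = S⁻¹(m₁ − m₀), applied to two score columns instead of the three SEQUEST scores used in the article, and it assumes the two groups have already been separated (for example by the EM step). The function names and the toy score values are hypothetical and are not taken from the Phosphopeptide FDR Estimator software.

```python
# Sketch only: Fisher linear discriminant for two score columns
# (think of them as an XCorr-like and a DeltaCn'-like score).
# Data values and function names are illustrative assumptions.

def mean_vec(rows):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_cov_2d(rows0, rows1):
    """Pooled within-class 2x2 covariance of two labelled groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (rows0, rows1):
        m = mean_vec(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(rows0) + len(rows1) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fisher_weights(rows0, rows1):
    """w = S^-1 (m1 - m0): the direction that best separates the groups."""
    m0, m1 = mean_vec(rows0), mean_vec(rows1)
    (a, b), (c, d) = pooled_cov_2d(rows0, rows1)
    det = a * d - b * c  # closed-form inverse of a 2x2 matrix
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def discriminant_score(w, row):
    """Project one identification's scores onto the discriminant axis."""
    return w[0] * row[0] + w[1] * row[1]

# Hypothetical toy scores for incorrect and correct identifications.
incorrect = [(1.0, 0.05), (1.4, 0.10), (1.2, 0.02), (1.6, 0.08)]
correct = [(3.0, 0.30), (3.6, 0.42), (3.2, 0.38), (3.8, 0.35)]

w = fisher_weights(incorrect, correct)
scores_incorrect = [discriminant_score(w, r) for r in incorrect]
scores_correct = [discriminant_score(w, r) for r in correct]
```

On this toy data every correct identification projects to a higher discriminant score than every incorrect one, which is the property a single combined score is meant to provide before p- and q-values are computed.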

Figures

**Figure 1**
Flow chart of the data analysis pipeline used to estimate the *FDR* of phosphopeptide identifications.

**Figure 2**
Illustration of the q-value calculation. The red and blue curves denote the pdfs of *F* that correspond to the incorrect and correct phosphopeptide identifications, respectively. For a given discriminant score *f*, *FDR(f)* is the ratio of the red area to the blue area. The q-value is then calculated by Eqn. (17).
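A minimal sketch of how p- and q-values might be derived from two fitted score densities. Eqn. (17) is not reproduced on this page, so the code assumes the conventional definitions: the p-value is the upper-tail probability under the incorrect-identification pdf, FDR(f) is the accepted false area divided by the total accepted area, and the q-value is the minimum FDR over all thresholds at or below f. The exact expressions used by the authors may differ. The Gaussian form of the pdfs and all parameter values are hypothetical.

```python
import math

def upper_tail(f, mu, sd):
    """P(F >= f) for a normal distribution with mean mu and std dev sd."""
    return 0.5 * math.erfc((f - mu) / (sd * math.sqrt(2.0)))

def p_value(f, mu0, sd0):
    """Chance that an incorrect identification scores at least f."""
    return upper_tail(f, mu0, sd0)

def fdr(f, pi0, mu0, sd0, pi1, mu1, sd1):
    """Assumed definition: accepted false area / total accepted area."""
    false_area = pi0 * upper_tail(f, mu0, sd0)
    true_area = pi1 * upper_tail(f, mu1, sd1)
    total = false_area + true_area
    return false_area / total if total > 0.0 else 0.0

def q_values(scores, **params):
    """q(f) = min FDR(t) over observed thresholds t <= f."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    q = [0.0] * len(scores)
    running_min = 1.0
    for i in order:  # ascending, so each score sees every threshold t <= f
        running_min = min(running_min, fdr(scores[i], **params))
        q[i] = running_min
    return q

# Hypothetical mixture parameters and discriminant scores.
params = dict(pi0=0.6, mu0=0.0, sd0=1.0, pi1=0.4, mu1=4.0, sd1=1.0)
scores = [-1.0, 0.5, 2.0, 3.5, 5.0]
q = q_values(scores, **params)
p = [p_value(f, params["mu0"], params["sd0"]) for f in scores]
```

Taking the running minimum makes the q-value non-increasing in the score, so accepting every identification with q at or below a chosen cutoff always corresponds to a single score threshold.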

**Figure 3**
Histogram of the *XCorr* rank of the true top hit (blue) and the true second hit (red).

**Figure 4**
Scatter plots of the SEQUEST search results. The lower-left and upper-right clusters correspond to the incorrect and correct phosphopeptide identifications, respectively. (A) $\overline{\Delta Cn'}$ vs. $\overline{XCorr}$. (B) $\overline{\Delta Cn'}$ vs. $\overline{Sp}$. (C) $\overline{Sp}$ vs. $\overline{XCorr}$.

**Figure 5**
Histogram of ΔM. The maximum lower and minimum upper thresholds of the central dense region were 0 and 10 ppm, respectively.

**Figure 6**
Estimation of the joint pdf of $\overline{XCorr}$, $\overline{Sp}$, and $\overline{\Delta Cn'}$ using EM. The x-axis denotes the indices of iterations and the y-axis denotes the values of the estimated parameters. The parameters that correspond to the incorrect and correct identifications are red and blue, respectively. (A) Convergence of π₀ and π₁. (B) Convergence of $\mu_{0,\overline{XCorr}}$ and $\mu_{1,\overline{XCorr}}$. (C) Convergence of $\mu_{0,\overline{Sp}}$ and $\mu_{1,\overline{Sp}}$. (D) Convergence of $\mu_{0,\overline{\Delta Cn'}}$ and $\mu_{1,\overline{\Delta Cn'}}$. (E) Convergence of $\mathrm{cov}(\overline{XCorr}, \overline{\Delta Cn'})$. (F) Convergence of $\mathrm{cov}(\overline{Sp}, \overline{\Delta Cn'})$. (G) Convergence of $\mathrm{cov}(\overline{XCorr}, \overline{Sp})$. The insets in (B), (C), and (D) are the estimated means of $\overline{XCorr}$, $\overline{Sp}$, and $\overline{\Delta Cn'}$ on a zoomed-in scale.
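The kind of per-iteration parameter trace shown in this figure can be reproduced with a small EM sketch. For brevity it fits a two-component Gaussian mixture to a single simulated score, whereas the article estimates the joint pdf of three scores, so the covariance terms of panels (E) to (G) reduce to one variance per component here. The simulated data, the initialisation, and the function names are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(xs, n_iter=50):
    """Fit pi_k, mu_k, var_k (k = 0, 1) by EM and record every iteration."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    pi = [0.5, 0.5]
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # crude starting means
    var = [grand_var, grand_var]
    history, resp = [], []
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in xs:
            p0 = pi[0] * normal_pdf(x, mu[0], var[0])
            p1 = pi[1] * normal_pdf(x, mu[1], var[1])
            resp.append(p1 / (p0 + p1))
        # M-step: re-estimate weights, means, and variances from the posteriors.
        n1 = sum(resp)
        n0 = n - n1
        mu = [sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        var = [sum((1.0 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0,
               sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1]
        pi = [n0 / n, n1 / n]
        history.append((pi[:], mu[:], var[:]))
    return history, resp  # resp holds the posteriors from the final E-step

# Simulated scores: 300 "incorrect" around 0 and 200 "correct" around 5.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(200)]

history, resp = em_two_gaussians(xs)
pi_hat, mu_hat, var_hat = history[-1]
# Assign each point to the component with the larger posterior probability.
labels = [1 if r > 0.5 else 0 for r in resp]
```

Plotting the entries of `history` against the iteration index gives convergence curves analogous to panels (A) to (D). The final `labels` line applies the larger-posterior rule, which is the same comparison as the 45° line in Figure 7.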

**Figure 7**
Determination of the cluster membership using the joint pdf of $\overline{XCorr}$, $\overline{Sp}$, and $\overline{\Delta Cn'}$. The x- and y-axes denote the probability that each phosphopeptide identification belongs to the distribution of incorrect (p-) and correct (p+) identifications, respectively. The identifications in red (below the 45° line) were assigned to cluster 0, and those in blue (above the 45° line) were assigned to cluster 1.

**Figure 8**
A,B: Histogram of F and the estimated pdf of F. The red and blue curves correspond to the incorrect and correct identifications, respectively. C,D: The p-value (red), q-value (green), and p(+|F) (blue) for each identification. The *FNR* is shown in cyan. E,F: ROC curves.
