J Proteome Res. 2008 Jun;7(6):2195-203.
doi: 10.1021/pr070510t. Epub 2008 Apr 19.

Linear discriminant analysis-based estimation of the false discovery rate for phosphopeptide identifications

Xiuxia Du et al. J Proteome Res. 2008 Jun.

Abstract

The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pnl.gov/software/.
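The linear discriminant step described above can be illustrated with a minimal sketch of Fisher's linear discriminant. It is not the Phosphopeptide FDR Estimator code: it assumes only two features per identification (toy XCorr-like and ΔCn-like values invented for illustration), takes the correct/incorrect labels as given rather than deriving them from the EM fit, and all function names are hypothetical.

```python
from statistics import fmean

def mean_vec(rows):
    """Per-feature mean of a list of 2-feature rows."""
    return [fmean(col) for col in zip(*rows)]

def scatter(rows, mu):
    """2x2 scatter matrix of rows around the mean vector mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_weights(incorrect, correct):
    """Fisher LDA direction w = Sw^-1 (mu1 - mu0) for two classes."""
    mu0, mu1 = mean_vec(incorrect), mean_vec(correct)
    s0, s1 = scatter(incorrect, mu0), scatter(correct, mu1)
    # Pooled within-class scatter matrix.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    # Closed-form inverse of a 2x2 matrix.
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def discriminant_score(w, x):
    """Project one identification onto the discriminant direction."""
    return w[0] * x[0] + w[1] * x[1]

# Toy (XCorr-like, DeltaCn-like) pairs; the values are invented for illustration.
incorrect = [(1.0, 0.05), (1.5, 0.10), (1.2, 0.02), (1.8, 0.08)]
correct = [(3.0, 0.30), (3.5, 0.45), (4.0, 0.35), (3.2, 0.40)]

w = fisher_weights(incorrect, correct)
scores_incorrect = [discriminant_score(w, x) for x in incorrect]
scores_correct = [discriminant_score(w, x) for x in correct]
```

On this toy data every correct identification receives a higher discriminant score than every incorrect one, which is the property the single combined score is meant to provide.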

Figures

Figure 1
Flow chart of the data analysis pipeline used to estimate the FDR of phosphopeptide identifications.
Figure 2
Illustration of the q-value calculation. The red and blue curves denote the pdfs of F that correspond to the incorrect and correct phosphopeptide identifications, respectively. For a given discriminant score f, FDR(f) is the ratio of the red area to the blue area. The q-value is then calculated by Eqn. (17).
Figure 3
Histogram of the XCorr rank of the true top hit (blue) and the true second hit (red).
Figure 4
Scatter plots of the SEQUEST search results. The lower-left and upper-right clusters correspond to the incorrect and correct phosphopeptide identifications, respectively. (A) ΔCn¯ vs. XCorr¯. (B) ΔCn¯ vs. Sp¯. (C) Sp¯ vs. XCorr¯.
Figure 5
Histogram of ΔM. The maximum lower and minimum upper thresholds of the central dense region were 0 and 10 ppm, respectively.
Figure 6
Estimation of the joint pdf of XCorr¯, Sp¯, and ΔCn¯ using EM. The x-axis denotes the iteration index and the y-axis denotes the values of the estimated parameters. The parameters that correspond to the incorrect and correct identifications are red and blue, respectively. (A) Convergence of π0 and π1. (B) Convergence of μ0_XCorr¯ and μ1_XCorr¯. (C) Convergence of μ0_Sp¯ and μ1_Sp¯. (D) Convergence of μ0_ΔCn¯ and μ1_ΔCn¯. (E) Convergence of cov(XCorr¯,ΔCn¯). (F) Convergence of cov(Sp¯,ΔCn¯). (G) Convergence of cov(XCorr¯,Sp¯). The insets in (B), (C), and (D) are the estimated means of XCorr¯, Sp¯, and ΔCn¯ on a zoomed-in scale.
Figure 7
Determination of the cluster membership using the joint pdf of XCorr¯, Sp¯, and ΔCn¯. The x- and y-axes denote the probability that each phosphopeptide identification belongs to the distribution of incorrect (p-) and correct (p+) identifications, respectively. The identifications in red (below the 45° line) were assigned to cluster 0, and those in blue (above the 45° line) were assigned to cluster 1.
Figure 8
A,B: Histogram of F and the estimated pdf of F. The red and blue curves correspond to the incorrect and correct identifications, respectively. C,D: The p-value (red), q-value (green), and p(+|F) (blue) for each identification. The FNR is shown in cyan. E,F: ROC curves.
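The p- and q-values referred to in the Figure 2 and Figure 8 captions can be sketched as follows. This is a minimal illustration, not the paper's Eqn. (17), which is not reproduced on this page: it assumes the discriminant score F follows a two-component Gaussian mixture, uses the common mixture-model definition FDR(f) = incorrect tail area / (incorrect + correct tail area), and takes the q-value as the smallest FDR over observed thresholds at or below a score. The mixture parameters and function names are hypothetical.

```python
import math

def gaussian_sf(x, mu, sigma):
    """Survival function P(X >= x) of a normal distribution."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def p_value(f, mu0, sd0):
    """Probability that an incorrect identification scores at least f."""
    return gaussian_sf(f, mu0, sd0)

def fdr_at(f, pi0, mu0, sd0, mu1, sd1):
    """Expected fraction of incorrect identifications among those scoring >= f."""
    bad = pi0 * gaussian_sf(f, mu0, sd0)
    good = (1.0 - pi0) * gaussian_sf(f, mu1, sd1)
    total = bad + good
    # Guard against numerical underflow at extreme scores.
    return bad / total if total > 0.0 else 0.0

def q_values(scores, pi0, mu0, sd0, mu1, sd1):
    """q-value per score: minimum FDR over observed thresholds <= that score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    q = [0.0] * len(scores)
    running_min = 1.0
    for i in order:  # walk thresholds from lowest to highest score
        running_min = min(running_min,
                          fdr_at(scores[i], pi0, mu0, sd0, mu1, sd1))
        q[i] = running_min
    return q

# Assumed mixture: incorrect ~ N(0, 1), correct ~ N(4, 1), equal weights.
scores = [-1.0, 0.5, 2.0, 3.5, 5.0]
q = q_values(scores, pi0=0.5, mu0=0.0, sd0=1.0, mu1=4.0, sd1=1.0)
```

Because of the running minimum, the q-values never increase as the discriminant score rises, so thresholding on a q-value selects a contiguous top-scoring set of identifications.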
